CA1338601C - Relational database representation with relational database operation capability - Google Patents
- Publication number
- CA1338601C, CA000579597A
- Authority
- CA
- Canada
- Prior art keywords
- column
- relation
- vector
- value
- row
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An apparatus and/or method, utilizing a computer, for creating a relational database. The relational database contains a plurality of relations, and each of the relations contains one or more columns and rows. A column has one or more values, which all have a common characteristic. Each value of the column corresponds to one of the rows of the relation. Each row contains one or more values, and each value is from a different column. The values in each row have one or more characteristics.
The creation of a relational database occurs in three steps. First, for each characteristic of the relational database, a set containing a plurality of unique values is formed. Second, for each relation of the relational database, one or more subsets of each set containing unique values of the relation are formed. Each of the subsets contains one or more of the unique values of one of the sets. Third, the relations of the relational database are formed. More particularly, for each subset associated with a particular relation, one of the columns of the relation is formed. Each column contains one or more of each unique value in the subset, and each unique value of the column occurs in one or more rows of the relation.
Description
A RELATIONAL DATABASE REPRESENTATION WITH RELATIONAL DATABASE OPERATION CAPABILITY
Background of the Invention
Field of the Invention
This invention relates to a computer apparatus and/or method for creating a relational database. This invention also relates to a computer apparatus and/or method for efficiently processing relational operations on one or more relations.
Prior Art
A database system is basically a computerized record-keeping machine -- that is, a system whose overall purpose is to maintain data and to make that data available. The data maintained by the system can be any data deemed to be of significance. The basic purpose of a database system is to assist in the process of accessing data.
Almost all database systems developed over the past few years are called relational type databases. The concept is described in Date, An Introduction to Database Systems, 4th Edition, 19-20 (1986). "A relational database is a database that is perceived by users as a collection of tables and nothing but tables." Id. at 96. The relational database system enables the user to generate new tables from old tables and to extract and combine subsets of information from one or more tables.
Although the current database systems are user friendly and efficient for maintaining and manipulating relatively small quantities of data, they are not amenable to larger data requirements such as for large company employee records, government agency statistics, etc. Typically, such relational databases require an enormous amount of memory to be maintained by the underlying systems because the smallest data value representation is in the form of a "byte".
Additionally, due to complex indexing structures, a large amount of overhead software is necessary for maintaining the relational database. As a result, data manipulation is inefficient, and additional memory must be allocated to accommodate the software. Most importantly, unique data values may be stored redundantly by the system. The redundant storage of values causes even more unnecessary memory allocation.
Consequently, current relational databases are not cost effective, and they do not lend themselves to efficient manipulation of data, particularly when data requirements are large.
Summary of the Invention
The present invention introduces a computer apparatus and/or method for efficiently accessing, representing and manipulating data in a relational database. More particularly, the present invention introduces a computer apparatus and/or method for creating and representing a relational database and an apparatus and/or method for efficiently processing relational operations on one or more relations.
Briefly, one aspect of the preferred embodiment of the present invention includes an apparatus and/or method, utilizing a computer, for creating a relational database. The relational database contains a plurality of relations and each of the relations contains one or more columns and rows. A column has one or more values, which all have a common characteristic. Each value in the column corresponds to one of the rows of the relation. Each row contains one or more values, and each value is from a different column. The values in each row thus have one or more characteristics. The creation of a relational database occurs in three steps.
First, for each characteristic of the relational database, a set, called a "value set" or "domain," is formed containing all the values entered in the database having the same characteristic. Second, for each relation (table) of the relational database, one or more subsets of each value set containing values unique to the particular relation are formed. Each of the subsets contains one or more of the unique values of one of the value sets. Third, the relations (tables) of the relational database are formed. More particularly, for each subset associated with a particular relation, one of the columns of the relation is formed. Each column contains one or more of each unique value in the subset, and each unique value of the column occurs in one or more of the rows of the relation.
Typically, the first step, discussed above, involves an additional step of forming the set of unique values in some desired order of occurrence (e.g., numerical, lexical, or order of entry into the system).
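As a minimal sketch of this first step, the following Python forms a value set from raw entries in one of the three orderings mentioned above. The function name `build_value_set`, its `order` parameter, and the sample city names are illustrative assumptions and do not come from the specification.

```python
def build_value_set(entries, order="lexical"):
    """Form an ordered set of unique values (a "value set" or "domain").

    `order` is a hypothetical parameter: "lexical" or "numerical" sorts the
    unique values; "entry" keeps them in order of first entry.
    """
    if order == "entry":
        # dict keys keep insertion order, so duplicates drop out in place.
        return list(dict.fromkeys(entries))
    if order in ("lexical", "numerical"):
        return sorted(set(entries))
    raise ValueError(f"unknown order: {order!r}")


# Hypothetical sample data: city names entered for several rows.
cities = ["Paris", "London", "Paris", "Athens", "London"]
lexical_set = build_value_set(cities)         # sorted unique values
entry_set = build_value_set(cities, "entry")  # unique values, first-seen order
```

Whatever ordering is chosen, it fixes the position of each unique value, which is what lets a later bit vector refer to a value by position alone.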
The second step, discussed above, may involve a step of representing a subset in the form of an entity select vector of binary bits, where each of the binary bits of the binary bit vector has an order of occurrence which corresponds to the order of occurrence of the unique values in the set. The binary bits of the bit vector represent the presence or absence of each of the unique values in the subset, i.e., the vector selects which entities or values in the value set are in the associated subset forming the column of the relation.
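The entity select vector described above can be sketched as follows. This is an illustration only: the function names, the list-of-integers encoding of the bits, and the sample data are assumptions made for the example.

```python
def entity_select_vector(value_set, column_values):
    """Return one bit per value-set entry: 1 if the value occurs in the column."""
    present = set(column_values)
    return [1 if value in present else 0 for value in value_set]


def subset_from_vector(value_set, vector):
    """Recover the subset of the value set that a vector selects."""
    return [value for value, bit in zip(value_set, vector) if bit]


# Hypothetical data: a shared domain of cities and one relation's city column.
city_domain = ["Athens", "London", "Paris", "Rome"]
supplier_cities = ["London", "Paris", "Paris", "London"]

vector = entity_select_vector(city_domain, supplier_cities)
selected = subset_from_vector(city_domain, vector)
```

Because the bits follow the order of the value set, the subset can be recovered from the vector and the shared value set without storing the values a second time.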
The third step, discussed above, may also involve the additional step of forming a binary representation of each relation in the relational database. In this step, for each subset (column) associated with one of the relations (tables) of the relational database, a vector of binary bits is created for each unique value in the subset. Each binary bit of this binary bit vector corresponds to one of the rows of the column, and the binary bits can be formed in an order of occurrence corresponding to the order of occurrence of the rows in the column or in some other appropriate order. Also, each binary bit represents the presence or absence of each unique value in one of the rows of the column.
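A small sketch may make the per-value row vectors concrete. It assumes rows are numbered in their order of occurrence and that bits are held in plain Python lists; the function name and the sample column are hypothetical.

```python
def row_bit_vectors(column_values):
    """Map each unique value in a column to a bit vector over the rows.

    Bit r of the vector for value v is 1 when row r of the column holds v.
    """
    n_rows = len(column_values)
    vectors = {}
    for row, value in enumerate(column_values):
        vectors.setdefault(value, [0] * n_rows)[row] = 1
    return vectors


# Hypothetical column: the city of each of five supplier rows.
supplier_cities = ["London", "Paris", "Paris", "London", "Athens"]
vectors = row_bit_vectors(supplier_cities)
```

In this encoding each row has exactly one set bit across all of the column's vectors, since a row holds exactly one value per column.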
Significantly, by forming subsets of each set of unique values associated with a particular relation, all relational operations are facilitated in terms of speed and total functionality. In addition, a substantial savings in memory occurs. This savings occurs because an entire value set, which has one or more values in the relation, need not be directly employed within the relation. Instead, only the entity select vector for each column of a relation need be referenced. As a result, all unique values within a domain are stored in conventional format only once in memory from which any number of subsets may be formed. Additionally, by forming binary bit vector representations of each subset associated with each relation, and by forming binary bit vector representations of each relation, an even further savings in memory occurs. The binary bit vector representations are also compressible.
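To illustrate why such bit vectors compress well, the sketch below run-length encodes a sparse vector. This section does not say which compression scheme is used; run-length encoding is chosen here purely as a familiar example, and the function names and data are assumptions.

```python
from itertools import groupby


def run_length_encode(bits):
    """Encode a bit vector as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def run_length_decode(runs):
    """Expand (bit, run_length) pairs back into a bit vector."""
    bits = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits


# Hypothetical row vector: one value occurring in a single row out of twenty.
sparse = [0] * 12 + [1] + [0] * 7
runs = run_length_encode(sparse)
```

A value that occurs in few rows yields long runs of zeros, so its vector collapses to a handful of pairs.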
Most importantly, the binary representation of the sets and relations creates a content-addressable system. In the past, content addressability has been implemented as a very expensive memory hardware solution, or as a computationally intensive inverted tree approach. These approaches have allowed processing to occur, but in a conventional and inefficient way. The present invention, on the other hand, is the first implementation that uses conventional memory, but imposes an efficient and unique processing solution by representing the relations in a binary bit vector representation.
The present invention also has the capability of efficiently performing database operations on one or more relations. Binary bit vector represented relations need not be decompressed in order to perform database operations. Computer time is saved by not having to decompress the data, and processing is made more efficient, because only a relatively small amount of compressed data needs to be considered while performing the relational operations. In the present invention, the relational operations are reduced to operations on bit strings. This aspect of the present invention involves a method and/or apparatus using a computer, for efficiently performing a relational operation on one or more relations of a relational database in order to produce a binary representation of a resulting relation.
Typically, the relational operations performed are in the form of "SELECT," "JOIN" ("EQUIJOIN" or "NATURAL JOIN") and "PROJECT."
In the preferred embodiment, the SELECT operation determines which one or more rows of a relation correspond to one or more selected unique values. The selection process occurs in two steps. First, the binary bit vector(s) associated with the one or more selected unique values are retrieved. (Each binary bit of the binary bit vector indicates the one or more rows of the binary represented relation which contain the selected unique value). Second, when there is more than one selected unique value, a Boolean OR operation is performed on the retrieved binary bit vectors to determine the binary representation of a resultant relation. More particularly, the resultant relation is represented by a binary bit vector which indicates the one or more rows of the relation which contain the selected unique values from one column of the relation.
The quantity of the rows of the relation which contain the one or more selected unique values can be quickly determined by counting the binary bits of the resultant relation which indicate the presence of the selected values.
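The two-step SELECT and the row count can be sketched as below. The dictionary of per-value row vectors, the function names, and the sample data are assumptions for illustration; the sketch works on uncompressed lists of bits.

```python
def select_rows(vectors, selected_values, n_rows):
    """OR together the row vectors of the selected values in one column."""
    result = [0] * n_rows
    for value in selected_values:
        # Step 1: retrieve the vector for this value (all zeros if absent).
        vector = vectors.get(value, [0] * n_rows)
        # Step 2: Boolean OR into the running result.
        result = [a | b for a, b in zip(result, vector)]
    return result


def count_rows(result):
    """Count the rows selected, i.e. the number of set bits."""
    return sum(result)


# Hypothetical row vectors for a city column with five rows.
city_vectors = {
    "London": [1, 0, 0, 1, 0],
    "Paris":  [0, 1, 1, 0, 0],
    "Athens": [0, 0, 0, 0, 1],
}
result = select_rows(city_vectors, ["London", "Athens"], 5)
matches = count_rows(result)
```

Note that neither function touches the stored values themselves; only bit vectors are read and combined.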
The preferred embodiment also has the capability of performing SELECT on more than one selected unique value which occur in more than one column of the relation.
The selection process occurs in three steps. First, for each column, the binary bit vectors associated with the selected unique values are retrieved. Each binary bit of the binary bit vector indicates a row of the binary represented column which contains the selected unique values. Second, for each column, when there is more than one selected unique value from the column, a Boolean OR operation is performed on the retrieved binary bit vectors to determine a SELECT binary bit vector indicating the one or more rows of the relation which contain the selected unique values. This bit vector characterizes a subset of the rows of the original relation. Third, a Boolean operation (i.e., AND, OR, etc.) is performed on the SELECT binary bit vectors to determine a resultant relation.
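A sketch of the three-step, multi-column SELECT follows. The `combine` parameter, the column names, and the sample rows are hypothetical; only AND and OR are shown as the final Boolean operation.

```python
def select_vector(column_vectors, values, n_rows):
    """Steps 1 and 2: retrieve the vectors for `values` and OR them together."""
    picked = [column_vectors.get(v, [0] * n_rows) for v in values]
    if not picked:
        return [0] * n_rows
    return [int(any(bits)) for bits in zip(*picked)]


def multi_column_select(columns, criteria, n_rows, combine="and"):
    """Step 3: combine the per-column SELECT vectors with AND or OR."""
    per_column = [
        select_vector(columns[name], values, n_rows)
        for name, values in criteria.items()
    ]
    if combine == "and":
        return [int(all(bits)) for bits in zip(*per_column)]
    if combine == "or":
        return [int(any(bits)) for bits in zip(*per_column)]
    raise ValueError(f"unknown combine: {combine!r}")


# Hypothetical relation with rows (London, 20), (Paris, 10), (Paris, 30),
# (London, 20), (Athens, 30), stored as per-value row vectors.
columns = {
    "city": {
        "London": [1, 0, 0, 1, 0],
        "Paris":  [0, 1, 1, 0, 0],
        "Athens": [0, 0, 0, 0, 1],
    },
    "status": {
        10: [0, 1, 0, 0, 0],
        20: [1, 0, 0, 1, 0],
        30: [0, 0, 1, 0, 1],
    },
}
criteria = {"city": ["London", "Paris"], "status": [20, 30]}
both = multi_column_select(columns, criteria, 5, "and")
either = multi_column_select(columns, criteria, 5, "or")
```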
In the preferred embodiment of the present invention, the step for performing a relational operation also may include the step for performing a JOIN on one or more rows of a first binary represented relation with one or more rows of a second binary represented relation. More particularly, the JOIN operation may be specified to occur only when a column of the first relation contains values having a particular characteristic (i.e., belonging to a particular domain) and a column of the second relation contains values having the same characteristic, which bear a desired relationship with values from the column of the first relation. That is, they are greater than, less than, equal to, equal to or greater than, equal to or less than the corresponding values of the first relation. (The second relation need not be distinct from the first.)
In the preferred embodiment, performing a PROJECT operation is also included. The PROJECT operation comprises generating the binary bit vectors associated with any subset of columns of a particular relation.
The columns selected can be reconstructed for a user to interpret.
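The JOIN and PROJECT descriptions above can be illustrated with a short sketch. It covers only the equality case of JOIN (an equijoin) and a simple reconstruction of one projected column; this section does not give the routines themselves, so the function names, the pairing strategy, and the data are all assumptions.

```python
def rows_of(vector):
    """Return the row numbers whose bit is set."""
    return [row for row, bit in enumerate(vector) if bit]


def equijoin_row_pairs(left_vectors, right_vectors):
    """Pair rows of two relations whose join columns hold equal values.

    Both arguments map a value from the same domain to a row bit vector.
    """
    pairs = []
    for value, left_vector in left_vectors.items():
        right_vector = right_vectors.get(value)
        if right_vector is None:
            continue  # value absent from the second relation's column
        for i in rows_of(left_vector):
            for j in rows_of(right_vector):
                pairs.append((i, j))
    return sorted(pairs)


def reconstruct_column(vectors, n_rows):
    """Rebuild a projected column's values from its per-value row vectors."""
    column = [None] * n_rows
    for value, vector in vectors.items():
        for row in rows_of(vector):
            column[row] = value
    return column


# Hypothetical city columns of a three-row and a four-row relation.
supplier_city = {"London": [1, 0, 1], "Paris": [0, 1, 0]}
part_city = {"London": [1, 0, 0, 1], "Paris": [0, 1, 0, 0], "Rome": [0, 0, 1, 0]}

pairs = equijoin_row_pairs(supplier_city, part_city)
projected = reconstruct_column(supplier_city, 3)
```

The other comparisons named above (greater than, less than, and so on) would pair vectors of different values according to the ordering of the shared value set; that case is not shown here.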
Any relational operation which creates a second resultant relation can be implemented with the present invention. Although only SELECT, PROJECT and JOIN are thoroughly discussed, one skilled in the art of database systems understands that there are comparatively few practical problems that cannot be solved by SELECT, PROJECT and JOIN alone. Additionally, by following the strategy set forth in the specification for implementing SELECT, PROJECT and JOIN, one skilled in the art of relational databases will see that the other relational operations such as PRODUCT, UNION and DIFFERENCE can be implemented by the present invention as well. Lastly, functions such as INSERT and DELETE, for updating rows of a binary represented relation and for updating sets of unique values, are also implemented by the present invention.
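As a rough illustration of INSERT for a single column, the sketch below follows three sub-steps whose names are borrowed from the figure titles (insert the value into the value set, update the subset, add the value to the column). The mapping of those titles to the code is an assumption, as are the function signature, the lexical ordering, and the data; the flow diagrams themselves are not reproduced in this section.

```python
import bisect


def insert_value(domain, select, vectors, n_rows, value):
    """Append one row holding `value` to a column; return the new row count.

    domain  - lexically ordered list of unique values (the value set)
    select  - entity select vector aligned with `domain`
    vectors - maps each value in the column to its row bit vector
    """
    # 1. Insert the value into the value set, keeping the set ordered.
    pos = bisect.bisect_left(domain, value)
    if pos == len(domain) or domain[pos] != value:
        domain.insert(pos, value)
        select.insert(pos, 0)  # keep the select vector aligned
    # 2. Update the subset: mark the value as present in this column.
    select[pos] = 1
    # 3. Add the value to the column: every row vector gains one bit.
    for vector in vectors.values():
        vector.append(0)
    vectors.setdefault(value, [0] * (n_rows + 1))[n_rows] = 1
    return n_rows + 1


# Hypothetical two-row column holding London and Paris.
domain = ["London", "Paris"]
select = [1, 1]
vectors = {"London": [1, 0], "Paris": [0, 1]}

n = insert_value(domain, select, vectors, 2, "Athens")  # a new value
n = insert_value(domain, select, vectors, n, "London")  # an existing value
```

In a full system a value set shared by several relations would require every entity select vector built on it to be realigned when a value is inserted; this sketch handles one column only.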
The present invention significantly enhances a computer's capability of processing relational operations by performing the relational operations directly at the binary bit level. Additionally, the Boolean operations can also be performed more efficiently by a special Boolean Logic Unit.
The aspects of the present invention, discussed above, together create a relational database system which can more efficiently access, represent and manipulate data than any other relational database system presently available.
Brief Description
FIG. 1 depicts a computer system equipped with a Relational Database Management System (RDMS) for creating a binary representation of a relational database and/or for efficiently processing relational operations on the one or more binary represented relations in accordance with the present invention;
FIG. 1A depicts the RDMS on which software programs and/or hardware are performed for representing the relations and for performing relational operations;
FIG. 2 represents a typical relational database provided by a user of the system depicted in FIG. 1;
FIG. 3A represents an ordered set S where the ordering is a lexical ordering;
FIG. 3B represents an (ordered) subset of the ordered set represented in FIG. 3A;
FIG. 3C represents an empty or null set A;
FIG. 3D depicts a binary bit vector expression a for characterizing the subset depicted in FIG. 3B;
FIG. 3E is a binary bit vector a' for characterizing the ordered set shown in FIG. 3A;
FIG. 3F depicts a binary bit vector representation of a null subset a" shown in FIG. 3C;
FIG. 4 depicts a binary representation of the Suppliers relation as shown in FIG. 2;
FIG. 5 depicts a binary representation of the Parts relation as shown in FIG. 2;
FIG. 6 depicts a binary representation of the Shipments relation as shown in FIG. 2;
FIG. 7 is a flow diagram of the BINARY REPRESENTATION routine;
FIGS. 8A and 8B are a table depicting the results of executing a series of commands for setting up the columns of the relational database shown in FIG. 2;
FIGS. 9A, 9B and 9C are a table depicting the results of constructing the binary representation of the Suppliers relation (FIG. 2) by the BINARY REPRESENTATION routine;
FIG. 10A is a flow diagram of the INSERT routine;
FIG. 10B is a flow diagram of the INSERT VALUE INTO VALUE SET routine;
FIG. 10C is a flow diagram of the UPDATE SUBSET routine;
FIG. 10D is a flow diagram of the ADD VALUE TO COLUMN routine;
FIG. 11A is a flow diagram of the DELETE routine;
FIG. 11B is a flow diagram of the DELETE VALUE FROM COLUMN routine;
FIG. 11C is a flow diagram of the DELETE VALUE FROM SUBSET routine;
FIG. 11D is a flow diagram of the DELETE VALUE FROM VALUE SET routine;
FIG. 12 is a results table depicting the operations performed by the INSERT routine (FIG. 10A);
FIG. 13 is a table depicting the results of the operations performed by the DELETE routine (FIG. 11A);
FIG. 14 is a flow diagram of the SELECT routine;
FIGS. 15A and 15B depict a table of the results of the operations performed by the SELECT routine for a two-column SELECT for two values;
FIGS. 16A and 16B depict a table of results of the operations performed by the SELECT routine (FIG. 14) on two columns for multiple values;
, . , S ry Q~ the InYention - -The present invention introduces a computer apparatus and/or method for efficiently accessing, representing and manipulating data in a relational database. ~$ore particularly, the present invention introduces a computer apparatus and/or method for crcating and representing a relational database and an apparatus and/or method for efficiently processing relational operations on one or more relations.
Briefly, one aspect of the preferred embodiment of ~s 'che preseDe 1nvest~on ~DClUd~S ~D pp-r~eU5 ~Dd/Or 133~01 method, utili~ing a computer, for creating a relational database. The relational database contain6 a plurality of relations and each of the relations contains one or more columns and rows. ~ column has one or more values, which all have a common characteristic. Each value in ~he column corresponds to one of the rows of the relation. Each row contains one or more values, and each value is from a different column. The values in each row thus have one or more characteristics. The creation of a relational database occurs in three steps.
First, for each characteristic of the relational , ~ database, a set, called a "value set" or "domain" i8 formed containing all the values entered in the data base having the same characteristic. Second, for each 1~ relation (table) of the relational database, one or more subsets of each value set containing values unique to the particular relation is formed. Each of the subsets contains one or more of the unique values of one of the value sets. Third, the relations (tables) of the relational database are formed. More particularly, for each subset assSociated with a particular relatlon, one of the columns of the relation is formed. Each column contains one or more of each unique value in the subset, and each unique value of the column occurs in one or more of the rows of the relation.
Typically, the first step, discussed above, involves an additional step of forming the set of unique values in some desired order of occurrence (e.g., numerical, lexical, or order of entry into the system).
The second step, discussed above, may involve a step of representing a subset in the form of an entity select vector of binary bits ,' where each of the binary bits of the binary bit vector has an order of occurrence which corresponds to the order of occurrence of the uniqu~
values in the set. The binary bits of the bit vector .
:: ~
represent the pre6ence or absence of each of the unique values in the subset, l.e., the vector selects which entities or values in the value set are in the associated subset forming the column of the relation.
The thlrd 6tep, discussed above, may also involve the additional step of forming a binary representation of each relation in the relational database. In this step, for each subset (column) associated with one o~
the relations (tables) of the relational databa6e, a vector of binary bits is created for each unique value in the subset. Each binary bit of this binary bit vector corresponds to one of the rows of the column, and the binary bits can be formed in an order of occurrence corresponding to the order of occurrence of the rows in the column or in some other appropriate order. Al60, each binary bit represents the presence or absence of each unique value in one of the rows of the column.
Significantly, by forming subsets o~ each set of unique values associated with a particular relation, all relational operations are ~acllitate~ in terms of speed and total functionality. In addition, a substantial savings in memory occurs. This savings occurs because an entire value set, which has one or more values in the relation, need not be directly employed within the relation. Instead, only the entity select vector for each column of a relation need be referenced. As a result, all unique values-within a domain are stored in conventional format only once in memory from which any number of subse~s may be formed. Additionally, by forming binary bit vector representations of each subset associated with each relation, and by ~orming binary bit vector representations of each relation, an even further savings in memory occurs. The binary bit vector representations are also compressible.
,, : ~ -~~ -5- 1338601 Most importantly, the binary representation of the sets and relations creates a content-addressable system. In the past, content addressability has been implemented as a very expensive memory hardware solution, or as a computationally intensive inverted tree approach. These approaches have allowed processing to occur, but in a conventional and inefficient way. The present invention, on the other hand, is the first implementation that uses conventional memory, but imposes an efficient and unique processing solution by representing the relations in a binary bit vector representation .
The present invention also has the capability of efficiently performing database operations on one or more relations. Binary bit vector represented relations need not be decompressed in order to perform database operations. Computer time is saved by not having to decompress the data, and processing is made more efficient, because only a relatively small amount of compressed data needs to be considered while performing the relational operations. In the present invention, the relational operations are reduced to operations on bit strings. This aspect of the present invention involves a method and/or apparatus using a computer, ~or efficiently performing a relational operation on one or more relations of a relational database in order to produce a binary representation of a resùlting relation. i~
TypicaLly, the relational operations performed are in the form of "SELECT, ~' "JOIN~' ("EQurJoIN" or "NATURAL
JoINll) and "PROJECT. ~' -In the preferred embodiment, the SELECr operation r determlnes which one or more rows of a relation .
-6- 1338~01 corresponds to one or more selected unique values, The 6electLon process occurs in two steps. First, the binary bit vector(s) associated with the one or more selected unique values are retrieved. (Each binary bit of the binary bit vector lndicates the one or more`rows - of the binary represented relation which contain the selected unique value). Second, when there is more than one selected unique value, a Boolean OR operation i~
performed on the retrieved binary bit vectors to - 10 determine the binary representation o~ a resultant relation. More particularly, the resultant relation is represented by a binary bit vector which indicates the one or more rows of the relation which contain the i 6elected unique values from one column of the relation.
The quantity of the rows of the relation which contain the one or mora selected unique value3 can be quickly determined by counting the binary bits of the resultant relation which indicate the presence of the selected values .
The preferred ~mhASir-nt also has the capability of performing SELECT on more than one selected unique value which occur in more than one column of the relation.
The selection process occur~ in three steps. First, for each column, the binary bit vectors associated with the selected unique values are retrieved. Each binary bit of the binary bit vector indicates a row of the binary represented column which contains the selected unique values. Second, for each column, when there is more than one selected unique value from the column, a Boolean OR operation is performed on the retrieved binary bit vectors to determine a SELECT binary bit vector indicating the 'one or more rows of the relation which contain the selected unique values. This bit vector characterize~ a subset of the rows of the original relation . Third, a soolean operation (i . e., _7_ 1338601 AND, OR, etc. ) is performed on the SELECT binary bit vectors to determine a resultant relation.
In the preferred embodiment of the present invention, the 6tep for performing a relational operation al60 may include the step for performing a - JOIN on one or more rows of a first binary reprefiented relation with one or more rows of a second binary represented relation. More particularly, the JOIN
operation may be specif ied to occur only when a column of the first relation contains values having a particular characteristic (i. e., belonging to particular domain) and a column of the second relation contains values having the 6ame characteristic, which bear a desired relationship with values from the column of the first relation. That is, they are greater than, less than, equal to, equal to or greater than, equal to or less than the corre6ponding values of the first relation. (The second relation need not be distinct from the first. ) , 20 In the preferred embodiment, performing a PROJECT
operation is also included. The PROJECT operation oomprises generating the binary bit vectors associated with any subset of columns of a particular relation.
The columns selected can ~e reconstructed for a user to interpret.
Any relational operation which creates a second resultant relation can be implemented with the present invention. Although only SELECT, PRO,JECT and JOIN are thoroughly discussed, one skilled in the art of database 6ystems understands that there are comparatively few practical pro~lems that cannot be solved by SELECT, PROJECT and JOIN alone'. Additionally, by following the strategy set forth in the specification for implementing SELECT, PROJECT and JoIN, one skilled in the art of relational databases will see that the other relational operations such as PRODUCT, UNION and ~lrrri~incri can be implemented by the present invention as well. Lastly, functions such as INSERT and DELETE for updating rows o a ~inary represented relation and for updating unique value6 5 to sets of unique values, are also lmplemented by the present invention.
The present invention significantly enhances a com-puter ' s capability of processing relational operations by performing the relational operations directly at the binary 10 bit level. Additionally, the Boolean operations can also be performed more efriciently by a special Boolean Logic Unit .
The aspects of the present invention, aiscu~sed above, together create a relational database ~ystem which can more 15 efficiently access, represent and r~-nir ll~te data than any oth~r r-l~tion~ t~b~e sy-te~ pr~erltly ~v~ blo.
;:
. .
.
9 13386~1 Brief DescriPtion FIG. 1 depicts a computer system e~uLpped with a Relational Database Management System (RDMS) ~or creating a binary representation of a relational database and/or for efficiently processing relational operations on the one or more ~1inary represented relations in accordance with the present inVention;
FIG. lA depicts the RDMS on which software programs and/or hardware are performed for representing the relations and for performing relational operations;
FIG. 2 represents a typical relational database provided by a user of the system depicted in FIG. 1;
FIG. 3A represents an ordered set S where the ordering is a lexical ordering;
FIG. 3B represents an (ordered) subset of the ordered set represented in FIG. 3A;
FIG. 3C represents an empty or null set A;
FIG. 3D depicts a binary bit vector expression a for characterizing the subset depicted in FIG. 3B;
FIG. 3E is a binary bit vector a' for characterizing the ordered set shown in FIG. 3A;
FIG. 3F depicts a binary bit vector representation a" of the null subset shown in FIG. 3C;
FIG. 4 depicts a binary representation of the Suppliers relation as shown in FIG. 2;
FIG. 5 depicts a binary representation of the Parts relation as shown in FIG. 2;
FIG. 6 depicts a binary representation of the Shipments relation as shown in FIG. 2;
FIG. 7 is a flow diagram of the BINARY REPRESENTATION routine;
FIGS. 8A and 8B are a table depicting the results of executing a series of commands for setting up the columns of the relational database shown in FIG. 2;
FIGS. 9A, 9B and 9C are a table depicting the results of constructing the binary representation of the Suppliers relation (FIG. 2) by the BINARY REPRESENTATION routine;
FIG. 10A is a flow diagram of the INSERT routine;
FIG. 10B is a flow diagram of the INSERT VALUE INTO VALUE SET routine;
FIG. 10C is a flow diagram of the UPDATE SUBSET routine;
FIG. 10D is a flow diagram of the ADD VALUE TO COLUMN routine;
FIG. 11A is a flow diagram of the DELETE routine;
FIG. 11B is a flow diagram of the DELETE VALUE FROM COLUMN routine;
FIG. 11C is a flow diagram of the DELETE VALUE FROM SUBSET routine;
FIG. 11D is a flow diagram of the DELETE VALUE FROM VALUE SET routine;
FIG. 12 is a results table depicting the operations performed by the INSERT routine (FIG. 10A);
FIG. 13 is a table depicting the results of the operations performed by the DELETE routine (FIG. 11A);
FIG. 14 is a flow diagram of the SELECT routine;
FIGS. 15A and 15B depict a table of the results of the operations performed by the SELECT routine for a two-column SELECT for two values;
FIGS. 16A and 16B depict a table of results of the operations performed by the SELECT routine (FIG. 14) on two columns for multiple values;
FIG. 17A is a flow diagram of the PROJECT routine;
FIG. 17B is a flow diagram of the DISPLAY/RECONSTRUCT routine;
FIGS. 18A, 18B, 18C and 18D depict a table of the results of the operations performed by the PROJECT routine (FIG. 17A);
FIG. 19 depicts a binary representation of a JOIN relation;
FIG. 20 depicts a more detailed version of the binary representation of the Suppliers portion of the JOIN relation (FIG. 19);
FIG. 21 depicts a more detailed version of the Parts relation portion of the JOIN relation (FIG. 19);
FIG. 22A is a flow diagram of the EQUIJOIN routine;
FIG. 22B is a flow diagram of the BUILD ROW USE SETS routine;
FIG. 22C is a flow diagram of the EVALUATE ROW USE SETS routine;
FIG. 22D is a flow diagram of the CoNb~ l JOIN
ROW USE VECTORS routine;
FIG. 22E is a flow diagram of the PRODUCTS routine;
FIG. 22F is a flow diagram of the NUMS routine;
FIG. 22G is a flow diagram of the GENERATE BIT STRING routine;
FIGS. 23A, 23B, and 23C represent a table of the results of the operations performed by the EQUIJOIN routine (FIG. 22A);
FIG. 24 is a flow diagram of the GREATER THAN JOIN routine;
FIG. 25A is a flow diagram of the DISPLAY/RECONSTRUCT FOR JOIN routine;
FIG. 25B is a flow diagram of the REFERENCE RELATION routine;
FIG. 25C is a flow diagram of the REFERENCE VALUE SET routine;
FIGS. 26A, 26B, 26C, 26D, 26E, 26F and 26G depict a table of results for the operations performed by the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A);
FIG. 27 represents a mapping of the elements of Set S into Set T; Set S is the "domain" and Set T is the "range";
FIG. 28 represents the binary representation of Suppliers (FIG. 4), with entity use vectors added;
FIG. 29 depicts a binary representation of the Suppliers portion of the JOIN relation (FIG. 20) with entity use vectors added;
FIG. 30A is a flow block diagram of the DISPLAY/RECONSTRUCT WITH ENTITY USE VECTORS routine;
FIG. 30B is a flow block diagram of the REFERENCE RELATION routine;
FIG. 30C is a flow block diagram of the REFERENCE VALUE SET routine;
FIG. 31 depicts a relational database;
FIG. 32 depicts a SYSTEM RELATION;
FIG. 33 depicts the domains of the relational database shown in FIG. 31;
FIG. 34 depicts the ENTITY SELECT SET associated with the SYSTEM RELATION (FIG. 32);
FIG. 35 depicts the ENTITY USE SET, which is associated with the SYSTEM RELATION (FIG. 32);
FIG. 36 depicts the ROW SELECT SET associated with the SYSTEM RELATION (FIG. 32);
FIGS. 37A, 37B, 37C, and 37D depict the ROW USE SETS associated with the SYSTEM RELATION (FIG. 31);
FIGS. 38A, 38B, 38C, and 38D are flow block diagrams of the DATABASE IDENTIFICATION routine.
TABLE OF CONTENTS
DETAILED DESCRIPTION
I. Hardware Level of the Preferred Embodiments
II. A Detailed Discussion on Relational Databases
III. Binary Bit-Vector Technology
IV. Binary Representation of a Relational Database
    A. Example of Generating a Binary Representation of a Relation
    B. Example of Building a Binary Represented Relation
V. Operations Performed on Binary Representations of Relations
    A. INSERT
        1. Detailed Example for the INSERT Function
    B. DELETE
        1. Detailed Example for the DELETE Operation
    C. SELECT
        1. Detailed Example of a Two-Column SELECT for Two Values
        2. Detailed Example of Two Column SELECT for Multiple Values
    D. RECONSTRUCT
        1. Detailed Example of Performing PROJECT Operation
    E. JOIN
        1. Binary Representation of a JOIN Relation
        2. Constructing a Binary Representation of a JOIN Relation
        3. Detailed Example For Constructing a Binary Representation of the JOIN Relation
        4. Constructing a BINARY REPRESENTATION of a GREATER THAN JOIN
    F. DISPLAY/RECONSTRUCT For JOIN Operation
        1. Example of the DISPLAY/RECONSTRUCT Operation For A JOIN Relation
VI. ENTITY USE Vectors
VII. Database Identification
    A. Performing the Database Identification Scheme
Detailed Description
I. Hardware Level of the Preferred Embodiments
FIGURE 1 depicts a computer system having a programmable computer and computer programs for creating a relational database and for processing operations on one or more relations (also called tables) of a relational database. The system includes programmable computer 2, display 3, entry device 11 for the computer and external device 12 such as a disk for storage of data. Hardware/software for representing the relations and hardware/software for performing relational operations are housed in a Relational Database Management System (RDMS) 10 (shown in phantom lines), which is connected within the computer 2. The RDMS 10 coordinates the various activities related to representing relations in the relational database and to performing relational operations on one or more relations. Conventionally, RDMS 10 is a programmable computer on a printed circuit board which can be easily employed within most standard computers, including personal, mini-, and mainframe computers. It is envisioned that RDMS 10 may be a special purpose computer formed by one or more integrated chips.
More particularly, referring to FIG. 1A, RDMS 10 includes an optional Binary Bit Vector Processor (BBVP) 14, an optional Bit Vector Encoder (BVE) 16, an optional Map Vector Processor (MVP) 15, an optional memory 18, a Relational Processing Unit (RPU) 22, including a Boolean Logic Unit (BLU) 24, and a Command Interpreter 28. When software programs for generating binary represented relations, for processing relational operations, and for coordinating data transfer between components are loaded into the RDMS 10, the RDMS 10 is formed and ready for processing operation.
A detailed discussion of the specific components of the RDMS 10 is now presented. External device 12 is a component of buffered storage, typically in the form of a hard disk, for storing information used in relations which are represented in expanded form where each value is typically no smaller than a byte. The contents of the external device are typically maintained in records which are divided into fields where the nth field of each record corresponds to a specific type of data. The contents of the external device 12 are loaded via bus 30 to the RDMS 10 and specifically to RPU 22. The RPU instructs BBVP 14 to convert each relation stored on the external device into a binary representation (to be more thoroughly discussed in PART IV). Bus 32 then transfers the binary representation of each relation to optional BVE 16. The BVE 16 compresses the binary representation. More particularly, the BVE 16, employed within the RDMS 10, evaluates uncompressed bit-string representations of each relation and separates the bit-strings into one or more "impulses". An impulse is a run, which is a string of one or more bits of a same binary value or a polarity (e.g., "0's" or "1's"), and an ending bit which has a polarity opposite the polarity of the run.
Software programs executed by the bit-vector encoder encode the bit vectors into one of several different compressed impulse formats. The compressed bit vectors are then sent via bus 32 back to the BBVP and then in turn to the optional memory 18. Memory 18 may be a memory component included in the host computer such as an external device or a memory component included within the RDMS 10 (as shown in FIG. 1A). Memory 18 holds the compressed binary representations of relations before processing the relations at the RPU 22 or stores the relations after processing at RPU 22. RPU 22 via the BBVP 14 performs relational type operations (e.g., SELECT, PROJECT, JOIN, INSERT, DELETE, etc.) on one or more relations by processing unique software programs at the RPU 22 via a microcontroller (e.g., Intel 80386). If the relations are in the form of encoded bit strings, and if Boolean operations are required to be performed by the relational operation, then the Boolean operations can be performed by the hardware or software embodiments of the BLU 24.
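The separation of a bit string into impulses, as described above, can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_impulses` is hypothetical, bit strings are modeled as Python strings of "0" and "1", and treating a trailing run that has no ending bit as its own impulse is an assumption. The compressed impulse formats used by the BVE 16 are not described in this passage and are not modeled.

```python
def split_into_impulses(bits: str) -> list[str]:
    """Split a bit string into impulses.

    An impulse is a run of one or more equal bits followed by a single
    ending bit of the opposite polarity. A trailing run with no ending
    bit is returned as a final, unterminated impulse (an assumption).
    """
    impulses = []
    start = 0
    n = len(bits)
    while start < n:
        run_bit = bits[start]
        end = start
        # Advance over the run of identical bits.
        while end < n and bits[end] == run_bit:
            end += 1
        # Include the ending bit of opposite polarity, if one exists.
        if end < n:
            end += 1
        impulses.append(bits[start:end])
        start = end
    return impulses


if __name__ == "__main__":
    print(split_into_impulses("0001110"))
```

Concatenating the returned impulses reproduces the input, so the split itself loses no information; any compression would come from how each impulse is subsequently encoded.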
Even though the relational operations in the preferred embodiment may be implemented primarily by software, the RPU 22 can perform relational operations more efficiently in terms of storage, speed, etc., than presently known techniques for performing relational operations on one or more relations. At the hardware level, BLU 24 can take full advantage of the unique properties of the latest components, such as 32-bit microprocessors, CPUs, etc.
The MVP 15 generates map vectors, called entity use vectors, and the map vectors are used by the RPU 22 to map each row of a column of a relation to a unique value of the relational database. The purpose of the entity use vectors is to facilitate the reconstruction and display of the information represented by the binary representations of the relations. The MVP 15 is optional because the RDMS 10 can reconstruct and display the relations without having entity use vectors. However, the entity use vectors are used to perform the DISPLAY/RECONSTRUCT process more efficiently.
Although there are many ways by which the system could be interconnected with users or programs, the preferred embodiment contains a command interpreter 28 which interprets instructions for processing data in the relational database. The instructions used could be those found in Structured Query Language (SQL) which has become an industry standard language for enabling users to communicate with relational databases. The operation of the components in the RDMS 10 is controlled by software programs implemented by RPU 22. Data lines 30, 31, 41, 44, 46, 48, 50, and 52 of FIG. 1A depict the data flow between the various components of the RDMS 10.
The command interpreter 28 need not be a part of the RDMS 10; instead, it could be loaded into the host computer 2. In the preferred embodiment, however, the command interpreter 28 is located within the RDMS 10.
In certain circumstances, relations stored in the external device 12 need to be updated by inserting or deleting values in the different domains, etc. The information can be updated by transferring data, via bus 30, to RPU 22, which then inserts or deletes necessary values, in the form of bytes, in the domain, etc., and restores the domain back to external device 12 via bus 30. This capability enables the RDMS 10 to add new and unique values to the relational database.
II. A Detailed Discussion on Relational Databases
The following discussion is an explanation of the relational database depicted in FIG. 2. FIG. 2 is an example of a relational database, namely, the "SUPPLIERS and PARTS" relational database, and it has been adopted from Date, An Introduction to Database Systems, 4th Ed., Chapters 4 & 11 (1986).
FIG. 2 depicts three relations, a table for suppliers 63, a table for parts 65, and a table for shipments 67. "Relation" is just a mathematical term for the word "table", and these two words are used interchangeably in the specification.
Table 63 represents the information on the suppliers for a particular company. Each supplier has a Supplier ID column 80, a Supplier Name column 82, a Status column 84, and a City column 86, indicating where the supplier is located. As shown in table 63, each row of the table depicts information on a different supplier, and each column 80, 82, 84, 86 represents a different characteristic of each supplier. For example, row 69 of table 63 depicts supplier "S1" having the name "Smith" with a Status "20" and located in "London."
Table 65 represents the parts the suppliers may sell. Each part has a Part ID 88, a Part Name column 90, a Color column 92, a Weight column 94, and a City column 96. As shown by each row of the table, it is also assumed that the part only comes in one color and it is stored in a warehouse in exactly one city. For example, row 77 of table 65 corresponds to a single part having part ID "P1", a part name "NUT", a color "RED", a weight "12", and a location or city, "LONDON."
Table 67 represents the shipments of parts (table 65) made by suppliers (table 63). Table 67 is really a connection between tables 63 and 65 (to be discussed in Part V). The first row 87 of table 67 connects a specific supplier (S1) with a specific part (P1); stated differently, it represents a shipment of parts of kind P1, by supplier S1, and the shipment quantity is 300.
Thus, each shipment is uniquely described by a supplier ID, corresponding to column 98, a part ID, corresponding to column 100, and a quantity, corresponding to column 102. It is assumed that at most one shipment at any given time for a given supplier can be made for a given part. For example, the combination of S1 and P1 having a quantity shipment of 300 at row 87 is unique with respect to the set of the shipments appearing in table 67.
The supplier, part and quantity, together in each row of table 67, create a unique "entity". The Shipments table is a "relationship" between a particular supplier and a particular part. The "SUPPLIERS and PARTS" relational database (FIG. 2) describes, in reality, a very elementary database. Databases are likely to be much more involved, containing many more entities and relationships. This database, however, is sufficient to illustrate what a relational database is and to illustrate the novel features of the present invention.
A couple of properties regarding each relation are worth noting. First, each of the data values depicted in the tables 63, 65, 67 is "atomic". That is, at every row and column position in every table, 63, 65, 67, there is always exactly one data value, never a set of values. For example, in table 63, at row 69 and column 84, the status for the supplier Smith is a single value "20" and not a set of statuses. Also, note there are no links connecting one table to another table. In the example of FIG. 2, there is a relationship between the supplier row 69 of table 63 and the part row 77 of table 65, because supplier S1 supplies P1 as shown by the existence of row 87 of table 67, in which S1 has sold 300 P1's. However, there are no links extending from table 63 to table 65 to show this unique relationship. By contrast, in non-relational systems, this information is typically represented by some kind of "link" that is visible to the applications programmer.
To this point, we have discussed a high level theoretical construct of how relational databases can be defined. Although relations at the external level in systems today create this construct for users to work with, the internal levels of the relational systems use a variety of structures, which are not in the form of relations or tables. The idea of a relational database construct only applies to the external levels of the relational system, and not to the internal level of a present-day relational database. Stated differently, the relational models as disclosed by the prior art (e.g., Date) represent database systems "at a level of abstraction that is somewhat removed from the details in the underlying machine." In the present invention, the relational database at the external level remains the same, but the internal level of the relational database is actually depicted in the form of columns, as discussed in PART IV.
Referring to FIG. 2 at 60, a set of domains is depicted. A domain is a set of unique values, and each domain has only one characteristic. For example, the domain for supplier identifiers 66 is the set of all possible unique supplier identifiers which are referenced in the system. Likewise, the domain for person names 70 is the set of all unique names, and the set of numbers 78 is the set of all integers greater than 0 and less than 10,000 (for example). Domains are pools of unique values, from which actual values appearing in the columns of the tables 63, 65 and 67 are drawn.
Typically, there will be values in a given domain that do not concurrently appear in one of the columns of one of the relations. For example, the value S8 or S10 may appear in the domain of supplier identifiers 66, but note that no supplier S8 or S10 actually appears in relation 63. They may appear in some other column of some other relation. Furthermore, each column of a relation corresponds to one of the domains. For example, supplier identifier domain 66 corresponds to column 80 of table 63.
When a particular relation is ready to be taken from the external device 12, RPU 22 acts as a file manager to retrieve each of the records associated with the particular columns. The columns will then be sent via bus 30 to the RDMS 10 for further processing. In reality, the RDMS 10's view of the database on the external device 12 is a collection of stored columns (in byte form), and that view is supported by a file manager and disk manager (not shown) which may or may not be a part of the RDMS system.
Assuming that the RDMS 10 system via RPU 22 summons a relation stored in the external device 12 over the bus 30, the relation will be converted into a binary representation via the BBVP. This unique tabular representation of the relation is one aspect of the invention and it will be thoroughly discussed in Part IV. However, before proceeding to the binary representation model of a relation, some background regarding bit-vector technology must be discussed.
Part III is devoted entirely to creating a fundamental framework in which a binary representation of a relation can be developed.
III. Binary Bit-Vector Technology
One aspect of the present invention is the representation of a relational database by bit strings which do not contain the raw data values of the relational database. Instead, each value of the database is represented by a single binary bit within an ordered set of binary bits. To more fully understand how these concepts are achieved, a detailed discussion is presented regarding binary bit-vector characterizations of sets. (For an excellent discussion on this matter see "Elements of Set Theory," Academic Press 1977, by Enderton.) The basic building block for representing a relation of a relational database, in the present invention, relies on the concept of ordered sets.
Ordered sets consist of ordered pairs, <M, X>, where each ordered pair represents an element of a set, where M is a symbol that defines the location or position of the value X within the set. In the preferred embodiment of the invention, M is a non-negative integer. Stated differently, every element of the ordered set constitutes a function between the integer describing the ordinal position M and the corresponding value X in the set. Typically, the value M increases by one for each forward progression of one element in the set. An ordered set with unique values can be defined as a set having elements X and Y which have ordinal positions M and N, respectively, and for all <M, X>, <N, Y> in the set, M is equal to N if, and only if, X is equal to Y.
Two elements of an ordered set can only be equal when they are at the same ordinal position in the set, as all ordinal positions in the ordered set are unique. The ordering of values within ordered sets (ordering rule), namely, the way in which each value matches an ordinal position, is arbitrary. It depends on the system's or user's choices and requirements.
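The uniqueness condition stated above (M equals N if, and only if, X equals Y) can be checked mechanically. The following is a minimal sketch, assuming an ordered set is modeled as a collection of (M, X) tuples; the function name is hypothetical and is not part of the specification.

```python
def is_ordered_set_with_unique_values(pairs):
    """Return True if, for all <M, X> and <N, Y>, M == N exactly when X == Y."""
    distinct = set(pairs)                    # identical pairs count once
    positions = {m for m, _ in distinct}
    values = {x for _, x in distinct}
    # One-to-one: no ordinal position is reused and no value is repeated.
    return len(positions) == len(distinct) == len(values)


if __name__ == "__main__":
    print(is_ordered_set_with_unique_values([(1, "a"), (2, "b"), (3, "c")]))
```

A repeated value at two positions, or two values at one position, makes the check fail.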
A "vector" consists of an ordered set of elements. However, the ordering of values (a value X of the ordered pair <M, X>, where M gives the ordinal position) is implied by the position in the vector. Thus, a vector consisting of a set "S" of elements is a one-dimensional array, where each value corresponds to a vector element whose ordinal position over the vector is implied. More particularly, a vector may define an ordering to a set of elements in an ordered or non-ordered set. Given the ordered set "S" 148 of FIG. 3A, a "binary-bit vector" a can uniquely define a subset "A" (FIG. 3D) of "S" 148 (FIG. 3A), by representing the elements of "S" in "A". Only two data values are required for representing the elements of "S" in subset "A". More precisely, each data value must represent whether an element of set "S" is in subset "A" or is not in subset "A" and, thus, a binary bit string, where each bit corresponds to an ordinal position of set "S", can represent subset "A". In the preferred embodiment, binary bit-vectors contain 1's and 0's, a "1" indicating a set element is present, and a "0" indicating that it is not present. Each position of each binary bit within a binary bit-vector corresponds to an ordinal position of an element in subset "A" (FIG. 3A). Thus, a binary bit-value, combined with its implied ordinal position within the vector, is an "element" of the vector. The presence of a "1" bit in the vector is equivalent to representing a value of the ordered set by only its ordinal position. In this way, representations of data can be operated on without operating on the data values themselves.
As mentioned above, binary bit-vectors characterize a set by identifying which elements of the set exist in a subset, where each binary "1" bit corresponds to an ordinal position of the set. For example, given the ordered set of states, (<1, California>), (<2, Colorado>), (<3, Hawaii>), (<4, Maine>), (<5, New Mexico>), (<6, New York>), (<7, Oregon>), (<8, Texas>), we can characterize all elements in this set by the binary bit-vector a' = (11111111) (FIG. 3E). Likewise, to represent only a subset of set "S", for example, the subset "A" (FIG. 3B), containing (<2, Colorado>), (<3, Hawaii>), (<5, New Mexico>), (<8, Texas>), only the binary bit vector a = (01101001) (FIG. 3D) is necessary.
Lastly, to represent an empty subset (FIG. 3C) of set "S", only the bit-vector a" = (00000000) (FIG. 3F) is needed.
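The states example above can be reproduced with a few lines of code. This is a sketch only: the names `characterize` and `members` are hypothetical, and bit-vectors are modeled as Python strings whose left-most character corresponds to ordinal position 1.

```python
# Ordered set S of FIG. 3A; list index + 1 is the ordinal position M.
S = ["California", "Colorado", "Hawaii", "Maine",
     "New Mexico", "New York", "Oregon", "Texas"]


def characterize(ordered_set, subset):
    """Return the binary bit-vector marking which elements are in subset."""
    return "".join("1" if value in subset else "0" for value in ordered_set)


def members(ordered_set, bit_vector):
    """Recover the subset elements selected by the "1" bits of bit_vector."""
    return [value for value, bit in zip(ordered_set, bit_vector) if bit == "1"]


if __name__ == "__main__":
    A = {"Colorado", "Hawaii", "New Mexico", "Texas"}
    print(characterize(S, A))         # subset A
    print(characterize(S, set(S)))    # the whole set
    print(characterize(S, set()))     # the empty subset
    print(members(S, "01101001"))
```

Note that the bit-vector alone carries no data values; it is meaningful only together with the ordered set it indexes.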
Assuming that a relational database comprises ordered sets, the binary bit-vector analysis discussed above can be used to represent relations of the relational database. The procedure is discussed in more detail under "Binary Representation of a Relational Database" in PART IV of the specification.
IV. Binary Representation of a Relational Database
FIG. 2 depicts a relational database for suppliers, parts and shipments. As stated in PART II, the relational database consists of tables 63, 65, and 67, which depict the Suppliers, Parts and Shipments relations, respectively. In addition, domain 60 contains the unique values which are referenced by each of the tables 63, 65 and 67. As shown, the domain 60 contains unique values for supplier identifiers 66, part identifiers 68, person names 70, part names 72, cities 74, colors 76, and numbers 78.
One aspect of the present invention is to create a binary representation of a relational database, such as the one shown in FIG. 2. Specifically, the binary representation for the relational database shown in FIG. 2 is shown in FIGS. 4, 5 and 6. More particularly, FIG. 4 shows the binary representation of the Suppliers table 63 (FIG. 2), FIG. 5 is a binary representation of the Parts table 65 and FIG. 6 is a binary representation of the Shipments table 67 (FIG. 2).
FIG. 4 can be broken up into two portions as shown.
The top portion of the figure represents the value sets or domains associated with the Suppliers relation 63 of the relational database in FIG. 2. More particularly, domain 160 (FIG. 4) is associated with supplier identifiers domain 66 of FIG. 2. Domain 162 corresponds to the Person Names domain 70 of FIG. 2. Domain 164 corresponds to the Numbers domain 78 (FIG. 2), and domain 166 corresponds to the Cities domain 74 of FIG. 2. The values of domains 160, 162, 164 and 166 have been arbitrarily chosen and formed in particular orders of occurrence, namely, lexical ordering for domains 162 and 166 and numerical ordering for domains 160 and 164.
A subset of each of the domains 160, 162, 164 and 166 is represented by corresponding binary bit-vectors 176, 178, 180, and 182. Each binary bit-vector, which is used for the purpose of characterizing a domain, is called herein an entity select vector. In other words, the binary bit-vector is used to represent a selection (subset) of entities from a set of entities, where an entity is defined to be a unique value or grouping of values. For example, an entity may be a value in a value set or a row in a relation.
The entity select vector has a quantity of binary bits equivalent to the quantity of unique values in a given domain. For example, domain 160 for supplier identifiers contains ten supplier identifiers for ten corresponding binary bits. Each binary bit of the entity select vector has a position which directly corresponds to the position of each of the unique values in the supplier identifier domain 160. Additionally, each of the binary bits represents the presence or absence of each of the unique values in the subsets represented by the entity select vector 176. More particularly, the binary bit 236 of entity select vector 176 corresponds to the unique value S1 at 216 of domain 160. The binary bit 236 has a value of "1" which indicates that the unique value S1 is present in the subset represented by entity select vector 176.
Likewise, binary bits 238, 240, 242 and 244 also indicate that the unique values S2, S3, S4, and S5 are present in the subset. Binary bits 246, 248, 250, 252 and 254 are set to "0", indicating that unique values S6 226, S7 228, S8 230, S9 232, and S10 234, respectively, are not in the subset. The binary bit vector representation of the unique values of the domain 160 is a short-hand way of representing a subset of the domain 160 by indicating those unique values which are present in the subset. This aspect of the present invention is an important one. By only representing a subset of a domain, all values of a domain need not be associated with a column. More particularly, only the entity select vector which corresponds to a particular column of a relation, and not the domain, needs to be associated with the column.
For each domain containing unique values in the Suppliers relation, a different entity select vector 176, 178, 180, and 182 is used to represent subsets of the domains. Each subset representation is directly associated with a column of the relation. More particularly, entity select vector 176 is associated with column 168, entity select vector 178 is associated with column 170, entity select vector 180 is associated with column 172, and entity select vector 182 is associated with column 174. Columns 168, 170, 172 and 174 correspond to the columns of the Suppliers relation 63 of FIG. 2.
Referring to the set of binary bit-vectors at 260, each binary bit-vector 184, 186, 188, 190 and 192 corresponds to one of the values indicated to be present in the subset by the entity select vector 176. In the preferred embodiment, the binary bit-vector is called a row use vector because it indicates the presence or absence of a unique value in one or more rows of a column in the relation. The row use vectors are combined into a row use set 260 to form the binary representation of the column 168. The order of row use vectors occurs in the same order as the order of values characterized by the entity select vector. More particularly, the lower order binary bits of the entity select vector have been arbitrarily assigned to correspond to the left most row use vectors of the row use set, and the higher order binary bits of the entity select vector correspond to the right most row use vectors of the row use set. The implied mapping between the bits set to "1" of the entity select vector and the row use vectors could have just as easily been reversed. For example, the lower order binary bits may correspond to the right-most row use vectors of the row use set.
Binary bit 236 of the entity select vector 176 corresponds to row use vector 184 of the row use set 260. Likewise, binary bit 238 of the entity select vector 176 corresponds to the next row use vector 186 of the row use set 260. The system interpreting the entity select vectors and the row use set is pre-programmed so that the binary bits indicate unique values present in the subset. The implied ordering scheme is illustrated by the dotted lines 268, 270, 272, 274 and 275, which show that the binary bits in entity select vector 176 correspond to the row use vectors 184, 186, 188, 190 and 192, respectively. Likewise, the binary bits in the entity select vector 180, which represent the unique values "10", "20", and "30", are mapped in an implied manner to the row use vectors 204, 206 and 208, respectively.
It is important to note that each row use vector can indicate the presence or absence of a unique value in one or more rows of a column of the relation. For example, in row use vector 206, corresponding to the unique value "20", "1" bits indicate the presence of the value "20" in the first and fourth rows of column 172.
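The relationship between a domain, an entity select vector, and a row use set can be illustrated for the Status column as follows. This is a sketch under stated assumptions, not the BINARY REPRESENTATION routine of FIG. 7: the function name is hypothetical, the contents of the numbers domain are invented for illustration, and the Status values in rows 2, 3 and 5 are assumed (the text above confirms only that "20" occupies the first and fourth rows and that "10", "20" and "30" are the values present).

```python
# Hypothetical contents of the numbers domain, in numerical order.
NUMBERS_DOMAIN = [10, 12, 14, 17, 19, 20, 30]

# Status column of the Suppliers relation; rows 2, 3 and 5 are assumed.
STATUS_COLUMN = [20, 10, 30, 20, 30]


def binary_representation(domain, column):
    """Return (entity_select_vector, row_use_set) for one column.

    entity_select_vector has one bit per domain value. row_use_set has
    one row use vector per selected value, in domain order, and each
    row use vector has one bit per row of the column.
    """
    present = set(column)
    entity_select = [1 if value in present else 0 for value in domain]
    row_use_set = [
        [1 if cell == value else 0 for cell in column]
        for value in domain
        if value in present
    ]
    return entity_select, row_use_set


if __name__ == "__main__":
    entity_select, row_use_set = binary_representation(NUMBERS_DOMAIN, STATUS_COLUMN)
    print(entity_select)
    for row_use_vector in row_use_set:
        print(row_use_vector)
```

In this sketch the row use vector for "20" has "1" bits in its first and fourth positions, matching the description of row use vector 206.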
Because the binary representation of each of the columns is represented by an entity select vector to unique values, the columns 168, 170, 172 and 174 of the relation need not be represented as a set of actual values in the database memory. Only the row use sets and the corresponding entity select vectors need be stored in the memory of RDMS 10 (e.g., memory 18, FIG. 2). The actual values are stored in a more permanent memory such as a hard disk (e.g., external device 12, FIG. 2). Thus, for each relation in the database, a series of entity select vectors representing subsets of the various domains pertaining to the particular relation, are stored along with their corresponding row use sets which depict the columns of the relation. FIG. 5 and FIG. 6 represent the entity select vectors and their corresponding row use sets for the Parts relation 65 (FIG. 2) and the Shipments relation 67 (FIG. 2).
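To see why the actual column values need not be kept in memory, the following sketch rebuilds a column from its entity select vector, its row use set, and the domain. It is an illustration only and not the DISPLAY/RECONSTRUCT routine of the specification; the function name, the domain contents, and the sample vectors are assumptions.

```python
def reconstruct_column(domain, entity_select, row_use_set):
    """Rebuild the column values from the binary representation.

    The k-th "1" bit of entity_select names the domain value that the
    k-th row use vector stands for; each "1" bit in that row use vector
    marks a row holding the value.
    """
    selected_values = [v for v, bit in zip(domain, entity_select) if bit]
    n_rows = len(row_use_set[0]) if row_use_set else 0
    column = [None] * n_rows
    for value, row_use_vector in zip(selected_values, row_use_set):
        for row, bit in enumerate(row_use_vector):
            if bit:
                column[row] = value
    return column


if __name__ == "__main__":
    domain = [10, 12, 14, 17, 19, 20, 30]          # hypothetical numbers domain
    entity_select = [1, 0, 0, 0, 0, 1, 1]          # selects 10, 20 and 30
    row_use_set = [[0, 1, 0, 0, 0],                # rows holding 10
                   [1, 0, 0, 1, 0],                # rows holding 20
                   [0, 0, 1, 0, 1]]                # rows holding 30
    print(reconstruct_column(domain, entity_select, row_use_set))
```

The domain is consulted only at the final step, when bit positions are translated back into values for display.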
Referring to FIG. 5, domains 282, 286, 290, 294 and 298 correspond to the domains for part identifiers 68 (FIG. 2), part names 72 (FIG. 2), colors 76 (FIG. 2), numbers 78 (FIG. 2), and cities 74 (FIG. 2), respectively. Note that the numbers domain 294 and the city domain 298 of the Parts relation (FIG. 5) are identical to the numbers domain 164 and the city domain 166 of the Suppliers relation (FIG. 4). An efficiency of the present invention is that the unique values of the domains need only be represented once. In other words, the unique values in the numbers domain 164 (FIG. 4) and the numbers domain 294 (FIG. 5) are the same domain of unique values of numbers. Likewise, the city domain 166 (FIG. 4) and the city domain 298 (FIG. 5) are the same domain of unique values of cities. The representation of the subsets for these domains is different. For example, the entity select vector 180 (FIG. 4) represents that the unique values "10", "20" and "30" of the numbers domain are in the relation of suppliers (FIG. 4), whereas the unique values "12", "14", "17" and "19" are in the relation for parts (entity select vector 296, FIG. 5). Thus, instead of having to store two complete sets of unique values 164 and 294, which are identical, the system only stores one version of the numbers domain and two entity select vectors, namely, 180 and 296, to represent subsets of the same domain. Likewise, the system need not retain two unique sets of values for cities, 166 and 298; instead, the system maintains only one version of the city domain and two entity select vectors, 182 and 300, to identify the subsets of cities in the associated relations. In fact, any number of entity select vectors may be associated with a particular domain. For example, in FIG. 6, the domain for numbers 328 is also referenced; however, the entity select vector 330 depicts a different subset of unique values from the entity select vectors 180 (FIG. 4) and 296 (FIG. 5).
Entity select vectors are efficient subset representations of domains because the same domain need not be represented more than once and the actual values of each domain can be uniquely represented by the entity select vector's binary bits.
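As a minimal sketch of this sharing, the following Python stores one numbers domain and derives two entity select vectors from it, one per relation. The domain ordering, the column contents and the function names are assumptions chosen to agree with the subsets named in the text ("10", "20", "30" for suppliers and "12", "14", "17", "19" for parts).

```python
# One stored copy of the numbers domain; its ordering is an assumption.
numbers_domain = [10, 12, 14, 17, 19, 20, 30]

def entity_select_vector(domain, column_values):
    """Return one bit per domain position: 1 if the value occurs in the column."""
    present = set(column_values)
    return [1 if value in present else 0 for value in domain]

def selected_values(domain, bits):
    """Recover the subset of the domain that an entity select vector denotes."""
    return [value for value, bit in zip(domain, bits) if bit]

# Hypothetical column contents for the two relations.
suppliers_status = [20, 10, 30, 20, 30]
parts_weight = [12, 17, 17, 14, 12, 19]

esv_suppliers = entity_select_vector(numbers_domain, suppliers_status)
esv_parts = entity_select_vector(numbers_domain, parts_weight)
print(esv_suppliers, selected_values(numbers_domain, esv_suppliers))
print(esv_parts, selected_values(numbers_domain, esv_parts))
```

Adding a third relation that references the same domain costs only one more bit vector of the domain's length; the domain values themselves are not duplicated.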
The columns of the Parts relation 302, 304, 308, 312, 316 (FIG. 5) are represented in the relational database by the row use sets 304, 306, 310, 314, 318, respectively. As in the description for the Suppliers relation (FIG. 4), the row use sets contain row use vectors which are each associated with a unique value indicated to be present in a subset of a domain by an entity select vector. Likewise, FIG. 6 depicts a binary representation for shipments.
The domains associated with each of the relations with their unique sets of values are stored on the external device 12 (FIG. 1A), and the binary representation of each of the relations (FIGS. 4, 5 and 6) which make up the relational database of FIG. 2 are stored in memory 18 of the RDMS 10 (FIG. 1A). When the system performs a relational operation on the relational database, only the binary representation of the relations need be referred to. A detailed discussion of this process will follow shortly.
FIG. 7 is a flow block diagram of the steps which are performed by the RDMS 10 for creating a binary representation of a relational database.
Referring to FIG. 7, a detailed discussion of the BINARY REPRESENTATION routine is now provided.
Specifically, during block 344, each domain necessary for specifying unique values in the relations of the relational database is identified. Input commands are interpreted by the command interpreter 28 (FIG. 1A) for creating domain identifiers in the memory 18 (FIG. 1A) of the RDMS 10 (FIG. 1A). Additionally, the system reads each of the instructions above, and an empty value set is created for each of the domains listed in the instruction. The actual unique values associated with each domain are loaded into memory in a later step of this routine. During block 346, the system determines whether any more domains for the database need to be specified.
Assuming that all the domains to be referenced by the database are specified, the system, during block 348, identifies a table of the relational database, and the RDMS 10 builds an entry in the system relation (see Part VI). During block 350, the system identifies a column, as with the table identified in block 348. In addition, the RDMS 10 builds entries in the system relation identifying each column, and the RPU 22 via the BBVP 14 creates empty entity select vectors associated with each of the domains. The entity select vectors will have their binary bits set when the columns of the particular relations have their values loaded into the system. During block 352, it is decided whether to identify another column for the particular table identified in block 348. Processing continues at blocks 350 and 352 until all of the relational columns for the particular table identified during block 348 have been identified. During block 354, the system asks for the next table or relation of the database. Assuming that a number of different tables exist in the relational database, blocks 348, 350, 352 and 354 are processed until all of the tables and their related columns have been specified. During blocks 348, 350, 352 and 354, system identifiers, identifying each column of each relation, are constructed. These identifiers are discussed more thoroughly in Part VII.
During block 356, the system loads a file associated with each table of the relational database. The file of the table is assumed to exist in the external device 12 of the system depicted in FIG. 1.
When the relational database has been completed, the system loads the file representations of the relations into external device 12, where they reside until summoned by the RDMS system 10. During block 358, RPU 22 instructs the external device 12 to transfer the first column associated with a particular table of the relational database via bus 30 to the RPU 22. The particular column is retrieved by referring to certain system identifiers (i.e., AID, RID identifiers, Part VII). When the first value of a particular column is brought into the RPU 22, a row use vector associated with the value is built by the BBVP 14. Additionally, a bit is set to "1" in the first position of the row use vector to indicate that the value occupies the first row of the first column of the relation. For each unique value of the column, a new row use vector is created and a binary bit associated with a particular row of the column, where the unique value resides, is set to "1".
Additionally, as each unique value of a particular column is brought into the RPU 22, the binary bits of the entity select vector, associated with the column, are set to "1" to indicate the unique values of the domain referenced in the column.
If the same value appears in a second, third or fourth row, etc. of the column, the corresponding bits of the row use vector associated with the unique value are set to "1", indicating the presence of the unique value in these rows of the column. Because the entity select vector has previously had the binary bit associated with the unique value set to "1", it need not be set again. This process of converting the column into a row use set continues until all of the values of the column have been represented by binary bits. When the row use set for a column is completely constructed, the RPU 22 commands the external device 12 to bring the next column, via bus 30, into the RDMS 10. This process continues for all of the columns associated with a particular table until the row use sets for the table are completely constructed. During block 360, the system determines if any other tables need to be binary represented by the RDMS 10, assuming a single input file corresponds to a single table. If other tables are to be constructed, then blocks 356, 358 and 360 are performed until all the columns of the next relation are represented in their binary form. After all of the binary bit vectors (i.e., row use vectors and entity select vectors) are constructed, the RPU 22 may summon the bit vectors to bit vector encoder BVE 16 of the RDMS 10 to encode the bit vectors into compressed impulse format. The steps for compressing the bit vectors into compressed impulse formats are thoroughly discussed in Glaser et al. RPU 22 then instructs the bit vectors to be sent via bus 48 to the memory 18, where they are stored.
Eventually, all of the relations of the relational database stored in external device 12 will be converted into the binary represented form, and this form is stored in the memory 18 of the RDMS 10 in the compressed impulse format or in the uncompressed form. During block 362, the system returns to the program which called the BINARY REPRESENTATION routine (FIG. 7).
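The column-loading step of block 358 can be approximated by the following Python sketch. It is not the patented routine itself: the function name `build_column`, the use of Python lists as bit vectors, and the sample city data are assumptions made for illustration, and compression into impulse format is omitted.

```python
def build_column(domain, column_values):
    """Build (entity select vector, row use set) for one column in a single pass."""
    position = {value: index for index, value in enumerate(domain)}
    entity_select = [0] * len(domain)
    row_use = {}                       # domain position -> row use vector
    n_rows = len(column_values)
    for row, value in enumerate(column_values):
        pos = position[value]          # raises KeyError if the value is not in the domain
        if entity_select[pos] == 0:
            # First occurrence of this unique value: set its entity select bit
            # and create a new, empty row use vector for it.
            entity_select[pos] = 1
            row_use[pos] = [0] * n_rows
        row_use[pos][row] = 1          # mark the row where the value resides
    # Order the row use vectors by domain position so that the k-th "1" bit of the
    # entity select vector lines up with the k-th row use vector.
    row_use_set = [row_use[pos] for pos in sorted(row_use)]
    return entity_select, row_use_set

# Hypothetical city domain (lexically ordered) and City column.
cities_domain = ["Athens", "London", "Paris", "Rome"]
city_column = ["London", "Paris", "Paris", "London", "Athens"]
entity_select, row_use_set = build_column(cities_domain, city_column)
print(entity_select)
print(row_use_set)
```

Each row sets exactly one bit across the whole row use set, so for a well-formed column the row use vectors are pairwise disjoint and together cover every row.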
Once the binary representation of the relations is completed, the relations can be updated by two utility functions called INSERT and DELETE, which will be discussed in PART V of the specification. The INSERT function, perhaps with appropriate modification to enhance efficiency, can also be used to load additional rows into the relation. Additionally, the relations, in their binary represented form, can be manipulated via the relational operations SELECT, JOIN and PROJECT, also to be discussed in PART V.
A. Example of Generating a Binary Representation of a Relation
Referring to FIGS. 1, 1A, 2, 4, 5, 6, 7, 8A, 8B, 9A, 9B and 9C, a detailed example for generating a binary representation of a relation is now discussed.
Referring to FIG. 2, a hypothetical relational database is shown. After the BINARY REPRESENTATION routine (FIG.
FIG. 17B is a flow diagram of the DISPLAY/RECONSTRUCT routine;
FIGS. 18A, 18B, 18C and 18D depict a table of the results of the operations performed by the PROJECT routine (FIG. 17A);
FIG. 19 depicts a binary representation of a JOIN relation;
FIG. 20 depicts a more detailed version of the binary representation of the Suppliers portion of the JOIN relation (FIG. 19);
FIG. 21 depicts a more detailed version of the Parts relation portion of the JOIN relation (FIG. 19);
FIG. 22A is a flow diagram of the EQUIJOIN routine;
FIG. 22B is a flow diagram of the BUILD ROW USE SETS routine;
FIG. 22C is a flow diagram of the EVALUATE ROW USE SETS routine;
FIG. 22D is a flow diagram of the CoNb~ l JOIN
ROW USE VECTORS routine;
FIG. 22E is a flow diagram of the PRODUCTS routine;
FIG. 22F is a flow diagram of the NUMS routine;
FIG. 22G is a flow diagram of the GENERATE BIT STRING routine;
FIGS. 23A, 23B, and 23C represent a table of the results of the operations performed by the EQUIJOIN routine (FIG. 22A);
FIG. 24 is a flow diagram of the GREATER THAN JOIN routine;
FIG. 25A is a flow diagram of the DISPLAY/RECONSTRUCT FOR JOIN routine;
FIG. 25B is a flow diagram of the REFERENCE RELATION routine;
FIG. 25C is a flow diagram of the REFERENCE VALUE SET routine;
FIGS. 26A, 26B, 26C, 26D, 26E, 26F and 26G depict a table of results for the operations performed by the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A);
FIG. 27 represents a mapping of the elements of Set S into Set T; Set S is the "domain" and Set T is the "range";
FIG. 28 represents the binary representation of Suppliers (FIG. 4), with entity use vectors added;
FIG. 29 depicts a binary representation of the Suppliers portion of the JOIN relation (FIG. 20) with entity use vectors added;
FIG. 30A is a flow block diagram of the DISPLAY/RECONSTRUCT WITH ENTITY USE VECTORS routine;
FIG. 30B is a flow block diagram of the REFERENCE RELATION routine;
FIG. 30C is a flow block diagram of the REFERENCE VALUE SET routine;
FIG. 31 depicts a relational database;
FIG. 32 depicts a SYSTEM RELATION;
FIG. 33 depicts the domains of the relational database shown in FIG. 31;
FIG. 34 depicts the ENTITY SELECT SET associated with the SYSTEM RELATION (FIG. 32);
FIG. 35 depicts the ENTITY USE SET, which is associated with the SYSTEM RELATION (FIG. 32);
FIG. 36 depicts the ROW SELECT SET associated with the SYSTEM RELATION (FIG. 32);
FIGS. 37A, 37B, 37C, and 37D depict the ROW USE SETS associated with the SYSTEM RELATION (FIG. 31);
FIGS. 38A, 38B, 38C, and 38D are flow block diagrams of the DATABASE IDENTIFICATION routine.
TABLE OF CONTENTS
DETAILED DESCRIPTION
I. Hardware Level of the Preferred Embodiments
II. A Detailed Discussion on Relational Databases
III. Binary Bit-Vector Technology
IV. Binary Representation of a Relational Database
  A. Example of Generating a Binary Representation of a Relation
  B. Example of Building a Binary Represented Relation
V. Operations Performed on Binary Representations of Relations
  A. INSERT
    1. Detailed Example for the INSERT Function
  B. DELETE
    1. Detailed Example for the DELETE Operation
  C. SELECT
    1. Detailed Example of a Two-Column SELECT for Two Values
    2. Detailed Example of Two Column SELECT for Multiple Values
  D. RECONSTRUCT
    1. Detailed Example of Performing PROJECT Operation
  E. JOIN
    1. Binary Representation of a JOIN Relation
    2. Constructing a Binary Representation of a JOIN Relation
    3. Detailed Example For Constructing a Binary Representation of the JOIN Relation
    4. Constructing a BINARY REPRESENTATION of a GREATER THAN JOIN
  F. DISPLAY/RECONSTRUCT For JOIN Operation
    1. Example of the DISPLAY/RECONSTRUCT Operation For A JOIN Relation
VI. ENTITY USE Vectors
VII. Database Identification
  A. Performing the Database Identification Scheme
Detailed Description
I. Hardware Level of the Preferred Embodiments
FIGURE 1 depicts a computer system having a programmable computer and computer programs for creating a relational database and for processing operations on one or more relations (also called tables) of a relational database. The system includes programmable computer 2, display 3, entry device 11 for the computer and external device 12 such as a disk for storage of data. Hardware/software for representing the relations and hardware/software for performing relational operations are housed in a Relational Database Management System (RDMS) 10 (shown in phantom lines), which is connected within the computer 2. The RDMS 10 coordinates the various activities related to representing relations in the relational database and to performing relational operations on one or more relations. Conventionally, RDMS 10 is a programmable computer on a printed circuit board which can be easily employed within most standard computers, including personal, mini-, and mainframe computers. It is envisioned that RDMS 10 may be a special purpose computer formed by one or more integrated chips.
More particularly, referring to FIG. 1A, RDMS 10 includes an optional Binary Bit Vector Processor (BBVP) 14, an optional Bit Vector Encoder (BVE) 16, an optional Map Vector Processor (MVP) 15, an optional memory 18, a Relational Processing Unit (RPU) 22, including a Boolean Logic Unit (BLU) 24, and a Command Interpreter 28. When software programs for generating binary represented relations, for processing relational operations, and for coordinating data transfer between components are loaded into the RDMS 10, the RDMS 10 is formed and ready for processing operation.
A detailed discussion of the specific components of the RDMS 10 is now presented. External device 12 is a permanent or buffered storage, typically in the form of a hard disk, for storing information used in relations which are represented in expanded form where each value is typically no smaller than a byte. The contents of the external device are typically maintained in records which are divided into fields where the nth field of each record corresponds to a specific type of data. The contents of the external device 12 are loaded via bus 30 to the RDMS 10 and specifically to RPU 22. The RPU instructs BBVP 14 to convert each relation stored on the external device into a binary representation (to be more thoroughly discussed in PART IV). Bus 32 then transfers the binary representation of each relation to optional BVE 16. The BVE 16 compresses the binary representation. More particularly, the BVE 16, employed within the RDMS 10, evaluates uncompressed bit-string representations of each relation and separates the bit-strings into one or more "impulses". An impulse is a run, which is a string of one or more bits of a same binary value or a polarity (e.g., "0's" or "1's"), and an ending bit which has a polarity opposite the polarity of the run. Software programs executed by the bit-vector encoder encode the bit vectors into one of several different compressed impulse formats. The compressed bit vectors are then sent via bus 32 back to the BBVP and then in turn to the optional memory 18. Memory 18 may be a memory component included in the host computer such as an external device or a memory component included within the RDMS 10 (as shown in FIG. 1A). Memory 18 holds the compressed binary representations of relations before processing the relations at the RPU 22 or stores the relations after processing at RPU 22. RPU 22 via the BBVP 14 performs relational type operations (e.g., SELECT, PROJECT, JOIN, INSERT, DELETE, etc.) on one or more relations by processing unique software programs at the RPU 22 via a microcontroller (e.g., Intel 80386). If the relations are in the form of encoded bit strings, and if Boolean operations are required to be performed by the relational operation, then the Boolean operations can be performed by the hardware or software embodiments of the BLU 24.
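The definition of an impulse given above (a run of identical bits followed by one ending bit of opposite polarity) can be illustrated with a short Python sketch. This shows only the separation into impulses; the compressed impulse formats themselves are described in Glaser et al. and are not reproduced here. The function name and the treatment of a trailing run that has no ending bit are assumptions.

```python
def split_impulses(bits):
    """Split a bit string into impulses: a run of equal bits plus one opposite ending bit.

    Returns (impulses, remainder), where remainder is a trailing run that has no
    ending bit (handling it this way is an assumption of this sketch).
    """
    impulses = []
    start = 0
    length = len(bits)
    while start < length:
        end = start
        # Advance over the run of bits that share the polarity of bits[start].
        while end < length and bits[end] == bits[start]:
            end += 1
        if end == length:
            return impulses, bits[start:]      # run reached the end: no ending bit
        impulses.append(bits[start:end + 1])   # run plus its opposite-polarity ending bit
        start = end + 1
    return impulses, ""

impulses, remainder = split_impulses("0001110100")
print(impulses, remainder)
```

Concatenating the impulses and the remainder restores the original bit string, so the separation itself loses no information; any space saving would come from how each impulse is subsequently encoded.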
Even though the relational operations in the preferred embodiment may be implemented primarily by software, the RPU 22 can perform relational operations more efficiently in terms of storage, speed, etc., than presently known techniques for performing relational operations on one or more relations. At the hardware level, BLU 24 can take full advantage of the unique properties of the latest components, such as 32-bit microprocessors, CPUs, etc.
The MVP 15 generates map vectors, called entity use vectors, and the map vectors are used by the RPU 22 to map each row of a column of a relation to a unique value of the relational database. The purpose of the entity use vectors is to facilitate the reconstruction and display of the information represented by the binary representations of the relations. The MVP 15 is optional because the RDMS 10 can reconstruct and display the relations without having entity use vectors. However, the entity use vectors are used to perform the DISPLAY/RECONSTRUCT process more efficiently.
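To make the trade-off concrete, the sketch below reconstructs a column two ways: by scanning the row use set, and by indexing through a per-row map. The layout assumed here for an entity use vector (element i holds the domain position of the value in row i) is only one possible reading of the description above; the specification's own treatment is in Part VI. All names and sample data are hypothetical.

```python
def reconstruct_by_scanning(domain, entity_select, row_use_set):
    """Rebuild a column from its row use set alone (no map vector required)."""
    selected = [value for value, bit in zip(domain, entity_select) if bit]
    n_rows = len(row_use_set[0]) if row_use_set else 0
    column = [None] * n_rows
    for value, row_use in zip(selected, row_use_set):
        for row, bit in enumerate(row_use):
            if bit:
                column[row] = value
    return column

def build_entity_use_vector(entity_select, row_use_set):
    """Assumed map vector: element i is the domain position of the value in row i."""
    positions = [pos for pos, bit in enumerate(entity_select) if bit]
    n_rows = len(row_use_set[0]) if row_use_set else 0
    entity_use = [None] * n_rows
    for pos, row_use in zip(positions, row_use_set):
        for row, bit in enumerate(row_use):
            if bit:
                entity_use[row] = pos
    return entity_use

def reconstruct_with_entity_use(domain, entity_use):
    """Rebuild a column with one domain lookup per row."""
    return [domain[pos] for pos in entity_use]

cities_domain = ["Athens", "London", "Paris", "Rome"]      # hypothetical domain
entity_select = [1, 1, 1, 0]
row_use_set = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]

entity_use = build_entity_use_vector(entity_select, row_use_set)
print(reconstruct_by_scanning(cities_domain, entity_select, row_use_set))
print(reconstruct_with_entity_use(cities_domain, entity_use))
```

Under this reading, scanning touches every bit of every row use vector, whereas the map needs a single lookup per row once it has been built, which is consistent with the map vectors being optional but faster for display.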
Although there are many ways by which the system could be interconnected with users or programs, the preferred embodiment contains a command interpreter 28 which interprets instructions for processing data in the relational database. The instructions used could be those found in Structured Query Language (SQL), which has become an industry standard language for enabling users to communicate with relational databases. The operation of the components in the RDMS 10 is controlled by software programs implemented by RPU 22. Data lines 30, 31, 41, 44, 46, 48, 50, and 52 of FIG. 1A depict the data flow between the various components of the RDMS 10.
The command interpreter 28 need not be a part of the RDMS 10; instead, it could be loaded into the host computer 2. In the preferred embodiment, however, the command interpreter 28 is located within the RDMS 10.
In certain circumstances, relations stored in the external device 12 need to be updated by inserting or deleting values in the different domains, etc. The information can be updated by transferring data, via bus 30, to RPU 22, which then inserts or deletes necessary values, in the form of bytes, in the domain, etc., and restores the domain back to external device 12 via bus 30. This capability enables the RDMS 10 to add new and unique values to the relational database.
II. A Detailed Discussion on Relational Databases
The following discussion is an explanation of the relational database depicted in FIG. 2. FIG. 2 is an example of a relational database, namely, the "SUPPLIERS and PARTS" relational database, and it has been adopted from Date, An Introduction to Database Systems, 4th Ed., Chapters 4 & 11 (1986).
FIG. 2 depicts three relations, a table for suppliers 63, a table for parts 65, and a table for shipments 67. "Relation" is just a mathematical term for the word "table", and these two words are used interchangeably in the specification.
Table 63 represents the information on the suppliers for a particular company. Each supplier has a Supplier ID column 80, a Supplier Name column 82, a Status column 84, and a City column 86, indicating where the supplier is located. As shown in table 63, each row of the table depicts information on a different supplier, and each column 80, 82, 84, 86 represents a different characteristic of each supplier. For example, row 69 of table 63 depicts supplier "S1" having the name "Smith" with a Status "20" and located in "London."
Table 65 represents the parts the suppliers may sell. Each part has a Part ID 88, a Part Name column 90, a Color column 92, a Weight column 94, and a City column 96. As shown by each row of the table, it is also assumed that the part only comes in one color and it is stored in a warehouse in exactly one city. For example, row 77 of table 65 corresponds to a single part having part ID "P1", a part name "NUT", a color "RED", a weight "12", and a location or city, "LONDON."
Table 67 represents the shipments of parts (table 65) made by suppliers (table 63). Table 67 is really a connection between tables 63 and 65 (to be discussed in Part V). The first row 87 of table 67 connects a specific supplier (S1) with a specific part (P1); stated differently, it represents a shipment of parts of kind P1, by supplier S1, and the shipment quantity is 300. Thus, each shipment is uniquely described by a supplier ID, corresponding to column 98, a part ID, corresponding to column 100, and a quantity, corresponding to column 102. It is assumed that, at most, one shipment at any given time for a given supplier can be made for a given part. For example, the combination of S1 and P1 having a quantity shipment of 300 at row 87 is unique with respect to the set of the shipments appearing in table 67.
The supplier, part and quantity, together in each row of table 67, create a unique "entity". The Shipments table is a "relationship" between a particular supplier and a particular part. The "SUPPLIERS and PARTS" relational database (FIG. 2) describes, in reality, a very elementary database. Databases are likely to be much more involved, containing many more entities and relationships. This database, however, is sufficient to illustrate what a relational database is and to illustrate the novel features of the present invention.
A couple of properties regarding each relation are worth noting. First, each of the data values depicted in the tables 63, 65, 67 is "atomic". That is, at every row and column position in every table, 63, 65, 67, there is always exactly one data value, never a set of values. For example, in table 63, at row 69 and column 84, the status for the supplier Smith is a single value "20" and not a set of statuses. Also, note there are no links connecting one table to another table. In the example of FIG. 2, there is a relationship between the supplier row 69 of table 63 and the part row 77 of table 65, because supplier S1 supplies P1 as shown by the existence of row 87 of table 67, in which S1 has sold 300 P1's. However, there are no links extending from table 63 to table 65 to show this unique relationship. By contrast, in non-relational systems, this information is typically represented by some kind of "link" that is visible to the applications programmer.
To this point, we have discussed a high level theoretical construct of how relational databases can be defined. Although relations at the external level in systems today create this construct for users to work with, the internal levels of the relational systems use a variety of structures, which are not in the form of relations or tables. The idea of a relational database construct only applies to the external levels of the relational system, and not to the internal level of a present-day relational database. Stated differently, the relational models as disclosed by the prior art (e.g., Date) represent database systems "at a level of abstraction that is somewhat removed from the details in the underlying machine." In the present invention, by contrast, although the relational database at the external level remains the same, the internal level of the relational database is actually depicted in the form of columns, as discussed in PART IV.
Referring to FIG. 2 at 60, a set of domains is depicted. A domain is a set of unique values, and each domain has only one characteristic. For example, the domain for supplier identifiers 66 is the set of all possible unique supplier identifiers which are referenced in the system. Likewise, the domain for person names 70 is the set of all unique names, and the set of numbers 78 is the set of all integers greater than 0 and less than 10,000 (for example). Domains are pools of unique values, from which actual values appearing in the columns of the tables 63, 65 and 67 are drawn.
Typically, there will be values in a given domain that do not concurrently appear in one of the columns of one of the relations. For example, the value S8 or S10 may appear in the domain of supplier identifiers 66, but note that no supplier S8 or S10 actually appears in relation 63. They may appear in some other column of some other relation. Furthermore, each column of a relation corresponds to one of the domains. For example, supplier identifier domain 66 corresponds to column 80 of table 63.
When a particular relation is ready to be taken from the external device 12, RPU 22 acts as a file manager to retrieve each of the records associated with the particular columns. The columns will then be sent via bus 30 to the RDMS 10 for further processing. In reality, the RDMS 10's view of the database on the external device 12 is a collection of stored columns (in byte form), and that view is supported by a file manager and disk manager (not shown) which may or may not be a part of the RDMS system.
Assuming that the RDMS 10 system via RPU 22 summons a relation stored in the external device 12 over the bus 30, the relation will be converted into a binary representation via the BBVP. This unique tabular representation of the relation is one aspect of the invention and it will be thoroughly discussed in Part IV. However, before proceeding to the binary representation model of a relation, some background regarding bit-vector technology must be discussed. Part III is devoted entirely to creating a fundamental framework in which a binary representation of a relation can be developed.
III. Binary Bit-Vector Technology
One aspect of the present invention is the representation of a relational database by bit strings which do not contain the raw data values of the relational database. Instead, each value of the database is represented by a single binary bit within an ordered set of binary bits. To more fully understand how these concepts are achieved, a detailed discussion is presented regarding binary bit-vector characterizations of sets. (For an excellent discussion on this matter see "Elements of Set Theory," Academic Press 1977, by Enderton.) The basic building block for representing a relation of a relational database, in the present invention, relies on the concept of ordered sets.
Ordered sets consist of ordered pairs, <M, X>, where each ordered pair represents an element of a set, where M is a symbol that defines the location or position of the value X within the set. In the preferred embodiment of the invention, M is a non-negative integer. Stated differently, every element of the ordered set constitutes a function between the integer describing the ordinal position M and the corresponding value X in the set. Typically, the value M increases by one for each forward progression of one element in the set. An ordered set with unique values can be defined as a set having elements X and Y which have ordinal positions M and N, respectively, and for all <M, X>, <N, Y> in the set, M is equal to N if, and only if, X is equal to Y. Two elements of an ordered set can only be equal when they are at the same ordinal position in the set, as all ordinal positions in the ordered set are unique. The ordering of values within ordered sets (ordering rule), namely, the way in which each value matches an ordinal position, is arbitrary. It depends on the system's or user's choices and requirements.
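The uniqueness condition just stated (M equals N if, and only if, X equals Y) can be checked mechanically. The following Python sketch is illustrative only; the function name and the representation of an ordered set as a list of (position, value) tuples are assumptions.

```python
from itertools import combinations

def is_ordered_set_with_unique_values(pairs):
    """Check that, for every two distinct pairs <M, X> and <N, Y>, M == N iff X == Y."""
    distinct_pairs = set(pairs)   # identical pairs denote the same element
    return all(
        (m == n) == (x == y)
        for (m, x), (n, y) in combinations(distinct_pairs, 2)
    )

states = [(1, "California"), (2, "Colorado"), (3, "Hawaii"), (4, "Maine")]
print(is_ordered_set_with_unique_values(states))
print(is_ordered_set_with_unique_values([(1, "Maine"), (2, "Maine")]))
```

The second call fails the condition because the same value appears at two different ordinal positions; a position that carries two different values fails it in the same way.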
A "vector" consists of an ordered set of elements. However, the ordering of values (a value X of the ordered pair <M, X>, where M gives the ordinal position) is implied by the position in the vector. Thus, a vector consisting of a set "S" of elements is a one-dimensional array, where each value corresponds to a vector element whose ordinal position over the vector is implied. More particularly, a vector may define an ordering to a set of elements in an ordered or non-ordered set. Given the ordered set "S" 148 of FIG. 3A, a "binary-bit vector" a can uniquely define a subset "A" (FIG. 3D) of "S" 148 (FIG. 3A), by representing the elements of "S" in "A". Only two data values are required for representing the elements of "S" in subset "A". More precisely, each data value must represent whether an element of set "S" is in subset "A" or is not in subset "A" and, thus, a binary bit string, where each bit corresponds to an ordinal position of set "S", can represent subset "A". In the preferred embodiment, binary bit-vectors contain 1's and 0's, a "1" indicating a set element is present, and a "0" indicating that it is not present. Each position of each binary bit within a binary bit-vector corresponds to an ordinal position of an element in subset "A" (FIG. 3A). Thus, a binary bit-value, combined with its implied ordinal position within the vector, is an "element" of the vector. The presence of a "1" bit in the vector is equivalent to representing a value of the ordered set by only its ordinal position. In this way, representations of data can be operated on without operating on the data values themselves.
As mentioned above, binary bit-vectors characterize a set by identifying which elements of the set exist in a subset, where each binary "1" bit corresponds to an ordinal position of the set. For example, given the ordered set of states, (<1, California>), (<2, Colorado>), (<3, Hawaii>), (<4, Maine>), (<5, New Mexico>), (<6, New York>), (<7, Oregon>), (<8, Texas>), we can characterize all elements in this set by the binary bit-vector a' = (11111111) (FIG. 3E). Likewise, to represent only a subset of set "S", for example, the subset "A" (FIG. 3B), containing (<2, Colorado>), (<3, Hawaii>), (<5, New Mexico>), (<8, Texas>), only the binary bit vector a = (01101001) (FIG. 3D) is necessary.
Lastly, to represent an empty subset (FIG. 3C) of set "S", only the bit-vector a" = (00000000) (FIG. 3G) is needed.
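The states example can be reproduced with a few lines of Python. The sketch below is illustrative rather than part of the specification; the function names and the second subset used for the Boolean AND are assumptions.

```python
S = ["California", "Colorado", "Hawaii", "Maine",
     "New Mexico", "New York", "Oregon", "Texas"]   # ordinal positions 1..8

def bit_vector(ordered_set, subset):
    """Return a bit string with a "1" at each ordinal position present in the subset."""
    return "".join("1" if element in subset else "0" for element in ordered_set)

def members(ordered_set, bits):
    """Recover the subset elements named by the "1" bits."""
    return [element for element, bit in zip(ordered_set, bits) if bit == "1"]

a = bit_vector(S, {"Colorado", "Hawaii", "New Mexico", "Texas"})
print(a)                       # subset "A"
print(bit_vector(S, set(S)))   # every element present
print(bit_vector(S, set()))    # empty subset

# Intersection computed on the bit vectors alone; the state names are not consulted.
first_four = bit_vector(S, set(S[:4]))
both = format(int(a, 2) & int(first_four, 2), "08b")
print(both, members(S, both))
```

The final step shows a subset operation carried out purely on the representations, with the data values consulted only when the result is displayed.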
Assuming that a relational database comprises ordered sets, the binary bit-vector analysis discussed above can be used to represent relations of the relational database. A more detailed discussion on the procedure in the "Binary Representation of a Relational Database" is presented in PART IV of the specification.
IV. Binary Representation of a Relational Database
FIG. 2 depicts a relational database for suppliers, parts and shipments. As stated in PART II, the relational database consists of tables 63, 65, and 67, which depict the Suppliers, Parts and Shipments relations, respectively. In addition, domain 60 contains the unique values which are referenced by each of the tables 63, 65 and 67. As shown, the domain 60 contains unique values for supplier identifiers 66, part identifiers 68, person names 70, part names 72, cities 74, colors 76, and numbers 78.
One aspect of the present invention is to create a binary representation of a relational database, such as the one shown in FIG. 2. Specifically, the binary representation for the relational database shown in FIG. 2 is shown in FIGS. 4, 5 and 6. More particularly, FIG. 4 shows the binary representation of the Suppliers table 63 (FIG. 2), FIG. 5 is a binary representation of the Parts table 65 and FIG. 6 is a binary representation of the Shipments table 67 (FIG. 2).
FIG. 4 can be broken up into two portions as shown. The top portion of the figure represents the value sets or domains associated with the suppliers relation 63 of the relational database in FIG. 2. More particularly, domain 160 (FIG. 4) is associated with the supplier identifiers domain 66 of FIG. 2. Domain 162 corresponds to the Person Names domain 70 of FIG. 2. Domain 164 corresponds to the Numbers domain 78 (FIG. 2), and domain 166 corresponds to the Cities domain 74 of FIG. 2. The values of domains 160, 162, 164 and 166 have been arbitrarily chosen and formed in particular orders of occurrence, namely, lexical ordering for domains 162 and 166 and numerical ordering for domains 160 and 164.
A subset of each of the domains 160, 162, 164 and 166 is represented by corresponding binary bit-vectors 176, 178, 180, and 182. Each binary bit-vector, which is used for the purpose of characterizing a domain, is called herein an entity select vector. In other words, the binary bit-vector is used to represent a selection (subset) of entities from a set of entities, where an entity is defined to be a unique value or grouping of values. For example, an entity may be a value in a value set or a row in a relation.
The entity select vector has a quantity of binary bits equivalent to the quantity of unique values in a given domain. For example, domain 160 for supplier identifiers contains ten supplier identifiers and therefore has ten corresponding binary bits. Each binary bit of the entity select vector has a position which directly corresponds to the position of each of the unique values in the supplier identifier domain 160. Additionally, each of the binary bits represents the presence or absence of each of the unique values in the subsets represented by the entity select vector 176. More particularly, the binary bit 236 of entity select vector 176 corresponds to the unique value S1 at 216 of domain 160. The binary bit 236 has a value of "1" which indicates that the unique value S1 is present in the subset represented by entity select vector 176. Likewise, binary bits 238, 240, 242 and 244 also indicate that the unique values S2, S3, S4, and S5 are present in the subset. Binary bits 246, 248, 250, 252 and 254 are set to "0", indicating that unique values S6 226, S7 228, S8 230, S9 232, and S10 234, respectively, are not in the subset. The binary bit-vector representation of the unique values of the domain 160 is a short-hand way of representing a subset of the domain 160 by indicating those unique values which are present in the subset. This aspect of the present invention is an important one. By only representing a subset of a domain, all values of a domain need not be associated with a column. More particularly, only the entity select vector which corresponds to a particular column of a relation, and not the domain, needs to be associated with the column.
For each domain containing unique values in the Suppliers relation, a different entity select vector 176, 178, 180, and 182 is used to represent subsets of the domains. Each subset representation is directly associated with a column of the relation. More particularly, entity select vector 176 is associated with column 168, entity select vector 178 is associated with column 170, entity select vector 180 is associated with column 172, and entity select vector 182 is associated with column 174. Columns 168, 170, 172 and 174 correspond to the columns of the suppliers relation 63 of FIG. 2.
Referring to the set of binary bit-vectors at 260, each binary bit-vector 184, 186, 188, 190 and 192 corresponds to one of the values indicated to be present in the subset by the entity select vector 176. In the preferred embodiment, the binary bit-vector is called a row use vector because it indicates the presence or absence of a unique value in one or more rows of a column in the relation. The row use vectors are combined into a row use set 260 to form the binary representation of the column 168. The row use vectors occur in the same order as the values characterized by the entity select vector. More particularly, the lower order binary bits of the entity select vector have been arbitrarily assigned to correspond to the left-most row use vectors of the row use set, and the higher order binary bits of the entity select vector correspond to the right-most row use vectors of the row use set. The implied mapping between the bits set to "1" of the entity select vector and the row use vectors could just as easily have been reversed. For example, the lower order binary bits may correspond to the right-most row use vectors of the row use set.
Binary bit 236 of the entity select vector 176 corresponds to row use vector 184 of the row use set 260. Likewise, binary bit 238 of the entity select vector 176 corresponds to the next row use vector 186 of the row use set 260. The system interpreting the entity select vectors and the row use set is pre-programmed so that the binary bits indicate unique values present in the subset. The implied ordering scheme is illustrated by the dotted lines 268, 270, 272, 274 and 275, which show that the binary bits in entity select vector 176 correspond to the row use vectors 184, 186, 188, 190 and 192, respectively. Likewise, the binary bits in the entity select vector 180, which represent the unique values "10", "20", and "30", are mapped in an implied manner to the row use vectors 204, 206 and 208, respectively.
It is important to note that each row use vector can indicate the presence or absence of a unique value in one or more rows of a column of the relation. For example, in row use vector 206, corresponding to the unique value "20", "1" bits indicate the presence of the value "20" in the first and fourth rows of column 172.
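A minimal Python sketch may help show how an entity select vector and a row use set together stand in for a column. It rests on stated assumptions: the function name `decode_column` is hypothetical, the ordering and contents of `numbers_domain` are invented for illustration, and the row positions of "10" and "30" are assumed (only the positions of "20", in the first and fourth rows, are given in the text).

```python
def decode_column(domain, entity_select, row_use_set):
    """Rebuild a column's values from its binary representation.

    domain        -- ordered list of the domain's unique values
    entity_select -- bit string, one bit per domain value
    row_use_set   -- bit strings, one per "1" bit of entity_select and in
                     the same order; each holds one bit per row
    """
    present = [v for v, bit in zip(domain, entity_select) if bit == "1"]
    if len(present) != len(row_use_set):
        raise ValueError("one row use vector is needed per selected value")
    n_rows = len(row_use_set[0]) if row_use_set else 0
    column = [None] * n_rows
    for value, row_use in zip(present, row_use_set):
        for row, bit in enumerate(row_use):
            if bit == "1":
                column[row] = value
    return column

numbers_domain = [10, 12, 14, 17, 19, 20, 30]   # assumed ordering of the numbers domain
status_select = "1000011"                        # 10, 20 and 30 are referenced
status_rows = ["01000",                          # rows holding 10 (assumed)
               "10010",                          # rows holding 20 (first and fourth)
               "00101"]                          # rows holding 30 (assumed)
status = decode_column(numbers_domain, status_select, status_rows)
print(status)
```

The k-th row use vector is paired with the k-th "1" bit of the entity select vector, which is the implied mapping the text describes.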
Because the binary representation of each of the columns is tied to its unique values by an entity select vector, the columns 168, 170, 172 and 174 of the relation need not be represented as a set of actual values in the database memory. Only the row use sets and the corresponding entity select vectors need be stored in the memory of RDMS 10 (e.g., memory 18, FIG. 2). The actual values are stored in a more permanent memory such as a hard disk (e.g., external device 12, FIG. 2). Thus, for each relation in the database, a series of entity select vectors, representing subsets of the various domains pertaining to the particular relation, are stored along with their corresponding row use sets which depict the columns of the relation. FIG. 5 and FIG. 6 represent the entity select vectors and their corresponding row use sets for the Parts relation 65 (FIG. 2) and the Shipments relation 67 (FIG. 2).
Referring to FIG. 5, domains 282, 286, 290, 294 and 298 correspond to the domains for part identifiers 68 (FIG. 2), part names 72 (FIG. 2), colors 76 (FIG. 2), numbers 78 (FIG. 2), and cities 74 (FIG. 2), respectively. Note that the numbers domain 294 and the city domain 298 of the Parts relation (FIG. 5) are identical to the numbers domain 164 and the city domain 166 of the suppliers relation (FIG. 4). An efficiency of the present invention is that the unique values of the domains need only be represented once. In other words, the unique values in the numbers domain 164 (FIG. 4) and the numbers domain 294 (FIG. 5) are the same domain of unique values of numbers. Likewise, the city domain 166 (FIG. 4) and the city domain 298 (FIG. 5) are the same domain of unique values of cities. The representation of the subsets for these domains is different. For example, the entity select vector 180 (FIG. 4) represents that the unique values "10", "20" and "30" of the numbers domain are in the relation of suppliers (FIG. 4), whereas the unique values "12", "14", "17" and "19" are in the relation for parts (entity select vector 296, FIG. 5). Thus, instead of having to store two complete sets of unique values 164 and 294, which are identical, the system only stores one version of the numbers domain and two entity select vectors, namely, 180 and 296, to represent subsets of the same domain. Likewise, the system need not retain two unique sets of values for cities, 166 and 298; instead, the system maintains only one version of the city domain and two entity select vectors, 182 and 300, to identify the subsets of cities in the associated relations. In fact, any number of entity select vectors may be associated with a particular domain. For example, in FIG. 6, the domain for numbers 328 is also referenced; however, the entity select vector 330 depicts a different subset of unique values from the entity select vectors 180 (FIG. 4) and 296 (FIG. 5).
Entity select vectors are efficient subset representations of domains because the same domain need not be represented more than once and the actual values of each domain can be uniquely represented by the entity select vector's binary bits.
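The sharing of one domain by several entity select vectors can be illustrated with a short Python sketch. The helper name `select_vector`, the ordering of the `numbers` list, and the row-by-row column contents are assumptions made for illustration; the text states only which unique values each relation references.

```python
# One stored copy of the numbers domain (ordering assumed for illustration).
numbers = [10, 12, 14, 17, 19, 20, 30]

def select_vector(domain, column_values):
    """One bit per domain value: 1 if the column references it, else 0."""
    used = set(column_values)
    return [1 if value in used else 0 for value in domain]

# Two columns from different relations referencing the same domain.
suppliers_status = select_vector(numbers, [20, 10, 30, 20, 30])      # rows assumed
parts_weight = select_vector(numbers, [12, 17, 17, 14, 12, 19])      # rows assumed

# Values referenced by both columns, found by operating on bits only.
shared = [a & b for a, b in zip(suppliers_status, parts_weight)]

print(suppliers_status)
print(parts_weight)
print(shared)
```

Only the `numbers` list holds actual values; each additional relation costs one more bit-vector over that same list.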
The columns of the Parts relation 302, 304, 308, 312, 316 (FIG. 5) are represented in the relational database by the row use sets 304, 306, 310, 314, 318, respectively. As in the description for the suppliers relation (FIG. 4), the row use sets contain row use vectors which are each associated with a unique value indicated to be present in a subset of a domain by an entity select vector. Likewise, FIG. 6 depicts a binary representation for shipments.
The domains associated with each of the relations, with their unique sets of values, are stored on the external device 12 (FIG. 1A), and the binary representations of the relations (FIGS. 4, 5 and 6) which make up the relational database of FIG. 2 are stored in memory 18 of the RDMS 10 (FIG. 1A). When the system performs a relational operation on the relational database, only the binary representation of the relations need be referred to. A detailed discussion of this process will follow shortly.
FIG. 7 is a flow block diagram of the steps which are performed by the RDMS 10 for creating a binary representation of a relational database.
Referring to FIG. 7, a detailed discussion of the BINARY REPRESENTATION routine is now provided.
Specifically, during block 344, each domain necessary for specifying unique values in the relations of the relational database is identified. Input commands are interpreted by the command interpreter 28 (FIG. 1A) for creating domain identifiers in the memory 18 (FIG. 1A) of the RDMS 10 (FIG. 1A). Additionally, the system reads each of the instructions, and an empty value set is created for each of the domains listed in the instruction. The actual unique values associated with each domain are loaded into memory in a later step of this routine. During block 346, the system determines whether any more domains for the database need to be specified.
Assuming that all the domains to be referenced by the database are specified, the system, during block 348, identifies a table of the relational database, and the RDMS 10 builds an entry in the system relation (see Part VI). During block 350, the system identifies a column, as with the table identified in block 348. In addition, the RDMS 10 builds entries in the system relation identifying each column, and the RPU 22 via the BBVP 14 creates empty entity select vectors associated with each of the domains. The entity select vectors will have their binary bits set when the columns of the particular relations have their values loaded into the system. During block 352, it is decided whether to identify another column for the particular table identified in block 348. Processing continues at blocks 350 and 352 until all of the relational columns for the particular table identified during block 348 have been identified. During block 354, the system asks for the next table or relation of the database. Assuming that a number of different tables exist in the relational database, blocks 348, 350, 352 and 354 are processed until all of the tables and their related columns have been specified. During blocks 348, 350, 352 and 354, system identifiers, identifying each column of each relation, are constructed. These identifiers are discussed more thoroughly in Part VII.
During block 356, the system loads a file associated with each table of the relational database. The file of the table is assumed to exist in the external device 12 of the system depicted in FIG. 1. When the relational database has been completed, the system loads the file representations of the relations into external device 12, where they reside until summoned by the RDMS system 10. During block 358, RPU 22 instructs the external device 12 to transfer the first column associated with a particular table of the relational database via bus 30 to the RPU 22. The particular column is retrieved by referring to certain system identifiers (i.e., AID, RID identifiers, Part VII). When the first value of a particular column is brought into the RPU 22, a row use vector associated with the value is built by the BBVP 14. Additionally, a bit is set to "1" in the first position of the row use vector to indicate that the value occupies the first row of the first column of the relation. For each unique value of the column, a new row use vector is created and a binary bit associated with a particular row of the column, where the unique value resides, is set to "1".
Additionally, as each unique value of a particular column is brought into the RPU 22, the binary bits of the entity select vector associated with the column are set to "1" to indicate the unique values of the domain referenced in the column.
If the same value appears in a second, third or fourth row, etc. of the column, the corresponding bits of the row use vector associated with the unique value are set to "1", indicating the presence of the unique value in these rows of the column. Because the entity select vector has previously had the binary bit associated with the unique value set to "1", it need not be set again. This process of converting the column into a row use set continues until all of the values of the column have been represented by binary bits. When the row use set for a column is completely constructed, the RPU 22 commands the external device 12 to bring the next column, via bus 30, into the RDMS 10. This process continues for all of the columns associated with a particular table until the row use sets for the table are completely constructed. During block 360, the system determines if any other tables need to be binary represented by the RDMS 10, assuming a single input file corresponds to a single table. If other tables are to be constructed, then blocks 356, 358 and 360 are performed until all the columns of the next relation are represented in their binary form. After all of the binary bit-vectors (i.e., row use vectors and entity select vectors) are constructed, the RPU 22 may summon the bit vectors to bit vector encoder BVE 16 of the RDMS 10 to encode the bit vectors into compressed impulse format. The steps for compressing the bit vectors into compressed impulse formats are thoroughly discussed in Glaser et al. RPU 22 then instructs the bit vectors to be sent via bus 48 to the memory 18 where they are stored.
Eventually, all of the relations of the relational database stored in external device 12 will be converted into the binary represented form, and this form is stored in the memory 18 of the RDMS 10 in the compressed impulse format or in the uncompressed form. During block 362, the system returns to the program which called the BINARY REPRESENTATION routine (FIG. 7).
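The column-loading step of block 358 can be approximated in Python as follows. This is a simplified sketch rather than the routine of FIG. 7: the class `BinaryColumn` and its methods are hypothetical names, the domain is assumed to be fully populated before loading begins, and no compression into impulse format is attempted.

```python
class BinaryColumn:
    """Builds an entity select vector and row use vectors one row at a time."""

    def __init__(self, domain):
        self.domain = list(domain)                  # ordered unique values
        self.entity_select = [0] * len(self.domain)
        self.row_use = {}                           # value -> list of row bits
        self.n_rows = 0

    def append(self, value):
        position = self.domain.index(value)         # ordinal position in the domain
        if value not in self.row_use:
            # First occurrence: new row use vector, zero-filled for earlier rows,
            # and the matching entity select bit is set once.
            self.row_use[value] = [0] * self.n_rows
            self.entity_select[position] = 1
        # Every existing row use vector gains one bit for the new row.
        for candidate, bits in self.row_use.items():
            bits.append(1 if candidate == value else 0)
        self.n_rows += 1

    def row_use_set(self):
        # Vectors are ordered like the "1" bits of the entity select vector.
        return [self.row_use[v]
                for v, bit in zip(self.domain, self.entity_select) if bit]

status = BinaryColumn([10, 20, 30])
for value in [20, 10, 30, 20, 30]:                  # row order partly assumed
    status.append(value)
print(status.entity_select)
print(status.row_use_set())
```

When "20" arrives for the second time, no new vector is created and the entity select vector is left alone; only the existing row use vectors grow by one bit.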
Once the binary representation of the relations is completed, the relations can be updated by two utility functions called INSERT and DELETE, which will be discussed in PART V of the specification. The INSERT function, perhaps with appropriate modification to enhance efficiency, can also be used to load additional rows into the relation. Additionally, the relations, in their binary represented form, can be manipulated via the relational operations SELECT, JOIN and PROJECT, also to be discussed in PART V.
A. Example of Generating a Binary Representation of a Relation
Referring to FIGS. 1, 1A, 2, 4, 5, 6, 7, 8A, 8B, 9A, 9B and 9C, a detailed example for generating a binary representation of a relation is now discussed. Referring to FIG. 2, a hypothetical relational database is shown. After the BINARY REPRESENTATION routine (FIG. 7) is performed by RPU 22 (FIG. 1A), a binary represented database is formed as shown in FIGS. 4, 5 and 6. This example has been designed to emphasize the steps of the BINARY REPRESENTATION routine for constructing the binary representation of a relational database in FIGS. 4, 5 and 6. In this example, the assumption is made that the reader understands instruction formats. For background in this area, please refer to Date, An Introduction to Database Systems, Vol. 1, 4th Ed., 100-107 (1986).
Referring to FIG. 2, the system of FIG. 1 creates a relational database as shown in the tables 63, 65 and 67 (FIG. 2). These tables are constructed as inputs to the system. Once the input process is complete, the system calls the BINARY REPRESENTATION routine (FIG. 7). The system, during block 344, creates the domains referenced by the relational database. Input instructions specify the following domains: supplier identifiers 66, part identifiers 68, person names 70, part names 72, city names 74, colors 76, and numbers 78 (FIG. 2). The input instructions in pseudo code look like:
(A) Create Domain Supplier Identifiers;
(B) Create Domain Parts Identifiers;
(C) Create Domain Person Names;
(D) Create Domain Part Names;
(E) Create Domain City;
(F) Create Domain Colors;
(G) Create Domain Numbers.
As the system reads each of the instructions above, an empty value set is created for each of the domains listed. The creation of the empty value sets for the various domains specified in the above instructions is depicted in the results table of FIGS. 8A and 8B. Each row of the results table represents a new value set created as a result of one of the input commands.
When the command interpreter 28 of the RDMS 10 (FIG. 1A) interprets the instruction (A) in block 344, RPU 22 creates an empty value set (364, FIG. 8A). Next, in block 346, the system determines whether there are any other domains to be identified by the system. In fact, there are other domains to be identified as specified by the instructions listed above; thus, block 344 is called. During block 344, the command interpreter 28 evaluates statement (B) and RPU 22 creates an empty value set to reference the domain of part identifiers (366, FIG. 8A). In block 346, the system determines that there are still more domains to be identified, and thus, block 344 is called. During block 344, the system interprets the next instruction (C), which is for the "person names" domain. The RPU 22 (FIG. 1A) generates an empty value set (368, FIG. 8A). BBVP sets all the vectors to "0" to indicate the empty entity select vector (368, FIG. 8A). In block 346, the system determines that there are still more instructions for identifying domains. Processing continues at block 344, and an empty value set for part names is generated (370, FIG. 8A). Block 346 determines that there are still more domain instructions to be processed and processing continues at block 344. During block 344, an empty value set corresponding to the city domain is created (372, FIG. 8). During block 346, the system determines that there are still more commands for identifying domains, so processing returns to block 344.
During block 344, the system generates an empty value set corresponding to the domain "colors" (374, FIG. 8B).
Block 346 determines that there is one more instruction left (G) for creating a domain, and thus, block 344 is called. During block 344, the RPU 22 creates an empty value set for the "numbers" domain (376, FIG. 8B).
When processing is completed for all of the commands illustrated above, processing continues at block 348, during which a table to be created in the relational database is identified. Specifically, the system is presented with the following command:
CREATE TABLE SUPPLIER
(SUPPLIERS ID; PERSON NAME; STATUS; CITY);
which indicates that a table for suppliers is to be identified; specifically, an identifier for the suppliers table is stored in memory 18 of the RDMS 10.
Then, during block 350, the system generates an identifier for the first column of the relation, which is also stored in memory 18. Specifically, it interprets the command above, and a suppliers ID identifier is stored in memory. (Identifiers are discussed in PART VII.) During block 352, the system determines that the command requires that other columns be identified for the relation, so processing continues at block 350. The system identifies the column for "person names" and stores an identifier indicating such in the memory 18. Additionally, an empty entity select vector is created, which is associated with the "person names" domain. In block 352, the system determines that there are still other columns to be identified, and thus, processing continues in block 350. In block 350, the system stores an identifier for the next column associated with "status" into the memory 18. Also, an empty entity select vector associated with the "status" domain is created. In block 352, the system determines that there is still a column remaining to be identified; specifically, during block 350, the system stores an identifier for the column associated with "city". During block 350, an empty entity select vector associated with the "city" domain is created. During block 352, the system determines that no more columns need be identified, and thus, processing continues at block 354, which determines whether there are any more tables to be identified for the relational database.
The following instructions can be specified:
CREATE TABLE PARTS
(PART ID; PART NAME; COLOR; WEIGHT; CITY);
CREATE TABLE SHIPMENT
(SUPPLIER; PART ID; QUANTITY);
which are for identifying the relations and their columns for both the PARTS and SHIPMENTS tables. For purposes of this example, however, we will assume that the system processes blocks 348, 350, 352 and 354 to properly store the identifiers in the memory 18 for both the PARTS and SHIPMENTS relations, and to construct empty entity select vectors for the columns listed in the instructions above.
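The definition phase (blocks 344 through 354) can be mimicked with a small in-memory catalog. The sketch below is an assumption-laden illustration: the `Catalog` class, its method names and the dictionary layout are hypothetical, and the pairing of each SUPPLIER column with a domain follows the description of FIG. 4 (status draws on the numbers domain, city on the cities domain).

```python
class Catalog:
    """Hypothetical stand-in for the identifiers kept in memory 18."""

    def __init__(self):
        self.domains = {}   # domain name -> value set (empty until loading)
        self.tables = {}    # table name -> {column name: column record}

    def create_domain(self, name):
        # Blocks 344/346: one empty value set per Create Domain instruction.
        self.domains[name] = []

    def create_table(self, name, columns):
        # Blocks 348-354: identify the table, then each of its columns,
        # creating an empty entity select vector for every column.
        table = {}
        for column, domain in columns:
            if domain not in self.domains:
                raise KeyError(f"unknown domain: {domain}")
            table[column] = {"domain": domain,
                             "entity_select": [],
                             "row_use_set": []}
        self.tables[name] = table

catalog = Catalog()
for domain in ["Supplier Identifiers", "Parts Identifiers", "Person Names",
               "Part Names", "City", "Colors", "Numbers"]:
    catalog.create_domain(domain)

catalog.create_table("SUPPLIER", [("SUPPLIERS ID", "Supplier Identifiers"),
                                  ("PERSON NAME", "Person Names"),
                                  ("STATUS", "Numbers"),
                                  ("CITY", "City")])
print(list(catalog.tables["SUPPLIER"]))
```

At this point nothing has been loaded: every value set and every entity select vector is empty, matching the state shown in the results table before block 356 runs.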
Assuming that all the identifiers for the tables, domains and columns have been specified, the RPU 22, during block 356, instructs the external device 12 to download the files associated with each table of the relational database via bus 30 to the RDMS 10 (FIG. 1A).
Specifically, during block 358, the byte values of each column of the relation are sent to the RPU 22 and the appropriate vectors of the row use sets are built via BBVP 14, and the binary bits of the entity select vectors are set there as well. The following section is a detailed discussion on the construction of the row use vectors, entity select vectors, and value sets of the relational database in FIG. 2.
B. Example of Building a Binary Represented Relation
FIGS. 9A, B and C are results tables depicting the formation of the binary representation of the suppliers relation of FIG. 4. Each row of FIGS. 9A, B and C depicts an additional row use vector associated with one column in the relation.
Referring to row 378 of the results table (FIG. 9A), the first value S1 (69, 80, FIG. 2) is inserted into the first ordinal position of the value set for suppliers identifiers. Additionally, the first binary bit of the entity select vector 417 associated with the "suppliers identifier" domain is inserted and set to "1" to indicate that the unique value S1 is referenced in the suppliers ID column. The column for the suppliers identifier requires a new row use vector 381 to be generated and the first bit of the row use vector 381 is set to "1" to indicate that S1 occupies the first row of the column. The process of inserting bits into a row use vector, creating row use vectors, and setting bits in an entity select vector is controlled by a routine called INSERT which will be discussed more thoroughly in Part V of this specification. This process is repeated for all the values S1 to S10 in the value set for supplier identifiers, as indicated in rows 380-386 of FIG. 9A. A new row use vector is added for each value present in the subset, as indicated at 383-389, and a new bit is added to the entity select vector 417.
Additionally, five more bits, set to "0", are added to the ordinal positions of the entity select vector corresponding to the new values of the value set.
Referring to FIG. 4, the row use set 260 and the entity select vector 176 have now been generated by the BBVP 14 via the BINARY REPRESENTATION routine (FIG. 7).
The next column of the suppliers table is evaluated by the BBVP 14. The next column in the relation is the "person names" column (82, FIG. 2), and the first value of the "person names" column is Smith (69, 82, FIG. 2).
The value Smith is inserted into the first position of the "person names" value set. Additionally, the entity select vector 419 associated with "person names" has a first binary bit inserted and set to "1" to indicate that the unique value for Smith is referenced in the column for person names. A row use vector for Smith has not been created, and thus, a new row use vector 401 is generated (388, FIG. 9A). This process is repeated, as indicated in rows 390-396, for the next four names added to the "names" value set, with a new row use vector added for each name, as indicated at 403-409, and a "1" bit added to the entity select vector 419.
The order of the row use vectors 409, 405, 407, 403 and 401 corresponding to Adams, Blake, Clark, Jones and Smith is in the order of occurrence of the binary bits set to "1" in the entity select vector 419. The values Baker, Fabel, Rahn, Ross and Young are added to the "person names" value set. Specifically, the values are added in an order corresponding to the ordering of the value set (396, FIG. 9B). Binary bits set to "0" are also inserted into entity select vector 419 in the ordinal positions corresponding to the newly added unique values. The new binary bits are set to "0" to indicate the remaining values do not occupy the column. The row use vectors 409, 407, 405, 403 and 401 correspond to the row use set 262 of FIG. 4, and row use set 262 corresponds to entity select vector 178 of FIG. 4, which is the same as the entity select vector 419.
The next column of the input suppliers relation is for the "status" (84, FIG. 2), and the first value of the column is "20" (69, 84, FIG. 2). The value "20" is inserted into the first position of the value set for status numbers 421. Additionally, a first bit is inserted into entity select vector 421 and set to "1", indicating that unique value "20" is referenced in the relational column. A new row use vector 411 is created, and the first binary bit of the row use vector is set to "1", indicating the presence of value "20" in the first row of the column (398, FIG. 9B). The process is repeated, as indicated in rows 400-406 of FIG. 9B, for each value in the "status" set, resulting in row use vectors 411-415.
Note that at row 404 the next value in the column (73A, 84, FIG. 2) is 20. The value 20 already exists in the value set for numbers, and thus, it need not be added again. Additionally, the entity select vector 421, for numbers, need not be set because a binary bit corresponding to unique value 20 has already been set to "1" in a previous step. This value also corresponds to an already existing row use vector 411. Binary bits set to "0" are added to row use vectors 413 and 415 to indicate that the values 10 and 30 do not occupy the fourth row of the relational column for status. A binary bit set to "1" is added to the row use vector 411 to indicate that the unique value 20 is also in the fourth position of the relational column for status.
The last column of the relation for suppliers is the "cities" column. The first value of the "cities" column is London (69, 86, FIG. 2). The value London is placed into the first position of the value set for "cities". Additionally, the entity select vector 423 associated with the names of cities has a first binary bit inserted and set to "1" to indicate that the city London is referenced in the relational column for cities. A row use vector 417 is created and a binary bit set to "1" is added to the row use vector to indicate that London occupies the first row of the column for cities (408, FIG. 9C). "Paris" is added (row 410) in the same way.
Since "Paris" and "London" occur twice, the row use vectors have bits added, as indicated, at rows 412 and 414.
The last value in the column for cities is Athens (75, 86, FIG. 2). The value Athens does not exist in the cities value set, and so it is added. Specifically, Athens is placed into the first position of the value set corresponding to the lexical ordering of the city names. The entity select vector 423 for cities has a third binary bit inserted and set to "1" in the first ordinal position of the entity select vector to indicate that Athens is referenced in the column for cities. A new row use vector 421 is added to the row use set.
Binary bits set to "0" are added to the row use vectors 417 and 419 to indicate that the unique values London and Paris do not occupy the fifth row of the column.
The new row use vector contains five bits and the fifth bit is set to "1" to indicate that the fifth row of the column contains the value Athens. The values Cleveland, Fresno, Harrisburg, Los Angeles, New York, Rome, and San Francisco are added to the value set from the input column. The values are arranged in the value set according to a lexical ordering. For each additional value, a corresponding binary bit set to "0" is inserted at a corresponding ordinal position of the entity select vector. The binary bits set to "0" indicate that the values are not referenced by the column (418, FIG. 9C).
The row use vectors 421, 417 and 419, corresponding to Athens, London and Paris, are in the order of occurrence corresponding to the binary bits set in the entity select vector 423. Additionally, the row use vectors 421, 417 and 419 correspond to the row use set 266 of FIG. 4, and entity select vector 423 is associated with the entity select vector 182 (FIG. 4).
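The ordered insertion of Athens can be sketched with Python's standard `bisect` module. This is an illustrative sketch under stated assumptions: `add_value` is a hypothetical helper, Python's default string comparison stands in for the lexical ordering, and the order of the third and fourth rows of the cities column is assumed (the text says only that London and Paris each occur twice).

```python
import bisect

def add_value(value_set, entity_select, value, referenced):
    """Keep value_set sorted and entity_select aligned with it, bit for bit.

    Returns the ordinal position (0-based) of value in value_set.
    """
    position = bisect.bisect_left(value_set, value)
    if position < len(value_set) and value_set[position] == value:
        if referenced:
            entity_select[position] = 1       # value already known
        return position
    value_set.insert(position, value)          # new value, lexical position
    entity_select.insert(position, 1 if referenced else 0)
    return position

cities, city_select = [], []
for name in ["London", "Paris", "Paris", "London", "Athens"]:   # rows 3-4 assumed
    add_value(cities, city_select, name, referenced=True)
after_column = (list(cities), list(city_select))

# Remaining domain values that the suppliers column does not reference.
for name in ["Cleveland", "Fresno", "Harrisburg", "Los Angeles",
             "New York", "Rome", "San Francisco"]:
    add_value(cities, city_select, name, referenced=False)

print(after_column)
print(cities)
print(city_select)
```

Because Athens sorts before London, its bit is inserted at the first ordinal position rather than appended, so the row use vector for Athens must likewise come first in the row use set.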
Referring to FIG. 7, when all of the relations of the relational database have been generated, processing continues at decision block 360 in which the RPU 22 determines whether there are more files associated with the PARTS and the SHIPMENTS tables of the relational database. Processing continues at blocks 356 and 358 until the binary representations for the PARTS and SHIPMENTS tables are constructed (418, FIG. 9C). The binary representations for the PARTS and SHIPMENTS tables are generated in the same fashion as the SUPPLIERS table (FIGS. 9A, 9B and 9C) discussed above.
Processing returns during block 362 to the calling routine of the BINARY REPRESENTATION routine (FIG. 7).
V. Operations Performed on Binary Representations of Relations
FIGS. 10A, 10B, 10C, 10D, 11A, 11B, 11C, 11D, 12, 13, 14, 15A and 15B, 16, 17, 18, 19, 20, 21, 22A, 22B, 22C, 22D, 22E, 22F, and 22G depict flowcharts of operations performed on relations in their binary represented form. Specifically, FIGS. 10A, B, C and D are flow diagrams of the utility function called INSERT. FIGS. 11A, B, C and D are flow diagrams of the function DELETE. A further figure is a flow diagram of the relational operation called SELECT. FIG. 16 is a flow diagram of the relational operation called PROJECT, and FIGS. 22A, 22B, 22C, 22D, 22E, 22F, and 22G are flow block diagrams of the operation JOIN. The functions INSERT and DELETE are basically for maintaining and manipulating data within the relations. The relational operations SELECT, PROJECT and JOIN are for generating resultant relations, in the preferred embodiment in a binary represented form. Only three relational operations, SELECT, PROJECT and JOIN, are discussed in order to simplify this disclosure and to provide a basic understanding of how relational operations are performed on binary representations of relations. For example, the operations PRODUCT, UNION, INTERSECTION, DIFFERENCE and DIVIDE, which are described in Date, "An Introduction To Database Systems," Vol. 1 (4th ed., 1986), are not discussed in this specification, due to their complexity. However, one skilled in the art will readily understand their operation on binary representations of relations after reading the sections on SELECT, PROJECT and JOIN. This part of the specification is broken up into five subsections, each dealing with a separate operation as discussed above.
For purposes of this example, it is assumed that one or more relations have been loaded into the RDMS 10.
Here the relations are encoded by the RPU 22 via BBVP 14 into the binary representations of the relations. Then the binary represented relations are either sent directly to memory 18 for storage, or they are sent to the BVE 16, where the bit strings of the binary representations are encoded into compressed impulse formats as discussed in Glaser et al. The resulting compressed bit strings are then stored to memory 18.
They stay in memory 18 until a request to perform a relational operation (i.e., INSERT, DELETE, SELECT, PROJECT, JOIN) is initiated and is interpreted at the command interpreter 28. In their uncompressed form, the binary representations of the relations of the relational database stored in memory 18 are shown in FIGS. 4, 5 and 6. It is also assumed that the bit vectors of the binary represented relation could also be in compressed impulse formats. However, for ease of understanding, the bit vectors are processed in the uncompressed form. When the RPU 22 is ready for processing, the binary represented relation(s) are brought to the RPU 22 via buses 48 and 31. Here the specified relational operation is performed on the relation(s). If any Boolean operations need to be performed by the relational operation, then the steps for processing compressed bit strings as set forth in Glaser et al. are performed by the BLU 24. Once processing is completed at the RPU 22, the RPU 22 outputs a new binary represented relation. The new output relation is sent to memory 18 via buses 31 and 48, where the resultant relation resides until it is sent back to RPU 22 for further processing.
A. INSERT
INSERT is a function which adds one value at a time to a relation. The necessity for performing an INSERT operation arises in three categories of cases. First, a unique value needs to be added to a domain or value set, and it needs to be added to a column of a relation.
Second, a unique value already exists in the value set, and it needs to be added into a column of a relation.
Third, a value already exists in a column and it needs to be added again to the column. Multiple values may be added to a value set or to a column; however, the INSERT subroutines must be processed once for each value that is added. In all of the situations above, it is assumed that the binary representations of the one or more relations on which the INSERT is performed are in the RPU 22, ready for processing.
Additionally, the function INSERT can be used to add values to more than one column of a relation. The INSERT function is separately performed each time a value is added to the relation.
The flow diagrams in FIGS. 10A, B, C and D depict routines for processing any one of the three situations discussed above. Specifically, the flow diagram in FIG. 10A is a routine for adding a unique value to a value set, and adding the unique value to a column of a relation; the routine is called INSERT. FIG. 10B is a flow diagram of a subroutine for adding a unique value to the value set, and this routine is called INSERT VALUE INTO VALUE SET. FIG. 10C is a routine for updating an entity select vector to reflect the addition of a unique value into a particular subset; this routine is called UPDATE SUBSET. FIG. 10D is a routine for updating a column of a relation with a new occurrence of a value; this routine is called ADD VALUE TO COLUMN.
The routine in FIG. 10D would by itself be used to insert a value into a column of a relation when the value already existed in the column. The flow diagrams of FIGS. 10C and 10D are combined for inserting a value already existing in the value set into a subset of values of the value set and adding the value to a column of a relation.
Referring to FIG. 10A, a more detailed description of the INSERT routine is now discussed. At block 422, the system calls the INSERT VALUE INTO VALUE SET routine (FIG. 10B) to add a unique value to the value set. Once the new unique value has been added to the value set, block 424 calls the subroutine UPDATE SUBSET (FIG. 10C). This routine updates an entity select vector corresponding to the value set so that the unique value is represented in an associated subset. During block 426, the system performs the ADD VALUE TO COLUMN subroutine (FIG. 10D) to add the value into a specified column of the relation. Processing is then complete: a value has been added to the value set and to a column, and the system returns at block 428 to the calling program.
Referring to FIG. 10B, a more detailed description of the INSERT VALUE INTO VALUE SET routine is now discussed. In block 430, the RPU 22 via the BBVP 14 determines the ordinal position in the value set at which the new, unique value is to be inserted.
Essentially, the value set is stored in a structure which is traversed to find the new value. The structure contains all of the unique values presently stored in the value set, and is built so as to minimize access time for finding values. The system locates a node of the structure which corresponds to the ordinal position at which the new, unique value should be placed. During block 432, the system determines whether the value already exists. If the value already exists, then during block 434, the system returns to the calling program. Assuming that the value is not in the structure, then during block 436 the new value is added to the value set by adding a node, or by assigning an already existing node in the structure, to incorporate the unique value. During block 438, processing returns to the calling program.
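The INSERT VALUE INTO VALUE SET steps can be sketched as follows. This is a minimal sketch, not the structure described above: it swaps the node-based structure for a sorted Python list searched with the standard `bisect` module, and the function name and the five-name sample value set are assumptions made for illustration.

```python
import bisect


def insert_value_into_value_set(value_set, value):
    """Insert `value` into the sorted list `value_set` if it is absent.

    Returns (ordinal_position, was_added), with a 0-based position.
    """
    # Block 430: find the ordinal position where the value belongs.
    pos = bisect.bisect_left(value_set, value)
    # Block 432: does the value already exist at that position?
    exists = pos < len(value_set) and value_set[pos] == value
    if not exists:
        # Block 436: add the value at its ordinal position.
        value_set.insert(pos, value)
    return pos, not exists


# Hypothetical, shortened value set of names (assumed for illustration).
names = ["Adams", "Blake", "Clark", "Jones", "Smith"]
zeus_pos, zeus_added = insert_value_into_value_set(names, "Zeus")
again_pos, again_added = insert_value_into_value_set(names, "Zeus")
```

The second call finds the value already present and leaves the value set unchanged, which corresponds to the early return at block 434.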
Referring to FIG. 10C, a more detailed discussion of the UPDATE SUBSET routine is now presented. In block 440, the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the ordinal position of the unique value in the value set. In decision block 441, the RPU 22 determines whether the unique value has been added to the value set. If the unique value has been added, block 442 is called. During block 442, the system inserts a bit into the entity select vector at the ordinal position corresponding to the new unique value. Processing continues at block 444. During block 444, the new bit added to the entity select vector is set to "1" to indicate that the unique value is in the subset. Returning to block 441, if the unique value already exists, block 443 is called. During block 443, the RPU 22 via the BBVP 14 determines whether the bit in the entity select vector has been set to "1". The binary bit set to "1" indicates that the column contains this unique value.
If the bit has been set to "1", processing returns to the calling program during block 445. However, if the bit is not set, then processing continues in block 444. In block 444, the bit in the entity select vector is set to "1" to indicate that the unique value is in the subset. During block 446, processing returns to the calling program.
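The two branches of UPDATE SUBSET can be illustrated with a short Python sketch. The function name, the list-of-ints bit vector and the boolean return value are assumptions made for this illustration; the block numbers in the comments refer to FIG. 10C as described above.

```python
def update_subset(entity_select, pos, value_was_added):
    """Make the entity select vector reflect that the value at `pos` is used.

    `value_was_added` says whether the value was just added to the value
    set (so the vector needs a new bit) or already existed there.
    Returns True when the bit at `pos` changed to 1, False when it was
    already 1.
    """
    if value_was_added:
        # Block 442: insert a bit at the new value's ordinal position.
        entity_select.insert(pos, 0)
    elif entity_select[pos] == 1:
        # Blocks 443 and 445: the column already references the value.
        return False
    # Block 444: mark the value as a member of the subset.
    entity_select[pos] = 1
    return True


names_select = [1, 1, 0, 1, 1]
new_value = update_subset(names_select, 5, True)         # grows the vector
existing_unused = update_subset(names_select, 2, False)  # flips a 0 to 1
existing_used = update_subset(names_select, 0, False)    # nothing to do
```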
Referring to FIG. 10D, a more detailed discussion of the ADD VALUE TO COLUMN routine is now presented.
During block 448, the RPU 22 via BBVP 14 counts the number of binary bits set to "1" in the entity select vector up to and including the bit at the ordinal position corresponding to the unique value inserted.
This number is called "count". The "count" of binary bits set to "1" corresponds to the location of the row use vector in the row use set. During block 450, the system inserts the new row use vector at the position in the row use set corresponding to "count." During block 452, the system appends a binary bit set to "0" to all of the row use vectors of the row use set. In block 454, the system sets the last bit of the new or selected row use vector to "1" to indicate that the new value is added to the last row of the column. During block 456, processing returns to the calling program. Returning to block 449, if a new row use vector is not required to be built by the RPU 22, then processing continues at block 452. During block 452, bits set to "0" are appended to the existing row use vectors, and during block 454, the last bit of the selected row use vector is set to "1".
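The "count" indexing used by ADD VALUE TO COLUMN can be sketched as below. This is an illustrative sketch only: the function name, the `need_new_vector` flag standing in for decision block 449, and the sample names column (Smith, Jones, Blake, Clark, Adams, with a shortened value set) are assumptions. The entity select vector is expected to already carry a "1" at the value's position.

```python
def add_value_to_column(entity_select, row_use_set, pos, need_new_vector):
    """Append one row holding the value at ordinal position `pos`.

    Returns "count", the 1-based position of the value's row use vector.
    """
    # Block 448: count the 1 bits up to and including position `pos`.
    count = sum(entity_select[:pos + 1])
    n_rows = len(row_use_set[0]) if row_use_set else 0
    if need_new_vector:
        # Block 450: insert an all-zero row use vector at position "count".
        row_use_set.insert(count - 1, [0] * n_rows)
    # Block 452: every row use vector gains a 0 bit for the new row.
    for vector in row_use_set:
        vector.append(0)
    # Block 454: the new or selected vector gets a 1 in the last row.
    row_use_set[count - 1][-1] = 1
    return count


# Assumed sample: value set Adams, Blake, Clark, Jones, Smith, Zeus, with
# the Zeus bit already set; the column so far holds five rows.
names_select = [1, 1, 1, 1, 1, 1]
names_rows = [
    [0, 0, 0, 0, 1],  # Adams
    [0, 0, 1, 0, 0],  # Blake
    [0, 0, 0, 1, 0],  # Clark
    [0, 1, 0, 0, 0],  # Jones
    [1, 0, 0, 0, 0],  # Smith
]
count = add_value_to_column(names_select, names_rows, 5, True)
```

Because "count" is derived from the entity select vector, no separate index from values to row use vectors has to be stored.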
1. Detailed Example for the INSERT Function
Referring to the suppliers relation depicted in FIG. 4, a detailed example for adding a name to the value set of names (162, FIG. 4) and to a column (170, FIG. 4) corresponding to names in the relation is now discussed. In this example, the requirement exists to insert the name "Zeus" into the value set depicted at 162 (FIG. 4), and at the end of the column at 170 (FIG. 4).
This example has been constructed to illustrate all of the routines for inserting a value into a value set and into a binary representation of a relation. FIG. 12 is a detailed results table depicting the various steps performed by the INSERT routine (FIG. 10A). The results table of FIG. 12 is broken up into three columns. From left to right, the first column depicts the existing value set, the second column is the entity select vector characterizing a subset of the value set, and the third column is the row use set representing the names column of the relation. Each row of the results table (FIG. 12) depicts a change in either the value set, entity select vector, or row use set as the INSERT routine (FIG. 10A) is performed. The subroutines depicted in FIGS. 10A, B, C and D are transparent to the application. Only the following type of instruction is required:
INSERT
Into (Relation-Suppliers) Value (Zeus)
This instruction is interpreted by the command interpreter 28 (FIG. 1A) to add a new and unique value Zeus to the value set for names and to add the word Zeus to the binary representation of the relation for suppliers, currently stored in memory 18. Pursuant to the instruction, the RPU 22 brings the binary representation of the suppliers relation from memory 18 to the RPU 22. Additionally, the value set is brought from the external device 12 via bus 30 to the RPU 22.
Because the unique value Zeus is not part of the value set of names, the RPU 22 calls the INSERT routine (FIG. 10A) to insert the value Zeus into the column of names in the relation and also to add the unique value Zeus to the value set of names. During block 422, the RPU 22 calls the subroutine INSERT VALUE INTO VALUE SET (FIG. 10B).
Referring to FIG. 10B, at block 430 of the INSERT VALUE INTO VALUE SET routine, the system determines the ordinal position at which the value Zeus is to be inserted. During block 432, the RPU 22 determines that Zeus does not exist in the value set, and in block 436, the RPU 22 adds Zeus to the proper node in the structure for the value set (494, FIG. 12). If a node did not exist in the structure, then a new node would be added and set to Zeus. At block 438, the RPU 22 returns to block 424 of the INSERT routine (FIG. 10A).
During block 424 (FIG. 10A), the RPU 22 calls the subroutine UPDATE SUBSET (FIG. 10C). Referring to FIG. 10C, at block 440, the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the unique value Zeus in the value set.
The ordinal position is the last position of the entity select vector. In block 441, RPU 22 determines that the value Zeus had been previously added to the value set of names, and processing continues at block 442. During block 442, the system inserts a bit into the entity select vector (494, FIG. 12) to indicate that a value exists in the last ordinal position of the value set. During block 444, the RPU 22 sets the new bit to "1" in order to indicate that the value Zeus is added to the subset (496, FIG. 12). Then, in block 446, the RPU 22 returns to block 426 of the INSERT routine (FIG. 10A).
Referring to FIG. 10A, during block 426, the RPU 22 calls the ADD VALUE TO COLUMN routine (FIG. 10D). During block 448 of the ADD VALUE TO COLUMN routine (FIG. 10D), the RPU 22 determines the "count" of binary bits which are set to "1", up to and including the bit at the ordinal position corresponding to the new value Zeus.
"Count" is equal to six because there are six binary bits set to "1" in the entity select vector; the bit corresponding to the value Zeus is the sixth binary bit set to "1". With the "count", the RPU 22 determines if there presently resides a row use vector which corresponds to the value Zeus. A row use vector does not exist at position six, as determined previously at block 440 (FIG. 10C), and during block 450, the RPU 22 inserts a new row use vector at the sixth position of the row use set. During this step, the RPU 22 counts, from left to right, six row use vector positions in the row use set. The RPU 22 adds a new row use vector having binary bits set to "0". During block 452, the RPU 22 appends binary bits set to "0" to the end of all of the row use vectors in the row use set (500, FIG. 12). Then, during block 454, the last bit of the newest, sixth-position row use vector is set to "1" to indicate that Zeus now occupies the last row of the column for names in the relation for suppliers (502, FIG. 12).
B. DELETE
DELETE is an operation which removes one value at a time from a binary representation of a relation and possibly from a value set. The DELETE operation occurs in three categories of cases. First, a unique value need not be removed from a subset; however, the unique value needs to be removed from the column. Second, a unique value exists in a column, and it needs to be deleted from the relation and from a subset; however, it does not need to be removed from the value set. Third, a unique value already exists in a relation and it needs to be deleted from the column, from a corresponding subset and from a value set. Multiple values may be removed from a value set or a column; however, the DELETE routines must be processed more than once, according to the number of times a value or values need to be deleted. As in the case of INSERT, the binary representation of the relation and the value set are in RPU 22, ready for processing.
The flow diagrams in FIGS. 11A, B, C and D depict routines for processing any one of the three situations discussed above. Specifically, the flow diagram in FIG. 11A is a routine for removing a unique value from a value set and for removing the unique value from a column of a relation; this routine is called DELETE.
FIG. 11B is a flow diagram of a routine for removing a value only from a column. This routine is called DELETE VALUE FROM COLUMN. This routine by itself could be used to DELETE a value, one or more times, from a particular column. The flow diagram of FIG. 11C is a routine for updating an entity select vector to reflect the removal of a unique value from a particular subset; this routine is called DELETE VALUE FROM SUBSET. The flow diagram of FIG. 11D is a routine for removing a unique value from a value set; this routine is called DELETE VALUE FROM VALUE SET. The flow diagrams in FIGS. 11B and 11C can be used together to remove a value already existing in a column, and to remove the value from a subset of a value set without removing the value from the value set.
Referring to FIG. 11A, a more detailed description of the DELETE routine is now discussed. During block 460, the RPU 22 calls the DELETE VALUE FROM COLUMN routine (FIG. 11B) to DELETE a value from a column of a relation. Once all of the occurrences associated with a unique value have been removed from a column, the row use vector associated with the unique value contains only binary bits set to "0". If not all of the bits of the row use vector are set to "0", then during block 461, the RPU 22 determines that processing is complete and returns processing to the calling routine at block 461.
On the other hand, if all the bits are set to "0", then block 461 determines that processing is incomplete and processing continues at block 462. Because the unique value is no longer referenced in the column of the relation, it can be removed from the subset (or entity select vector) which depicts the column. Specifically, the entity select vector associated with the unique value of the column can be updated via a call to the DELETE VALUE FROM SUBSET routine (FIG. 11C) during block 462.
Once the entity select vector has been updated, the RPU 22 calls block 462(a) to determine whether there is a "1" bit set in the ordinal position corresponding to the value being deleted in any of the entity select vectors for the relational database. If there is, then the value is presently being used in other relations, and processing returns to the calling program at block 462(b). However, if there are no "1" bits set at the ordinal position corresponding to the unique value being deleted, then processing continues at block 464. During block 464, the DELETE VALUE FROM VALUE SET routine (FIG. 11D) is called to remove the unique value from the value set.
The value is removed from the value set only when none of the other relations in the relational database reference the unique value. When processing is completed, the system returns at block 466 to the calling program. The DELETE routine (FIG. 11A) can be called successively to DELETE one or more values of a row of one relation.
Referring to FIG. 11B, a more detailed description of the DELETE VALUE FROM COLUMN routine is now discussed. During block 470, the RPU 22 determines which row use vector of the row use set is associated with the particular value to be removed from one row of the column. This operation can be conducted by performing successive Boolean AND operations on the row use vectors and a binary bit vector having only a single bit set to one at the ordinal position corresponding to the row position of the column. The bit is changed from "1" to "0" in the appropriate row, indicating the absence of the value in the particular row of the column.
During block 472, the RPU 22 via the BBVP 14 determines whether all of the bits of the row use vector have been set to "0". If not all of the bits of the row use vector are set to "0", then a "DONE" signal is returned to the calling routine at block 474. If all of the binary bits of the row use vector are set to "0", then processing returns at block 478 to the calling routine.
Referring to FIG. 11C, a more detailed discussion of the DELETE VALUE FROM SUBSET routine is now presented. During block 480, the RPU 22 via BBVP 14 determines the ordinal position in the corresponding entity select vector associated with the unique value whose row use vector has been deleted from the row use set. During block 482, the RPU 22 via the BBVP 14 sets the binary bit in the entity select vector associated with the unique value to "0" to indicate the absence of the unique value in the subset. In block 485, processing returns to the calling program.
Referring to FIG. 11D, a more detailed discussion of the DELETE VALUE FROM VALUE SET routine is now presented. During block 488, the system removes the value from the value set by removing the value in the value set structure. Additionally, the binary bit corresponding to the deleted value in all entity select vectors is also removed to account for the reduced size of the value set.
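The three levels of the DELETE routine (column, subset, value set) can be sketched in one Python function. This is a simplified illustration under stated assumptions: the function name, its string return values, and the choice to locate the row use vector from the value's ordinal position (rather than by ANDing each vector with a one-bit mask) are all assumptions, and the sketch follows the text in clearing the row's bit without shortening the row use vectors.

```python
def delete_value(value_set, all_entity_selects, entity_select,
                 row_use_set, value, row):
    """Remove one occurrence of `value` at 0-based `row` of a column.

    `all_entity_selects` lists every entity select vector that refers to
    `value_set` (including `entity_select` itself). Returns the deepest
    level that was changed: "column", "subset" or "value set".
    """
    pos = value_set.index(value)
    idx = sum(entity_select[:pos + 1]) - 1   # position in the row use set

    # DELETE VALUE FROM COLUMN: clear the bit for this row.
    row_use_set[idx][row] = 0
    if any(row_use_set[idx]):
        return "column"                      # value still occurs elsewhere

    # DELETE VALUE FROM SUBSET: drop the empty vector, clear the bit.
    del row_use_set[idx]
    entity_select[pos] = 0

    # Block 462(a): is the value referenced by any other subset?
    if any(esv[pos] for esv in all_entity_selects):
        return "subset"

    # DELETE VALUE FROM VALUE SET: remove the value and its bit everywhere.
    del value_set[pos]
    for esv in all_entity_selects:
        del esv[pos]
    return "value set"


# Assumed sample data: a shortened names value set with Zeus in row six.
names = ["Adams", "Blake", "Clark", "Jones", "Smith", "Zeus"]
names_select = [1, 1, 1, 1, 1, 1]
names_rows = [
    [0, 0, 0, 0, 1, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1],
]
level = delete_value(names, [names_select], names_select,
                     names_rows, "Zeus", 5)
```

In this run Zeus occurs only once and no other subset refers to it, so the deletion reaches the value set itself.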
1. Detailed Example for the DELETE Operation
Referring to FIG. 13, the names of the suppliers of the suppliers relation depicted in FIG. 4 are shown along with the row use set corresponding to the suppliers name column in the relation. Essentially, this example starts where the "Insert Operation Example" left off.
Zeus had been added to the names value set, and Zeus had been added to the column of names in the suppliers relation. In this example, it is required to DELETE the name Zeus from the value set and to DELETE the name Zeus from the column (504, FIG. 13). FIG. 13 is a detailed results table depicting the various tasks performed by the DELETE routine (FIG. 11A). This example has been chosen to illustrate all of the routines for deleting a value from a value set and from a binary representation of a relation.
The results table of FIG. 13 is broken up into three parts. From left to right, the first column depicts the existing value set, including the unique value Zeus; the second column is the entity select vector representing the subset of the value set for names corresponding to the column for names, including the name Zeus; and the third column is a row use set which represents the "names" column of the suppliers relation.
Each row of the results table (FIG. 13) depicts a change in either the value set, entity select vector, or row use set as the DELETE routine (FIG. 11A) is performed.
The routines depicted in FIGS. 11A, B, C and D are all transparent to the application. The application provides only the following instruction to the system:
DELETE
From (Relation-Suppliers) Where Name = Zeus
This instruction is interpreted by the command interpreter 28 (FIG. 1A) to DELETE a value from the column of names in the suppliers relation. Because the value Zeus only appears once in the column for names in the relation and in the database, the system will call the routine DELETE (FIG. 11A) to remove the value Zeus from the column of names and also from the value set of names. Specifically, during block 460 (FIG. 11A), the RPU 22 calls the subroutine DELETE VALUE FROM COLUMN (FIG. 11B) to remove the value Zeus from the names column of the relation. Referring to FIG. 11B, during block 470, the RPU 22 via the BBVP 14 changes the binary bit set to "1" to "0" in the row use vector to indicate that the value Zeus is no longer in the column for names (506, FIG. 13). In block 472, the system determines whether all of the bits of the row use vector have been set to "0". Zeus only appeared in the column once, and thus, by changing the one binary bit to "0", Zeus is no longer represented in the column; all of the binary bits of the row use vector are set to "0". Thus, in block 478, processing returns to block 462 of the DELETE routine (FIG. 11A).
In block 462 (FIG. 11A), the DELETE VALUE FROM SUBSET routine (FIG. 11C) is called to remove the unique value Zeus from the subset depicted by the entity select vector. Specifically, during block 463, the row use vector associated with the unique value Zeus is removed from the row use set. Processing continues at block 480, during which the ordinal position of the binary bit associated with the value Zeus in the entity select vector is determined. Specifically, the RPU 22 via the BBVP 14, during block 480, determines that the ordinal position of the value Zeus in the entity select vector is the tenth position. Then, in block 482, the system sets the tenth binary bit from "1" to "0" to indicate the absence of the value Zeus from the subset (510, FIG. 13). During block 487, processing returns to the DELETE routine (FIG. 11A) at block 462(a). In block 462(a), the RPU 22 determines whether there are any "1" bits set at the ordinal positions corresponding to Zeus in any entity select vector. The entity select vectors corresponding to the value set of names for the entire relational database are evaluated to determine if the unique value is referenced in any other subset of the relational database. Because the other tables (i.e., parts (FIG. 5) and shipments (FIG. 6)) do not reference the "names" value set, there is only one entity select vector, and the bit corresponding to the ordinal position of Zeus has been set to "0". Processing continues in block 464, which calls the DELETE VALUE FROM VALUE SET routine (FIG. 11D).
Referring to FIG. 11D, during block 488, the RPU 22 removes Zeus from the value set, and the corresponding bit in the entity select vector is also removed (512, FIG. 13). Processing continues at block 490, during which the RPU 22 returns to the DELETE routine (FIG. 11A) at block 466, where processing returns to the calling program.
C. SELECT
To this point, we have discussed the operations INSERT and DELETE, which are basically functions for updating binary representations of relations. The next operations are query functions for finding relevant information about a relation or groups of relations.
The operations include SELECT, JOIN and PROJECT, and are principally used for determining a resultant binary relation. The resultant binary relations can then be converted into their byte value form for users to understand. This section concentrates on the operation SELECT, which generates a resultant binary relation for depicting which row or rows of a relation contain selected values. Stated differently, given one or more value sets and a relation, SELECT determines the rows of the relation which correspond to a particular value or values in one or more columns of a relation, and the result is depicted in a binary representation: a binary bit vector called a "select vector". A typical example of a SELECT operation might be for determining which suppliers (i.e., Smith, Jones, Blake, Clark and Adams) are located in Athens (63, FIG. 2). A more detailed discussion of this query will be presented shortly.
Referring to FIG. 14, a flow diagram of the SELECT operation is depicted. During block 516, the RPU 22 via the BBVP 14 (it is assumed after this point that any time processing needs to be done on a bit vector, the RPU 22 calls the BBVP 14 for processing) determines the ordinal positions of one or more selected unique values, which are in one column of the relation, in a particular value set. In block 518, the RPU 22 determines whether the selected unique values are found in the value set.
If the selected unique values are not found in the value set, then processing returns to the calling program during block 520. Assuming that the selected unique values are found in the value set, then in block 522 a binary bit vector displaying the ordinal positions of the selected values within the value set is generated.
Specifically, the binary bit vector contains bits set to "1" at the ordinal positions corresponding to those of the selected values, and the remaining bits of the bit vector are set to "0". In block 524, the bit vector generated in block 522 is "ANDed" with the entity select vector corresponding to the column in which the values reside, to determine whether the selected unique values are referenced in the column. In block 526, the RPU 22 determines whether the resultant bit vector has all bits set to "0", i.e., whether the corresponding set is empty. If the resultant bit vector is all zeros, then the selected unique values are not in the column, and thus no select vector can be generated and processing returns at block 528 to the calling program.
Assuming that the resultant bit vector from the AND operation step is not empty (e.g., some bits are set to "1" in the bit vector), processing continues at block 530, in which the RPU 22 determines "count": the number of binary bits in the entity select vector which are set to "1" up to and including the ordinal position of each selected value. For each unique value, the count is then used to determine which row use vectors of the row use set correspond to the selected unique values. During block 532, the row use vectors corresponding to "count" are retrieved. Processing continues at block 536, in which the RPU 22 determines whether one or more unique values were selected from the particular column over which this part of SELECT is processed. Assuming that only one unique value from a particular column was selected by the application, the RPU 22 returns the one row use vector, corresponding to the unique value, retrieved at block 532. Processing returns to the calling program at block 542. However, if more than one unique value from a particular column was selected by the application for this operation, then during block 537 the Boolean OR operation is performed on the selected row use vectors to determine a resultant relation. The Boolean OR operation is performed by the BLU 24 (FIG. 1A). The steps for performing Boolean operations on compressed bit strings are fully discussed in Glaser et al., which was referenced earlier.
During block 538, RPU 22 determines whether the values selected are from only one column of the relation. If all of the selected values are from one column, then the system returns the resultant select bit vector to the calling routine at block 542. If, during block 538, it is determined that other values are selected from other columns of the relation, then processing continues at block 539. During block 539, RPU 22 determines whether any more row use vectors for values need to be selected from other columns. If more values need to be processed, then processing continues at blocks 516, 518, 522, 524, 526, 530, 532, 536, 537, 538 and 539 until all of the row use vectors are processed and the entity select vector for each resultant column is generated. When there are no more columns from which to select values, then during block 540, a Boolean operation specified by the SELECT instruction is performed on the entity select vectors.
For example, the SELECT instruction might require the determination of whether one value, in one column of the relation, is associated with another value in a different column of the relation. The select vectors for the two values would be ANDed together to determine whether both values reside in the same row of the relation. It should be noted that any of the Boolean operations (i.e., OR, XOR, etc.) could be used to calculate the desired results. For purposes of discussion, however, the operation is assumed to be AND.
(For a more detailed discussion, refer to the detailed examples.) In summary, the flow diagram of FIG. 14 depicts the SELECT operation for returning a resultant entity select vector (e.g., binary bit vector) for depicting which rows of a relation contain one or more selected values.
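The single-column part of the SELECT flow (mask, AND with the entity select vector, "count" lookup, OR of the row use vectors) can be sketched as follows. The function name, the use of `None` for the early returns, and the sample city data are assumptions made for illustration; uncompressed Python lists stand in for the bit vectors.

```python
def select_column(value_set, entity_select, row_use_set, wanted):
    """Return the select vector for the rows holding any value in `wanted`.

    Returns None when none of the wanted values is referenced by the
    column, mirroring the early returns of the flow diagram.
    """
    # Block 522: 1 bits at the ordinal positions of the selected values.
    mask = [1 if v in wanted else 0 for v in value_set]
    # Block 524: AND with the entity select vector of the column.
    hits = [m & e for m, e in zip(mask, entity_select)]
    # Block 526: an all-zero result means nothing can be selected.
    if not any(hits):
        return None
    result = None
    for pos, hit in enumerate(hits):
        if not hit:
            continue
        # Block 530: "count" of 1 bits up to and including this position.
        count = sum(entity_select[:pos + 1])
        # Block 532: retrieve the matching row use vector.
        vector = row_use_set[count - 1]
        # Block 537: OR the row use vectors of all selected values.
        result = vector[:] if result is None else [
            a | b for a, b in zip(result, vector)
        ]
    return result


# Assumed sample data for a five-row city column.
city_values = ["Athens", "Cleveland", "Fresno", "Harrisburg", "London",
               "Los Angeles", "New York", "Paris", "Rome", "San Francisco"]
city_select = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
city_rows = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]

london_or_athens = select_column(city_values, city_select, city_rows,
                                 {"London", "Athens"})
```

Select vectors obtained this way for different columns can then be combined bit by bit with AND, OR or XOR, as the text describes for block 540.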
Once the resultant select vector has been determined, the rows of the relation corresponding to the selected values can be displayed to the user. By having the row positions of each column of the relation which contain the selected values, the RPU 22 determines which row use vector of the corresponding row use set contains a binary bit set to "1" in the ordinal position corresponding to the selected row position. An indexing function is performed, which determines the position of the selected row use vector in the corresponding row use set. Specifically, RPU 22 counts the number of row use vectors in the row use set up to and including the selected row use vector. This number corresponds to the ordinal position of the binary bit set to "1" in the entity select vector which references the unique value of the relation. The unique value is retrieved from the value set. For each column of the relation, the value in the selected row is determined and displayed for the user.
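The indexing function that maps a selected row back to its value can be sketched like this. The function name `value_at` and the sample city data are hypothetical; the logic follows the steps just described: find the row use vector with a "1" in the selected row, count its position in the row use set, and locate the matching "1" bit in the entity select vector.

```python
def value_at(value_set, entity_select, row_use_set, row):
    """Return the value stored at 0-based `row` of an encoded column."""
    # Find the row use vector that has a 1 bit at the selected row.
    for index, vector in enumerate(row_use_set):
        if vector[row]:
            break
    else:
        raise LookupError("no value recorded for this row")
    # The vector is the (index + 1)-th in the row use set, so it belongs
    # to the (index + 1)-th 1 bit of the entity select vector.
    seen = 0
    for pos, bit in enumerate(entity_select):
        seen += bit
        if bit and seen == index + 1:
            return value_set[pos]
    raise LookupError("entity select vector and row use set disagree")


# Assumed sample data for a five-row city column.
city_values = ["Athens", "Cleveland", "Fresno", "Harrisburg", "London",
               "Los Angeles", "New York", "Paris", "Rome", "San Francisco"]
city_select = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
city_rows = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]

decoded = [value_at(city_values, city_select, city_rows, r) for r in range(5)]
```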
In another embodiment of the SELECT operation, a step is added for selecting the unique value according to whether the selected value is greater than, less than, equal to, not equal to, equal to or greater than, or equal to or less than a prespecified value selected by the application program or user.
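By way of illustration only, such a comparison step might be sketched as building the ordinal position bit vector from every value in the value set that satisfies the comparison. The function name, the operator table, and the numeric value set are assumptions of this sketch and are not taken from the figures.

```python
import operator

# Hypothetical mapping from comparison symbols to Python operators.
COMPARATORS = {
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, "<=": operator.le,
}

def ordinal_position_vector(value_set, op, operand):
    """Set a bit for each value in the value set that satisfies the comparison."""
    compare = COMPARATORS[op]
    return [1 if compare(value, operand) else 0 for value in value_set]

# Hypothetical value set for a numeric column.
status_values = [10, 20, 30]
at_least_20 = ordinal_position_vector(status_values, ">=", 20)
```

The resulting vector could then be ANDed with the entity select vector in the same manner as the single-value ordinal position vector of block 524.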
In another embodiment to be discussed in PART VI, the actual values in the selected rows of the relation are determined and displayed to the user via a mapping function through a vector called the "entity use vector". For each column of the relation, an entity use vector is maintained for identifying a value in the value set which corresponds to the value at a particular row of the column.
1. Detailed Example of a Two-Column SELECT
for Two Values.
This example relies principally on the suppliers relation of FIG. 2. Specifically, it is assumed for purposes of this example that the suppliers relation (63, FIG. 2) is in its binary represented form as depicted in FIG. 4 and this representation of the relation resides in memory 18 to be processed by RPU 22.
The SELECT operation in this example involves two columns, namely, the Suppliers ID column (168, FIG. 4) and the City column (174, FIG. 4). Specifically, the query entered by the user is to determine all information about a supplier whose number is S5 and whose location is Athens. The instruction for this query is:
SELECT ID#, CITY
FROM S
WHERE (ID# = S5) AND
(CITY = ATHENS)
This query is interpreted by the command interpreter 28 (FIG. 1A) and RPU 22 retrieves the row use vectors associated with the unique values S5 and Athens, and then performs a Boolean AND operation to determine the resultant relation. FIG. 15 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14). The results table of FIGS. 15A and B is broken up into six columns. From left to right, the first column depicts the value set associated with the selected unique value, the second column is a bit vector corresponding to the ordinal position of the selected unique value within the value set, the third column is the entity select vector associated with the column in which the value resides, the fourth column is a resultant bit vector determined by ANDing the ordinal position bit vector with the entity select vector, the fifth column is the row use set associated with the column in which the selected unique value resides, and the last column is the select vector determined by performing a Boolean OR operation on all of the row use vectors corresponding to the selected unique values from one column.
The query in this example is a simplified SELECT query to minimize the explanation and steps required to perform the operation. However, generally, the query will be over several value columns of the relation (see the next example for select). This example could easily be expanded to determine which of the supplier IDs (i.e., S1 through S5) is located in Athens. The row use vector corresponding to Athens indicates the rows of the relation which contain the supplier IDs associated with Athens. For simplicity, in this example we are concerned with only one of the supplier IDs, namely S5, and whether it is associated with Athens.
Referring now to FIGS. 14, 15A and B, a detailed example of the query for determining whether supplier ID S5 is located in Athens is now discussed. Specifically, during block 516, the system determines the ordinal position of S5 in the value set for suppliers IDs. Essentially, the system traverses a structure associated with the suppliers ID value set and determines the specific node in the structure where the S5 value resides. During block 518, the system determines whether or not the value S5 has been found in the value set. The value S5 is located in the structure, and thus, it is within the value set for the suppliers IDs.
In block 522, the system creates a binary bit vector for representing the ordinal position within the value set associated with value S5 (544, FIG. 15A). As shown at row 544 of FIG. 15A, the fifth ordinal position in the new binary bit vector is set to "1", corresponding to the ordinal position of the value S5 in the value set.
During block 524, the new binary bit vector is ANDed with the entity select vector associated with the suppliers identifiers in the suppliers relation (546, FIG. 15A). In block 526, the resultant bit vector from the AND operation is evaluated and it is determined that the bit vector is not an empty set. The resultant bit string contains a binary bit set to "1" (548, FIG. 15A).
In other words, the unique value S5 is located in the subset referenced by the entity select vector of the suppliers column, and thus, it is in the relation of suppliers. In block 530, a count is performed on the entity select vector to determine the number of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique value S5. The RPU 22 determines that there are five binary bits set to "1", and thus, the unique value S5 is associated with the fifth row use vector of the row use set associated with suppliers IDs (260, FIG. 4). In block 532, the row use vector associated with unique value S5 is retrieved from the row use set (260, FIG. 4). The row use vector for S5 is a binary bit vector containing four binary bits set to "0" and a fifth binary bit set to "1", indicating that the value S5 resides in the fifth row of the column for the suppliers IDs (550, FIG. 15A). During block 536, RPU 22 determines that there is only one value selected from the column for suppliers IDs. Processing continues in block 538, in which the RPU 22 determines that more than one column is involved in this select operation, i.e., the supplier ID and city columns. In block 539, the RPU 22 determines that the value Athens has also been selected by the user in this query, and thus, processing returns to block 516. In block 516, the RPU 22 determines the ordinal position of Athens in the value set for cities.
Essentially, the system traverses the value set for cities, and during block 518, the RPU 22 determines that Athens is in the value set for cities. During block 522, a binary bit vector is constructed to indicate which ordinal position of the value set for cities contains the city Athens (552, FIG. 15A). Specifically, the system creates the binary bit vector, which shows a binary bit set to "1" in the first ordinal position (552, FIG. 15A). During block 524, the Boolean AND operation is performed between the new binary bit vector and the entity select vector associated with cities for the suppliers relation (556, FIG. 15A). In block 526, the RPU 22 determines that the resultant vector does not contain all "0"s (558, FIG. 15B), and thus, the value Athens is determined to be in the suppliers relation.
If the value Athens were not located in the suppliers relation, then the RPU 22 would return at block 528 to alert the user that the selected value Athens, although found in the value set, is not within the suppliers relation. Processing continues at block 530, in which the RPU 22 does a count of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique value Athens. RPU 22 determines that Athens is associated with the first binary bit set to "1" in the entity select vector, and thus, the unique value Athens corresponds to the first row use vector in the row use set (266, FIG. 4). In block 532, the RPU 22 retrieves the row use vector (560, FIG. 15B) associated with Athens. The row use vector for Athens contains four binary bits set to "0", followed by a binary bit set to "1", indicating that the value Athens occupies the fifth row of the column associated with cities. During block 536, the RPU 22 determines that there are no more unique values selected in the column of cities, and thus, processing continues at block 538, in which the RPU 22 determines that more than one column, i.e., suppliers and city, was selected by the user. In block 539, the RPU 22 determines that no more values need to be selected. In block 540 the row use vectors associated with S5 and Athens are ANDed together to generate a resultant select vector, which represents the rows of the relation which satisfy the query (562, FIG. 15B).
As shown at 562 (FIG. 15B), the resultant binary bit vector contains four binary bits set to "0" followed by a binary bit set to "1", indicating that the fifth row of the relation contains the supplier ID S5 and the city Athens. The actual row of the relation can be reconstructed and displayed to the user in one of two ways. First, for each column, the system can use the entity use vectors and associated row use vectors to map the row number determined by the select vector to the ordinal position in each value set. A more detailed discussion of the entity use vector approach is given in Part VI. Second, the RPU 22 could trace back from the row use vectors to the entity select vectors and then back to the value set to determine the unique values in the fifth row of the relation.
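By way of illustration only, the per-value path through blocks 516 to 532 and the final AND of block 540 might be sketched as follows for the S5 and Athens query. The function name `row_use_vector_for` and the list-based layout are assumptions of this sketch. The row use vectors for S5 and Athens follow the text; the remaining supplier ID vectors and the filler city names are hypothetical.

```python
def row_use_vector_for(value, value_set, entity_select, row_use_set):
    """Locate the row use vector for one selected value of one column."""
    # Blocks 516/518: find the ordinal position in the value set
    # (list.index raises ValueError if the value is absent).
    position = value_set.index(value)
    # Blocks 522-526: the ordinal position vector ANDed with the entity
    # select vector is non-empty only if this bit is set.
    if entity_select[position] == 0:
        return None  # block 528: in the value set, but not in the relation
    # Block 530: count bits set to 1 up to and including the position.
    rank = sum(entity_select[: position + 1])
    # Block 532: the rank indexes the row use set (1-based).
    return row_use_set[rank - 1]

ID_VALUES = ["S1", "S2", "S3", "S4", "S5"]
ID_ENTITY_SELECT = [1, 1, 1, 1, 1]
ID_ROW_USE = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]

CITY_VALUES = ["Athens", "Berlin", "Cairo", "Dublin",
               "London", "Madrid", "Oslo", "Paris"]  # filler names are hypothetical
CITY_ENTITY_SELECT = [1, 0, 0, 0, 1, 0, 0, 1]
CITY_ROW_USE = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]

s5 = row_use_vector_for("S5", ID_VALUES, ID_ENTITY_SELECT, ID_ROW_USE)
athens = row_use_vector_for("Athens", CITY_VALUES, CITY_ENTITY_SELECT, CITY_ROW_USE)

# Block 540: AND the two row use vectors to obtain the resultant select vector.
resultant = [a & b for a, b in zip(s5, athens)]
```

The resultant vector has only its fifth bit set, in agreement with the result described for 562 of FIG. 15B.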
2. Detailed Example of Two-Column SELECT for Multiple Values
As in the previous example, this example relies principally on the suppliers relation of FIG. 2. Specifically, it is assumed for the purposes of this example that the suppliers relation (63, FIG. 2) is in its binary represented form, as depicted in FIG. 4, and that this binary representation of the relation resides in memory, to be processed by RPU 22. Again, the SELECT operation in this example involves two columns, namely, the suppliers names column (170, FIG. 4) and the city column (174, FIG. 4). Specifically, the query entered by the user is to determine whether suppliers Smith or Blake are located in London or Paris. The standard instruction for the query is:
SELECT SNAME, CITY
FROM S
WHERE ((SNAME = 'SMITH') OR (SNAME = 'BLAKE')) AND
((CITY = 'LONDON') OR (CITY = 'PARIS'))
This query is interpreted by the command interpreter (FIG. 1A) and RPU 22 retrieves the row use vectors associated with the unique values Smith or Blake which are, in turn, ANDed with the row use vectors for the unique values London or Paris. FIG. 16 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14). The results table of FIGS. 16A and B is broken up into seven columns. From left to right, the first column depicts the value set associated with the unique values, the second column is a bit vector corresponding to the ordinal positions of the selected unique values within the value set, the third column is the entity select vector associated with the column in which the selected values reside, the fourth column is a resultant bit vector determined by ANDing the ordinal position bit vector with the entity select vector, the fifth column is the row use set associated with the column in which the selected values reside, the sixth column is the select vector determined by performing a Boolean OR operation on all the row use vectors corresponding to selected unique values from one column, and the last column is the resultant vector determined by performing a Boolean AND operation on the select vectors determined for the selected values from more than one column.
Referring now to FIGS. 14, 16A and B, a detailed example of the query for determining whether the suppliers Smith or Blake are located in London or Paris is now discussed. Specifically, during block 516, the RPU 22 determines the ordinal positions of the suppliers names Smith and Blake in the value set for suppliers names. Essentially, the system traverses a structure associated with the supplier name value set and determines the specific nodes in the structure where the Smith and Blake values reside. During block 518, the RPU 22 determines whether or not the values Smith and Blake have been found in the value set. The values Smith and Blake are located in the structure; thus, they are within the value set for the suppliers names. In block 522 the RPU 22 creates a binary bit vector for representing the ordinal positions within the value set associated with the values. As shown in row 565 of FIG. 16A, Blake and Smith occupy the third and ninth ordinal positions. The new binary bits are set to "1" corresponding to the ordinal positions of the values in the value set for suppliers names. During block 524, the new binary bit vector is ANDed with the entity select vector associated with the suppliers names in the suppliers relation (567, FIG. 16A). In block 526, the resultant bit vector from the AND operation is evaluated and it is determined that the resultant bit vector is not all zeros. The resultant bit string contains binary bits set to "1" (569, FIG. 16A). In block 530, a single "count", with two ordinal positions as input, is performed on the entity select vector, up to and including the binary bit associated with the unique value Smith, the last value characterized in the new entity select vector. The RPU 22 determines that there are two binary bits set to "1" for the count corresponding to Blake and five binary bits set to "1" for the count corresponding to Smith. Therefore, the unique values Blake and Smith are associated with the second and fifth row use vectors of the row use set that is associated with suppliers names (262, FIG. 4). In block 532, row use vectors associated with the unique values Smith and Blake are retrieved from the row use set (262, FIG. 4). The row use vector for Blake is a binary bit vector containing five binary bits in which the third binary bit is set to "1", indicating that the value Blake resides in the third row of the column for the suppliers names (570, FIG. 16A). Likewise, the row use vector for Smith is a binary bit vector containing five binary bits in which the first bit is set to "1", indicating that the value Smith resides in the first row of the column for suppliers names (570, FIG. 16A).
During block 536, RPU 22 determines that there is more than one value selected from the column for suppliers names, thus processing continues at block 537. During block 537, the row use vectors associated with Smith and Blake (570, FIG. 16A) are ORed together to form the select vector as shown at 571 of FIG. 16A. During block 538, the RPU 22 determines that more than one column is involved in this select operation, i.e., the suppliers names column and the city column. Processing continues at block 539, during which the RPU 22 determines that the values London and Paris are selected in the separate column for cities. In block 516, the RPU 22 determines the ordinal positions for London and Paris in the value set for cities. Essentially, the system traverses the value set for cities, and during block 518, the RPU determines that London and Paris are both located in the value set for cities. During block 522, a binary bit vector is constructed to indicate which ordinal positions of the value set for cities are associated with the cities London and Paris (573, FIG. 16A). Specifically, the system creates a binary bit vector which shows binary bits set to "1" in the fifth and eighth ordinal positions (573, FIG. 16A). During block 524, the Boolean AND operation is performed on the new binary bit vector with the entity select vector associated with cities for the suppliers relation (575, FIG. 16B). In block 526, the RPU 22 determines that the resultant vector does not contain all zeros (579, FIG. 16B).
London and Paris are determined to be in the suppliers relation. If the values London and Paris were not located in the suppliers relation, then the RPU would return at block 528 to alert the user that the selected values were not found in the suppliers relation.
Processing continues at block 530, in which the RPU determines the number of bits set to "1" up to and including the binary bit associated with the unique value London, and the same is done for Paris. RPU 22 determines that London is associated with the second binary bit set to "1" in the entity select vector; thus, the unique value London corresponds to the second row use vector in the row use set (266, FIG. 4). Additionally, the RPU determines that Paris is associated with the third binary bit set to "1" in the entity select vector and, thus, Paris corresponds to the third row use vector in the row use set (266, FIG. 4). In block 532, the RPU 22 retrieves the row use vectors associated with London and Paris (561, FIG. 16B). The row use vector for London contains five bits, with the first and fourth bits set to "1". The row use vector for Paris contains five binary bits, with the second and third binary bits set to "1". Essentially, the row use vectors for London and Paris indicate that the first through fourth rows of the column are associated with cities London, Paris, Paris, London.
During block 536, the RPU determines that more than one unique value was selected in the column of cities and, thus, processing continues. In block 537, the RPU performs the Boolean OR operation on the row use vectors for London and Paris. A select vector is generated (563, FIG. 16B), which has five binary bits, of which bits one through four are set to "1". In block 538, the RPU determines that more than one column of the relation was involved in the SELECT, i.e., suppliers names and cities. In block 539, the RPU determines that no more values in other columns need to be selected from the relation. In block 540, the select vectors associated with the suppliers names (571, FIG. 16A) and cities (563, FIG. 16B) are ANDed together to generate a resultant select vector, which represents the rows that satisfy the query (565, FIG. 16B). As shown at 565 (FIG. 16B), the resultant entity select vector has its first and third binary bits set to "1", indicating that the first and third rows of the suppliers relation contain information on whether Smith is associated with Paris and/or London and whether Blake is associated with Paris and/or London. As in the last example, the actual rows in the relation can be reconstructed to display to the user in one of two ways. First, the system can use the entity use vectors associated with each column of the relation to map the row numbers determined by the resultant select vector to the ordinal positions in the appropriate value sets. Second, the RPU 22 could trace back from the various row use vectors to the entity select vectors and back to the appropriate value sets to determine the actual unique values in the first and third rows of the relation. A more detailed discussion of the entity use vectors approach is given in Part VI.
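By way of illustration only, the Boolean arithmetic of blocks 537 and 540 for this query can be checked with the row use vectors stated above. The use of Python integers as bit masks and the helper name `to_mask` are assumptions of this sketch; the leftmost character of each string stands for the first row.

```python
def to_mask(bits):
    """Convert a string such as '10010' (row 1 leftmost) to an integer mask."""
    return int(bits, 2)

# Row use vectors as described for the suppliers relation.
smith, blake = to_mask("10000"), to_mask("00100")
london, paris = to_mask("10010"), to_mask("01100")

# Block 537: OR the row use vectors selected within each column.
names_select = smith | blake
cities_select = london | paris

# Block 540: AND the per-column select vectors.
resultant = format(names_select & cities_select, "05b")

# Row numbers (1-based) whose bit is set in the resultant select vector.
rows = [i + 1 for i, bit in enumerate(resultant) if bit == "1"]
```

The names select vector is 10100, the cities select vector is 11110, and their AND is 10100, so the first and third rows satisfy the query.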
D. RECONSTRUCT
The purpose of the RECONSTRUCT operation is to generate the values associated with a particular column of a relation for the user to ascertain. Typically, the binary representation of a relation is constructed and stored in memory 18. If the user of the system wishes to see the actual relation and the values depicted in the relation, then the RECONSTRUCT operation can be performed for reconstructing and displaying the relation to the user. The flow diagrams in FIGS. 17A and 17B are for reconstructing and displaying various columns specified by the user or an applications program.
Typically, the user will specify one or more columns of a relation to be displayed by the system. The user might request the Suppliers ID column of the relation for suppliers at 63 of FIG. 2, which is currently stored in its binary representation in memory 18 (FIG. 1A) (260, FIG. 4).
Referring to FIG. 17A, the first step in performing the RECONSTRUCT operation is in block 565, in which the user specifies various columns of a relation to be reconstructed. As stated above, an applications program may also specify particular columns of the relation to be projected. For example, when the SELECT operation is performed, the resultant binary representation can then be reconstructed and displayed to the user via the RECONSTRUCT operation (FIG. 17A). In the next step, block 566 calls the routine DISPLAY/RECONSTRUCT (FIG. 17B) to reconstruct one of the specified columns. The DISPLAY/RECONSTRUCT routine (FIG. 17B) essentially performs the necessary steps for obtaining the values and for placing the values in the proper rows in the column. In block 567, the RPU determines whether there are any more columns that need to be displayed. If there are more columns to be displayed, then processing continues at block 566. Blocks 566 and 567 are performed until all of the columns specified by the user or applications program have been reconstructed. If all of the columns have been reconstructed, then processing continues at block 568, in which the RPU returns to the calling program.
Referring to FIG. 17B, a flow diagram of the DISPLAY/RECONSTRUCT routine is depicted. During block 571, the RPU 22 obtains the entity select vector associated with the particular column to be displayed.
During block 575, the first row use vector associated with the row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18, is transferred to the RPU 22. Then during block 577, the RPU 22 performs a Boolean AND operation on the row use vector obtained in block 575 with a row select vector. (The row select vector is created by performing a query operation on the relation, thereby selecting which rows of the relation the user or application program wishes to display.) The row select vector is a new binary bit vector and each binary bit corresponds to a row of the column or columns to be displayed. A binary bit set to "1" indicates that the corresponding row needs to be displayed. The result of the AND operation is a new vector Z which depicts the rows of the column which contain a particular value associated with the row use vector. The results of the AND operation are sent to memory 18 for future processing. Then, in block 579, the RPU 22 determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 581 a "0" is placed in the first binary position of a new vector called the index vector. The index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the row use set. Each bit indicates whether the unique value associated with the row use vector exists in the relational column to be displayed. If a binary bit in the index vector is set to "0", the unique value associated with the row use vector does not exist in the column to be displayed. Whereas, if the binary bit is set to "1" in the index vector, then the unique value associated with the row use vector exists one or more times in the column. Processing continues at block 593, during which the next row use vector of the row use set is obtained.
Returning to block 579, if the result of the Boolean operation performed in block 577 is non-zero, then processing continues at block 585. During block 585, the RPU 22 sets the binary bit in the index vector, which is associated with the current row use vector, to "1". In block 587, the resultant bit vector Z is stored in memory 18. The resultant vector Z is later used in the reconstruction process.
During block 589, the RPU 22 clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. The purpose of this step is to shortcut the processing of the row use vectors in the row use set. Stated differently, when the row select vector is cleared, all of the values in the rows of the column have been determined. The row use vectors which have been processed with the row select vector contain all of the values to be displayed in the column. Then during block 591, the RPU 22 (FIG. 1A) determines whether the row select vector has been completely cleared, or stated differently, whether all of the binary bits have been set to "0". If the row select vector contains only binary bits set to "0", then processing continues at blocks 597, 601, 603, 605, 607, 609 and 611 to reconstruct the column with the values associated with the row use vectors in the row use set.
However, if not all of the binary bits in the row select vector are set to "0", then processing continues at block 593. During block 593, the RPU 22 gets the next row use vector in the row use set currently being processed. During block 595, the RPU 22 determines whether the end of the row use set has been reached. If the end of the row use set has been reached, then processing continues at blocks 597, 601, 603, 605, 607, 609 and 611 to reconstruct the column. However, assuming that the end of the row use set has not been reached, then processing continues at blocks 577, 579, 585, 587, 589, 591, 593 and 595 until all of the row use vectors of the row use set have been processed.
Assuming that all of the row use vectors of the row use set have been processed or the row select vector contains only binary bits set to "0", then processing continues at block 597. During block 597, the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Each binary bit set to "1" indicates which row use vectors reference unique values which are to be displayed in the column. During block 601, for a row use vector which has a corresponding binary bit set to "1" in the index vector, the ordinal position of the binary bit set to "1" in the entity select vector is determined. Then, during block 603, the value associated with the ordinal position obtained in block 601 is obtained from the value set. During block 605, the RPU 22 finds the appropriate location in the index vector associated with the value. Then during block 607, the appropriate resultant Z vector stored during step 587 is retrieved. The resultant vector Z indicates which rows of the column contain the unique value associated with the row use vector, and the unique value is placed in the column at the appropriate row locations. During block 609, the RPU 22 determines whether any more values are left for processing.
Assuming that there are more values, then processing continues in blocks 601, 605 and 607 until all of the values have been placed in the proper rows of the column. Once all of the values have been placed into the column, then processing returns to the calling program during block 611.
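By way of illustration only, the DISPLAY/RECONSTRUCT routine of FIG. 17B might be sketched as follows. The function name `reconstruct_column`, the list-based vectors, the use of `None` for rows not selected, and the filler city names are assumptions of this sketch; block numbers in the comments indicate the step each line is intended to mirror.

```python
def reconstruct_column(value_set, entity_select, row_use_set, row_select):
    """Rebuild the displayed rows of one column from its binary representation."""
    remaining = list(row_select)  # working copy of the row select vector
    index_vector = []             # one bit per processed row use vector
    z_vectors = []                # Z vectors saved for the placement pass
    for row_use in row_use_set:                                   # blocks 575, 593, 595
        z = [a & b for a, b in zip(row_use, remaining)]           # block 577
        if any(z):                                                # block 579
            index_vector.append(1)                                # block 585
            z_vectors.append(z)                                   # block 587
            remaining = [r & (1 - zb) for r, zb in zip(remaining, z)]  # block 589
            if not any(remaining):                                # block 591
                break  # every requested row has been resolved
        else:
            index_vector.append(0)                                # block 581
            z_vectors.append(None)
    # Ordinal positions of the bits set to 1 in the entity select vector.
    positions = [i for i, bit in enumerate(entity_select) if bit]
    column = [None] * len(row_select)
    for k, flag in enumerate(index_vector):                       # blocks 597-609
        if flag:
            value = value_set[positions[k]]                       # blocks 601, 603
            for row, bit in enumerate(z_vectors[k]):              # block 607
                if bit:
                    column[row] = value
    return column

CITY_VALUES = ["Athens", "Berlin", "Cairo", "Dublin",
               "London", "Madrid", "Oslo", "Paris"]  # filler names are hypothetical
CITY_ENTITY_SELECT = [1, 0, 0, 0, 1, 0, 0, 1]
CITY_ROW_USE = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]

first_four_rows = reconstruct_column(
    CITY_VALUES, CITY_ENTITY_SELECT, CITY_ROW_USE, [1, 1, 1, 1, 0])
```

In this run the Athens vector yields an all-zero Z (index bit "0"), while the London and Paris vectors each clear two bits of the row select vector, after which the loop stops early because no requested rows remain.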
1. Detailed Example of Performing
RECONSTRUCT Operation
Referring to FIGS. 17A, 17B, 18A, 18B, 18C and 18D, a detailed example for reconstructing the column for Suppliers IDs in the Suppliers relation (FIG. 2) is now discussed. More particularly, it is assumed that only the binary representation of the column Suppliers IDs exists in the RDMS 10. The binary representation of the column may be the result of a SELECT operation or it may have been previously stored after processing by the BBVP 14. In either case, the binary representation of the column Suppliers IDs exists in memory 18 and now the user or an applications program needs to display the actual values of the column. Although this example is for reconstructing and displaying only one column of the suppliers relation, the PROJECT operation could be sequentially performed to supply all of the columns of the suppliers relation.
Referring to FIGS. 18A, B, C and D, a Results Table for depicting the results of the RECONSTRUCT operation for reconstructing or supplying the Suppliers ID column (80, FIG. 2) of the suppliers relation (63, FIG. 2) is shown. Each row (800-848) of the Results Table depicts a result of the routines shown in FIGS. 17A and 17B.
The Results Table is separated into seven columns. From left to right, the first column of the Results Table shows the entity select vector for the selected column to be displayed. The second column shows the row use vector set associated with the specified column to be displayed. The third column is the row use vector currently being processed, and the fourth column is the row select vector specified by the user or application program for determining which rows of the column are to be displayed. The fifth column is the result of ANDing the row use vector and the row select vector together; the result is called vector Z. The sixth column depicts the reconstruction of the index vector for displaying which row use vectors of the row use set have associated unique values in the column. The last column is for the reconstruction of the column to be displayed.
Referring to FIGS. 17A and 17B, the operation performed by the RPU 22 (FIG. 1A) for displaying and reconstructing the Suppliers ID column (80, FIG. 2) is now discussed. Specifically, during block 565 (FIG. 17A) the user or application program selects the column or columns which are to be reconstructed and displayed as a result of the query operation. For this example, the user has selected the Suppliers ID column (80, FIG. 2). Currently, the Suppliers ID column only exists in the secondary memory (18, FIG. 1A) in the form of a binary representation or row use set. During block 566, the DISPLAY/RECONSTRUCT routine (FIG. 17B) is called for finding the values and reconstructing the Suppliers ID column.
Referring to FIG. 17B, during block 571, the RPU 22 (FIG. 1A) finds the entity select vector associated with the Suppliers ID column in the memory 18 (FIG. 1A). The entity select vector stored in memory 18 (FIG. 1A) is transferred via bus 30 (FIG. 1A) to RPU 22 (FIG. 1A). Then, during block 573, the RPU 22 finds the row use set associated with the Suppliers ID column in memory 18 and transfers it via a bus to the RPU 22 (FIG. 1A). During block 575, the RPU 22 obtains the first row use vector of the row use set for the Suppliers ID column (804, FIG. 18A). For purposes of this example, the user wishes to display the first four rows of the Suppliers ID column. Constructed earlier in the system is a row select vector having four binary bits set to "1" in the first four ordinal positions, i.e. 1 1 1 1 0. During block 577, the row use vector (804, FIG. 18A) and the row select vector (806, FIG. 18A) are transferred to the RPU 22 (FIG. 1A). Here the row use vector (804, FIG. 18A) and the row select vector (806, FIG. 18A) are ANDed together to determine a resultant binary vector Z (808, FIG. 18A). The resultant binary bit vector Z depicts the rows of the Suppliers ID column in which the unique value associated with the row use vector (804, FIG. 18A) is to reside. The resultant binary vector Z contains a binary bit set to "1" in the first position, which means that the unique value associated with the row use vector (804, FIG. 18A) will be placed in only the first ordinal position of the Suppliers ID column. During block 579, the RPU 22 determines that the result of the AND operation is not "0", thus processing continues to block 585. During block 585, the RPU 22 via BBVP 14 (FIG. 1A) sets the first binary bit of the index vector to "1" (810, FIG. 18A) to indicate that the unique value associated with the row use vector (804, FIG. 18A) exists at least once in the Suppliers ID column. In block 587, the resultant binary bit vector Z is stored in memory 18 for future reconstruction of the column.
Specifically, the binary bits set in the resultant binary bit vector Z indicate the rows of the Suppliers ID column in which the unique value S1 associated with the row use vector (804, FIG. 18A) resides. Then during block 589, the binary bits set to "1" in the row select vector which match the binary bits in the resultant vector Z are set to "0" (812, FIG. 18A). During block 591, the row select vector is evaluated to determine if all of the binary bits have been set to "0". The row select vector contains three more binary bits set to "1" (812, FIG. 18A), thus processing continues at block 593. During block 593, the RPU 22 obtains the next row use vector of the row use set (814, FIG. 18B). The row use vector (814, FIG. 18B) and the row select vector (816, FIG. 18B) are ANDed together during block 577. The resultant vector Z is shown at 818 of FIG. 18B. During block 579, the resultant vector Z is evaluated to determine if all the binary bits of the resultant vector have been set to "0". Not all of the binary bits of the resultant vector are "0" (the resultant vector is "0 1 0 0 0" (818, FIG. 18B)), and thus during block 585, the second binary bit of the index vector is set to "1" (820, FIG. 18B). In block 587, the RPU 22 (FIG. 1A) stores the resultant vector Z in memory 18 for future processing. Then during block 589, the binary bits of the row select vector which were set to "1" and matched the binary bit set to "1" in the resultant vector Z are set to "0" (820, FIG. 18B). In block 591, the row select vector is evaluated to determine if all the binary bits have been set to "0". The row select vector still has two binary bits set to "1" (821, FIG. 18C), thus processing continues at block 593, during which the RPU 22 obtains the next row use vector of the row use set. This row use vector and the row select vector are ANDed together during block 577, and during block 579 the resultant vector Z is evaluated to determine if all the binary bits are set to "0". The resultant vector Z contains one bit set to "1", and thus processing continues at block 585.
During block 585, the RPU 22 (FIG. 1A) sets the third binary bit of the index vector to "1" (828, FIG. 18C). During block 587, the resultant vector Z is stored in memory 18 for future processing. In block 589, the binary bit of the row select vector which matched the binary bit of the row use vector is set to "0" (830, FIG. 18C). Then during block 591, the row select vector is evaluated to determine if all the binary bits are set to "0". The row select vector still has one binary bit set to "1" (834, FIG. 18C), thus processing continues at block 593. During block 593, the RPU 22 obtains the next row use vector of the row use set (832, FIG. 18C). The row use vector (832, FIG. 18C) and the row select vector (834, FIG. 18C) are ANDed together during block 577. The resultant vector Z is shown at 836 of FIG. 18C. During block 579, it is determined that not all of the binary bits of the resultant vector are "0" (the resultant vector is "00010" (836, FIG. 18C)). Thus, during block 585, the fourth binary bit of the index vector is set to "1" (838, FIG. 18C). The resultant vector Z is stored for future processing. Then during block 589, the binary bit of the row select vector which was set to "1" and matches the binary bit set to "1" in the resultant vector Z is set to "0" (840, FIG. 18D). Then during block 591, the row select vector is evaluated to determine if all the binary bits have been set to "0". All the binary bits of the row select vector are set to "0" (830, FIG. 18D), and thus processing continues to block 597.
During block 597, the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Specifically, the first, second, third and fourth binary bits of the index vector are set to "1". Thus the first, second, third and fourth row use vectors of the row use set are associated with unique values which exist in the Suppliers ID column. During block 601, the RPU 22 (FIG. 1A) determines the ordinal positions of the entity select vector which correspond with the row use vectors having corresponding binary bits set to "1" in the index vector. Then during block 603, the values associated with each ordinal position of the entity select vector are obtained from the value set of Supplier IDs. Specifically, the values S1, S2, S3 and S4 are obtained. Then, during blocks 605 and 607, the first resultant binary vector Z is obtained from temporary storage, and the value S1 is placed in the proper ordinal position of the column for Suppliers IDs. During block 609, the RPU 22 (FIG. 1A) determines whether there are any more values left for processing. There are three more values left for processing, and thus blocks 605 and 607 are performed. During blocks 605 and 607, the second resultant vector Z is obtained and the value S2 is placed in the second ordinal position in the column (844, FIG. 18D). In block 609, it is determined that there are more values left for processing, and thus blocks 605 and 607 are performed. During blocks 605 and 607, the third resultant vector Z, which is associated with the value S3, is obtained from memory. The value S3 is placed into the third ordinal position of the relational column for Suppliers IDs (846, FIG. 18D). In block 609, it is determined that there is still one more value to process, and thus blocks 605 and 607 are performed. During blocks 605 and 607, the fourth resultant vector Z stored in temporary memory is obtained and the value S4 is placed into the fourth ordinal position of the column for Suppliers IDs (848, FIG. 18D). During block 609, the RPU 22 (FIG. 1A) determines that there are no more values for processing and thus returns to the calling RECONSTRUCT routine (FIG. 17A) at block 567.
During block 567, the RPU 22 (FIG. 1A) determines whether there are any more columns to be displayed to the user. For this example, it is assumed that only the Suppliers ID column is to be reconstructed and displayed. However, if more columns of a particular relation were to be displayed, then processing would continue at blocks 566 and 567 until all of the columns were displayed. Also, the same row select vector would be used each time the DISPLAY/RECONSTRUCT routine (FIG. 17B) was performed. Assuming that there are no more columns to be displayed, then processing returns to the calling program during block 568.
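The reconstruction loop walked through above can be sketched in a few lines of Python. This is only an illustrative simplification, not the patented implementation: bit vectors are modelled as Python lists, the index vector and entity select vector steps (blocks 585 and 597 to 603) are folded into a direct pairing of each row use vector with its value, and the names `reconstruct_column`, `value_set`, `row_use_set` and `row_select` are hypothetical. The fifth row use vector (for S5) is assumed for completeness.

```python
def reconstruct_column(value_set, row_use_set, row_select):
    """Rebuild the selected rows of one column from its binary representation.

    value_set   -- unique values in ordinal order (assumed layout)
    row_use_set -- one bit list per unique value; bit r is 1 if row r holds it
    row_select  -- bit list marking the rows the user wants displayed
    """
    remaining = list(row_select)         # working copy of the row select vector
    column = [None] * len(row_select)    # None marks a row that was not selected
    for value, row_use in zip(value_set, row_use_set):
        # Block 577: AND the row use vector with the row select vector.
        z = [u & s for u, s in zip(row_use, remaining)]
        if any(z):                       # block 579: result is not all zeros
            for row, bit in enumerate(z):
                if bit:
                    column[row] = value  # blocks 605/607: place the value
                    remaining[row] = 0   # block 589: clear the matched bit
        if not any(remaining):           # block 591: every selected row is filled
            break
    return column


suppliers_id_values = ["S1", "S2", "S3", "S4", "S5"]
suppliers_id_row_use = [
    [1, 0, 0, 0, 0],  # S1 resides in row 1
    [0, 1, 0, 0, 0],  # S2 resides in row 2
    [0, 0, 1, 0, 0],  # S3 resides in row 3
    [0, 0, 0, 1, 0],  # S4 resides in row 4
    [0, 0, 0, 0, 1],  # S5 resides in row 5 (assumed)
]
row_select = [1, 1, 1, 1, 0]             # display the first four rows

print(reconstruct_column(suppliers_id_values, suppliers_id_row_use, row_select))
```

Because the loop stops as soon as the row select vector is exhausted, the row use vector for S5 is never examined in this example, which mirrors the early exit to block 597.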
E.
The ability to "JOIN" two or more relations is considered to be the most powerful feature of a relational system; see An Introduction to Database Systems, Vol. 1, 4th Ed. (1986). Essentially, a JOIN is a SELECT over the Cartesian product of more than one relation of the relational database.
To understand the purpose of the JOIN operation, an overall view of how the operation might be implemented for a conventional system is shown. Suppose that a user needs to get all combinations of supplier and part information for the SUPPLIERS relation (63, FIG. 2) and PARTS relation (65, FIG. 2) such that the supplier and part in question are located in the same city. The user might use the following query:
SELECT S.ID#, S.NAME, STATUS, S.CITY, P.ID#, P.NAME,
COLOR, P.WEIGHT, P.CITY FROM S, P
WHERE S.CITY = P.CITY;
The result of this query produces the following table:
TABLE A
S#  SNAME  STATUS  S.CITY  P#  PNAME  COLOR  WEIGHT  P.CITY
S1  Smith  20      London  P1  Nut    Red    12      London
S1  Smith  20      London  P4  Screw  Red    14      London
S1  Smith  20      London  P6  Cog    Red    19      London
S2  Jones  10      Paris   P2  Bolt   Green  17      Paris
S2  Jones  10      Paris   P5  Cam    Blue   12      Paris
S3  Blake  30      Paris   P2  Bolt   Green  17      Paris
S3  Blake  30      Paris   P5  Cam    Blue   12      Paris
S4  Clark  20      London  P1  Nut    Red    12      London
S4  Clark  20      London  P4  Screw  Red    14      London
S4  Clark  20      London  P6  Cog    Red    19      London
The data shown in Table A above comes from the two relations, suppliers 63 and parts 65 (FIG. 2). In the query above, the names of the relations are listed in the FROM clause, and the connection between the two relations (63, 65, FIG. 2) is listed in the WHERE clause (i.e., the fact that the city values must be equal), which is called the JOIN predicate. The JOIN is used to combine relations based on equivalent values in the columns specified by the JOIN predicate. In this case, the specified columns are the "city" columns of each relation. The JOIN pairs each of the N rows of a first relation, e.g., the SUPPLIERS relation (63, FIG. 2), with each of the M rows of a second relation, e.g., the Parts relation (65, FIG. 2), to form a resultant relation of N * M rows. Then, the JOIN operation discards all resultant rows of the JOIN relation which do not satisfy the JOIN specification. This type of JOIN operation is generally referred to as the "EQUIJOIN" operation.
As an example of EQUIJOIN, consider any two rows from the two relations (i.e., the suppliers relation (63, FIG. 2) and the Parts relation (65, FIG. 2)). For example, the rows shown below:
S.ID#  SNAME  STATUS  CITY    P.ID#  PNAME  COLOR  WEIGHT  CITY
S1     Smith  20      London  P1     Nut    Red    12      London
These rows show that supplier S1 and part P1 are located in the same city, namely, London. Therefore, a result row is generated, since both rows satisfy the predicate in the WHERE clause, namely, S.CITY=P.CITY.
Similarly, for all other pairs of rows in the SUPPLIERS relation (63, FIG. 2) and the PARTS relation (65, FIG. 2) which satisfy the predicate clause, a resultant row is generated (see Table A). Referring to Table A and to FIG. 2, notice that the supplier S5 at 75, located in Athens, does not appear in the JOIN relation (Table A) because there are no parts associated with that city. Likewise, part P3 at i31, associated with Rome, does not appear in the resultant relation, because there are no suppliers associated with Rome.
There is no requirement that the comparison operator in a JOIN predicate be equality. The EQUIJOIN by definition produces a result containing two identical columns, as shown in Table A. If one of these two columns is eliminated, the result is called the NATURAL JOIN.
In conclusion, the JOIN operation is the restriction of the Cartesian product of two or more relations. The Cartesian product of a set of N relations is a new relation consisting of all possible rows "r", such that "r" is the concatenation of one row from each of the participating relations. Once the Cartesian product is generated, all rows that do not satisfy the "JOIN predicate" are eliminated from the Cartesian product. What is left is the EQUIJOIN result relation.
In this example, the complete table contains thirty rows (five supplier rows paired with six part rows). Now, all the rows of the Cartesian product in which S.CITY is not equal to P.CITY are eliminated and what is left is the EQUIJOIN result as shown earlier.
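The conventional approach described above (form the Cartesian product, then discard rows that fail the JOIN predicate) can be illustrated with a short Python sketch. The rows for S1 to S4 and for P1, P2, P4, P5 and P6 are taken from Table A. The text only gives the cities of S5 (Athens) and P3 (Rome), so their remaining attributes are placeholders assumed here for illustration, as are the variable names.

```python
from itertools import product

# (ID, NAME, STATUS, CITY)
suppliers = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),        # name and status are assumed
]
# (ID, NAME, COLOR, WEIGHT, CITY)
parts = [
    ("P1", "Nut", "Red", 12, "London"),
    ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"),  # name, color and weight are assumed
    ("P4", "Screw", "Red", 14, "London"),
    ("P5", "Cam", "Blue", 12, "Paris"),
    ("P6", "Cog", "Red", 19, "London"),
]

# Cartesian product: every supplier row concatenated with every part row.
cartesian = [s + p for s, p in product(suppliers, parts)]

# EQUIJOIN: keep only rows where S.CITY (index 3) equals P.CITY (index 8).
equijoin = [row for row in cartesian if row[3] == row[8]]

# NATURAL JOIN: drop the duplicated P.CITY column.
natural_join = [row[:8] for row in equijoin]

print(len(cartesian), len(equijoin))
for row in equijoin:
    print(row)
```

The intermediate list `cartesian` holds all thirty rows even though only ten survive, which is the cost the binary representation discussed next is meant to avoid.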
In the following sections, two aspects of the present invention are discussed. First, the JOIN relation is efficiently represented by binary bit vectors and second, the JOIN relation is constructed without having to create a cross product relation.
1. Binary Representation of a JOIN Relation
Referring to FIGS. 2, 19, 20 and 21, a binary representation of a JOIN relation is now discussed. Specifically, FIG. 19 represents the depiction of a JOIN relation from the following query:
SELECT S.ID#, S.STATUS, S.CITY, P.ID#
FROM S, P
WHERE S.CITY = P.CITY;
This query is a projection of the JOIN because P.CITY is not mentioned in the SELECT clause of the query. Like the EQUIJOIN example discussed above, this query requires that data come from two relations, namely the suppliers relation (63, FIG. 19) and the Parts relation (65, FIG. 19). Both relations are named in the FROM clause and the connection between the tables is through the CITY columns in the WHERE clause. The result of the JOIN for displaying the columns CITY, SUPPLIERS ID#s, STATUS, and PART ID#s is shown at 628 of FIG. 19. To construct the JOIN relation of the SUPPLIERS relation and the PARTS relation, a new set of entity select vectors 600 and 602 for depicting the values in the JOIN relation columns are created. The entity select vectors at 600 and 602 are binary bit vectors that indicate which rows of the particular relation associated with the entity select vector participate in the JOIN relation. More particularly, each binary bit has an ordinal position, which corresponds to a row of the relation. Binary bits 601, 603, 605, 607 and 609 correspond to the five rows of the suppliers relation 63. When the binary bit is set to "1", the particular row associated with the binary bit participates in the JOIN relation. Specifically, binary bit 601 indicates that the first row of the SUPPLIERS relation 63 participates in the JOIN relation. Likewise, the entity select vector 602, having binary bits 611, 613, 617, 619 and 621, indicates that the first, second, fourth, fifth and sixth rows of the PARTS relation 65 participate in the JOIN relation. Vectors 600 and 602 act just like the entity select vectors discussed in FIGS. 4, 5 and 6. The only difference is that the ordinal positions of the bits in the entity select vectors 600 and 602 do not correspond to unique values in a value set. Instead, the binary bits of entity select vectors 600 and 602 refer to row locations in a particular relation. Like the entity select vectors in FIGS. 4, 5 and 6, an implied mapping correspondence exists between each binary bit in the entity select vector and a particular row use vector in an associated row use set.
The implied mapping scheme is illustrated by the dotted lines 608, 610, 612 and 614, which show that the binary bits of the entity select vector 600 indicate the correspondence of the rows of the suppliers relation to the row use vectors 615, 617, 619 and 621, respectively. Likewise, the binary bits of the entity select vector 602, which correspond to the row use vectors of the Parts relation 65, are mapped in an implied manner to the row use vectors 623, 625, 627, 629 and 631, as shown by the dotted lines 616, 618, 620, 622 and 624. The row use sets of the JOIN relation perform a dual task of representing the values in the rows of the JOIN relation and depicting more than one column of the JOIN relation. Specifically, the row use set 604 represents the columns S.CITY 638, S.ID# 636 and S.STATUS 634. Likewise, the row use set 606 represents the P.ID# column 630 of the JOIN relation 628. The columns 638, 636, 634 and 630 depict the result of the JOIN.
To summarize, the entity select vector 600 and the row use set 604 represent all of the binary information necessary to construct the suppliers relation portion 626 of the JOIN relation 628, namely, columns 638, 636 and 634. The following is a detailed discussion of how this representation is achieved.
Referring to row use vector 615 at FIG. 19 of the row use set 604, a dotted line 608 maps the row use vector 615 to the binary bit 601 of the entity select vector 600. The binary bits of the row use vector 615 indicate the row positions of the columns 638, 636 and 634 which contain a particular value in the first row of the suppliers relation 63. Binary bit 601 corresponds to the values London, 20, Smith and S1. To build the S.CITY column of the JOIN relation, only the value London is referenced. Thus, the three binary bits set to "1" in the row use vector 615 represent three occurrences of the value London in the S.CITY column 638 of the suppliers portion 626 of the JOIN relation. For the S.ID# column 636, the first three bits set to "1" in the row use vector 615 indicate three occurrences of the value S1. Likewise, in the STATUS column 634, the first three binary bits set to "1" in the row use vector 615 represent occurrences of the value 20. Thus, the first three rows of the suppliers portion 626 of the JOIN relation 628 are characterized by the first three bits of the row use vector 615. Likewise, the next three rows of the suppliers portion 626 are characterized by the row use vector 621 of the row use set 604. The replication of the three bits in row use vector 615 indicates that the three values [London, S1, 20] of the first row of the Suppliers relation occur in rows 1, 2 and 3 of the JOIN relation. The remaining values in the columns of the suppliers portion 626 are indicated by the row use vectors 617 and 619, which contain binary bits set to "1" in the seventh, eighth, ninth and tenth rows of both row use vectors. It should be noted that although the columns S.CITY, S.ID# and S.STATUS of the Suppliers relation are indicated by the query, the column SNAME in the SUPPLIERS relation could just as easily have been represented by the row use set 604.
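As a rough illustration of how one entity select vector and one row use set can stand for several JOIN columns at once, the following Python sketch rebuilds the S.ID# and S.CITY columns of the suppliers portion. Several things here are assumptions rather than statements from the text: the function and variable names are hypothetical, source columns are held as plain lists instead of their own value sets and row use sets, and the split of JOIN rows seven to ten between vectors 617 and 619 (two rows each) is a guess.

```python
def join_column(source_column, join_entity_select, join_row_use_set):
    """Expand one source column into the corresponding JOIN column."""
    # Rows of the source relation that take part in the JOIN, in ordinal order.
    participating = [r for r, bit in enumerate(join_entity_select) if bit]
    out = [None] * len(join_row_use_set[0])
    # Implied mapping: the k-th set bit of the entity select vector
    # corresponds to the k-th row use vector of the JOIN row use set.
    for src_row, row_use in zip(participating, join_row_use_set):
        for join_row, bit in enumerate(row_use):
            if bit:
                out[join_row] = source_column[src_row]
    return out


entity_select_600 = [1, 1, 1, 1, 0]       # S1..S4 participate, S5 does not
row_use_set_604 = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],       # 615: first supplier row, JOIN rows 1-3
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],       # 617: assumed JOIN rows 7-8
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],       # 619: assumed JOIN rows 9-10
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],       # 621: fourth supplier row, JOIN rows 4-6
]

s_id = ["S1", "S2", "S3", "S4", "S5"]
s_city = ["London", "Paris", "Paris", "London", "Athens"]

# The same vectors serve every column of the suppliers portion.
print(join_column(s_id, entity_select_600, row_use_set_604))
print(join_column(s_city, entity_select_600, row_use_set_604))
```

Only the `source_column` argument changes between the two calls, which is the "dual task" of the JOIN row use set mentioned above.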
The values of the Parts relation are mapped into the JOIN relation 628 by the row use set 606 and the entity select vector 602 in exactly the same fashion as for Suppliers.
In sum, the row use sets 606 and 604 together give a binary representation of the JOIN relation 628. Only the binary representation of the JOIN relation is stored in the RDMS 10 (FIG. 1A). The actual values depicted by the binary representation are retrieved when the system performs a reconstruct and display for the user. FIGS. 20 and 21 represent a more detailed view of the JOIN relation 628 (FIG. 19). More particularly, the SUPPLIERS relation 63 (FIG. 19) is depicted by its row use sets at 63 (FIG. 21) and the Parts relation 65 (FIG. 20) is depicted by its row use sets 65 (FIG. 20). In addition, the entity select vectors 600 and 602 are shown corresponding to the row use sets 630 and (FIG. 20). Also, FIGS. 20 and 21 depict the value sets referred to in the SUPPLIERS and PARTS relations with their associated entity select vectors.
Referring to FIG. 20, a detailed discussion of the SUPPLIERS relation portion 626 (FIG. 19) is now presented. As discussed above, the row use set 604 for the JOIN relation (628, FIG. 19) is mapped back to the Suppliers relation by the entity select vector 600. As shown in FIG. 19, the entity select vector 600 is repeated for each column of the Suppliers relation which has one or more values in the suppliers portion of the JOIN relation. The implied mapping from each row use vector of the row use set 604 is shown by the dotted lines 608, 610, 612 and 614. The row use sets for representing the columns of the suppliers relation are shown in FIG. 4. The first three binary bits set to "1" in the row use vector 615 of the row use set 604 are mapped to the binary bit 601 of the entity select vector 600. The entity select vector 600 corresponds to the row use set 260. The first binary bit 601 is set to "1", indicating that the value in the first row of the S.ID# column of the suppliers relation is present in the JOIN relation. To determine which suppliers ID is referenced by the binary bit 601, a Boolean AND operation is performed on each row use vector 184, 186, 188, 190 and 192 to determine which row use vector contains a corresponding "1" bit in the first position. Row use vector 184 contains a binary bit set to "1" in the first position. Row use vector 184 maps back to the first binary bit of the entity select vector 176 for the suppliers relation. The first binary bit of entity select vector 176 corresponds to the value S1 in the value set for suppliers identifiers 160. Thus, the first three binary bits set to "1" in the row use vector 615 indicate that the value S1 is in the first three rows of the S.ID# column in the JOIN relation. In the same way, the first three binary bits set to "1" in the row use vector 615 are mapped to the entity select vector 600 associated with the row use sets 264 and 266, corresponding to the S.STATUS and S.CITY columns of the suppliers relation. The row use vector 615 represents the first three values of the S.CITY column 638 and the S.STATUS column 634 of the prior portion of the JOIN relation.
Referring to FIG. 21, a more detailed view of the Parts relation portion of the JOIN relation is shown. Specifically, the mapping of the part ID#s to the P.ID# column 630 of the JOIN relation is shown. Row use vector 623 of the JOIN relation contains binary bits set to "1" in the first and fourth positions. These binary bits correspond to the first and fourth positions of the P.ID# column 627 of the JOIN relation. Row use vector 623 is impliedly mapped to the first binary bit 611 of the entity select vector 602. The entity select vector 602 corresponds to the row use set 304. A Boolean AND operation is performed on the entity select vector and each row use vector of the row use set 304 to determine which row use vector contains the binary bit set to "1" in the first position. The leftmost row use vector contains a binary bit set to "1" and this row use vector maps back to the first binary bit position of entity select vector 284. The first binary bit of the entity select vector 284 is set to "1", indicating that the first row of the value set 282 contains the unique value mapped into the Parts relation 65 and into the JOIN relation. Thus, binary bit 611 indicates that the value P1 is mapped into the row use vector 623 of the JOIN relation, specifically, into the first and fourth rows. The remaining rows of the P.ID# column 627 of the JOIN relation are depicted by the row use vectors 625, 627, 629 and 631 of the row use set 606.
2. Constructing a Binary Representation of a JOIN Relation
Referring to FIGS. 22A, 22B, 22C, 22D, 22E, 22F and 22G, a detailed discussion of the operations performed by the RPU 22 (FIG. 1A) for performing a JOIN operation and constructing the JOIN relation is now presented. Specifically, referring to FIG. 22A, the routine EQUI-JOIN, which builds a binary representation of the JOIN relation, is shown. FIG. 22B is a flow diagram of the routine BUILD ROW USE SETS for constructing the particular row use sets associated with each column of the JOIN relation. FIG. 22C is a flow diagram for the routine CONSTRUCT JOIN ROW USE VECTORS for controlling the overall construction of each row use vector of a row use set in the JOIN relation. FIG. 22D is a routine EVALUATE ROW USE SETS for determining the number of occurrences of the unique values participating in the JOIN operation. FIG. 22E is a routine called PRODUCTS for calculating a series of product terms which characterize the formation of bit patterns in each of the row use vectors for a particular row use set of the JOIN relation. FIG. 22F is a routine called NUMS for determining the number of times a particular bit pattern repeats itself in a row use vector in the JOIN relation. FIG. 22G is a routine called GENERATE BIT STRING for building the row use vector in the JOIN relation.
When the JOIN operation is performed, the binary representations of the one or more relations to be JOINed are found in the memory 18 of the RDMS 10. Here the binary represented relations are stored until the RPU 22 is ready for processing. When the RPU 22 is ready to perform the JOIN operation, the relations are sent via bus 48 to the BBVP 14. Using the relations in the BBVP 14, the RPU 22 performs the EQUI/NATURAL JOIN operation (FIG. 22A) to create a binary representation of the resultant JOIN relation.
Referring to FIG. 22A, a detailed description of the overall process for performing the EQUIJOIN operation is now presented. Specifically, during block 652, the RPU 22 obtains all the entity select vectors, for the participating relations, of the columns which contain values from the same particular value set over which the JOIN operation is performed (e.g., CITY VALUE SET for S.CITY = P.CITY).
The JOIN operation is performed over the values of the relations which fulfill the WHERE clause of the JOIN query (e.g., S.CITY = P.CITY). For example, referring to FIG. 19, where the JOIN relation was characterized by the WHERE clause (e.g., S.CITY = P.CITY), the values London and Paris in the CITY columns of relations 63 and 65 were common to both relations, and the JOIN operation was performed with respect to these values.
During block 654, the RPU 22 performs a Boolean AND operation on the entity select vectors obtained to determine a resultant binary bit vector X. The resultant binary bit vector indicates which values of the particular value set are common to all of the entity select vectors involved in the JOIN operation. Specifically, binary bits set to "1" in the resultant bit vector "X" indicate the values of the value set which are common to all the columns represented by the obtained entity select vectors.
During block 658, the RPU 22 determines which binary bits of each entity select vector correspond to the binary bits set to "1" in the resultant bit vector X. For each binary bit set to "1" in the entity select vector that corresponds to a binary bit set to "1" in the resultant bit vector X, the RPU 22 obtains the row use vector in the associated row use set during block 658. Then, during block 660, for each row use set associated with each entity select vector, the Boolean operation OR is performed on the selected row use vectors of the associated row use set. The resultant vectors are referred to as JOIN entity select vectors, which characterize values belonging to one or more JOIN columns in the JOIN relation. In block 664, a row use set corresponding to each JOIN entity select vector is constructed. Specifically, during block 664, the BUILD ROW USE SETS routine (FIG. 22B) is called to construct each row use set corresponding to each JOIN entity select vector of the JOIN relation. When all of the row use sets for the JOIN relation have been constructed, processing returns to the calling routine in block 668.
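The first steps of the EQUIJOIN routine (blocks 652 to 660) can be sketched as follows. This is a minimal illustration under stated assumptions: the ordering of the CITY value set, the dictionary layout of the row use sets (keyed by value ordinal) and all function names are hypothetical. The bit patterns themselves follow from the SUPPLIERS and PARTS city values used in the example.

```python
def bit_and(a, b):
    """Element-wise Boolean AND of two equal-length bit lists."""
    return [x & y for x, y in zip(a, b)]


def bit_or(a, b):
    """Element-wise Boolean OR of two equal-length bit lists."""
    return [x | y for x, y in zip(a, b)]


def join_entity_select(row_use_set, x, n_rows):
    """OR together the row use vectors of every value marked in X (block 660)."""
    result = [0] * n_rows
    for ordinal, bit in enumerate(x):
        if bit and ordinal in row_use_set:
            result = bit_or(result, row_use_set[ordinal])
    return result


city_values = ["Athens", "London", "Paris", "Rome"]   # assumed value set order

# Entity select vectors: which CITY values occur in each relation (block 652).
s_city_entity_select = [1, 1, 1, 0]   # suppliers: Athens, London, Paris
p_city_entity_select = [0, 1, 1, 1]   # parts: London, Paris, Rome

# Row use sets: for each value ordinal, the rows holding that value.
s_city_row_use = {0: [0, 0, 0, 0, 1], 1: [1, 0, 0, 1, 0], 2: [0, 1, 1, 0, 0]}
p_city_row_use = {1: [1, 0, 0, 1, 0, 1], 2: [0, 1, 0, 0, 1, 0], 3: [0, 0, 1, 0, 0, 0]}

# Block 654: values common to both columns.
x = bit_and(s_city_entity_select, p_city_entity_select)

# Block 660: JOIN entity select vectors over the rows of each relation.
suppliers_join_esv = join_entity_select(s_city_row_use, x, 5)
parts_join_esv = join_entity_select(p_city_row_use, x, 6)

print([v for v, bit in zip(city_values, x) if bit])
print(suppliers_join_esv, parts_join_esv)
```

The two resulting vectors mark the first four supplier rows and the first, second, fourth, fifth and sixth part rows, which agrees with the description of entity select vectors 600 and 602.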
Referring to FIG. 22B, a detailed discussion of the routine BUILD ROW USE SETS is now presented. The purpose of this routine is to construct the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation.
Specifically, during block 672, the RPU 22 selects the first unique value i in the resultant bit vector X. In other words, the RPU 22 selects the first value represented by the occurrence of a "1" bit in the resultant bit vector X. Then, during block 673, a variable "START ROW" is set equal to zero. This variable indicates the start position of the first row in the JOIN row use vector in the JOIN row use set being generated and will be discussed in more detail along with the description of FIG. 22G. Then, during block 674, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) is called. This routine is performed by the RPU 22 to determine the characteristics of a particular row use vector of the JOIN relation corresponding to the value i and to build the row use vector or vectors associated with the value in the JOIN relation. During block 676, the RPU 22 determines if there are any more unique values which are indicated in the resultant vector X. If there are, then the next unique value, i, is obtained at block 678. Processing continues at the CONSTRUCT JOIN ROW USE VECTORS routine to build the JOIN row use vector(s) associated with the next unique value. Assuming that there are no more unique values represented in the resultant bit vector X, then processing returns to the calling program (EQUI/NATURAL JOIN, FIG. 22A) at block 680.
Referring now to FIG. 22C, a detailed description of the CONSTRUCT JOIN ROW USE VECTORS routine is now presented. As stated above, the purpose of this routine is to determine the characteristics of the row use vectors of the JOIN relation and to build the row use vectors for a particular row use set in the JOIN relation. During block 684, the routine EVALUATE ROW USE VECTORS is called. The purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation. A more detailed discussion of this routine will be presented shortly with reference to FIG. 22D. Then, during block 686, the PRODUCTS routine (FIG. 22E) is called. The PRODUCTS routine calculates a series of product terms which characterize the formation of bit patterns in each of the row use vectors of a particular row use set of the JOIN relation. A more detailed discussion of this routine will be presented with reference to FIG. 22E. Then, during block 688, the NUMS routine (FIG. 22F) is called. The NUMS routine determines the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. A more detailed discussion will be presented shortly with reference to FIG. 22F.
Assuming that all the calculations for determining the characteristics of a row use set have occurred, processing continues at block 690, during which the first input column "j" (where "j" is set equal to 1) is obtained. Then, in block 692, the row use set associated with the input column j is obtained. Then, during block 694, the GENERATE BIT STRING routine (FIG. 22G) is called for constructing the row use vectors associated with a particular value in the row use set of the JOIN relation. The GENERATE BIT STRING routine (FIG. 22G) evaluates the calculations of the PRODUCTS (FIG. 22E) and NUMS (FIG. 22F) routines to determine the characteristics of the bit patterns in the row use vectors and constructs these bit patterns in the row use vectors of the JOIN relation. Then, during block 696, the RPU 22 determines if there are any more input columns which need to be processed. Assuming that there are still other input columns to be processed, block 698 increments the variable "j" by 1. Processing continues at blocks 692, 694 and 696 until all of the input columns have been processed. Assuming that there are no more input columns for processing, block 700 is entered to calculate a new value for the variable "START ROW." More particularly, the equation START ROW = START ROW + PRODS(1) is evaluated. The new value for START ROW will indicate the starting position of the bit pattern in the row use vector for the next value i of the resultant bit vector X. Processing returns at block 702 to the calling program, the BUILD ROW USE SETS routine, FIG. 22B.
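The START ROW bookkeeping of block 700 can be illustrated with a small Python sketch. It assumes, as the PRODUCTS routine states, that PRODS(1) is the product of the occurrence counts C1 ... Cn of the current value across the joined columns. The function name, the zero-based row numbering and the input layout are hypothetical.

```python
from math import prod


def start_rows(counts_per_value):
    """Return (start_row, row_count) for each common value, plus the total.

    counts_per_value -- for each common value i, the list [C1, ..., Cn] of
                        its occurrence counts in the n joined columns
    """
    start_row = 0                      # block 673: START ROW begins at zero
    spans = []
    for counts in counts_per_value:
        prods_1 = prod(counts)         # PRODS(1) = C1 * C2 * ... * Cn
        spans.append((start_row, prods_1))
        start_row += prods_1           # block 700: START ROW += PRODS(1)
    return spans, start_row


# London: 2 suppliers and 3 parts; Paris: 2 suppliers and 2 parts.
spans, total_rows = start_rows([[2, 3], [2, 2]])
print(spans, total_rows)
```

For the suppliers and parts example this places the London rows first (six rows) and the Paris rows after them (four rows), for ten JOIN rows in total, the same row count as Table A.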
FIG. 22D is a flow diagram of the EVALUATE ROW USE VECTORS routine, discussed below in more detail. As stated above, the purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation. During block 704, the RPU 22 selects the first column and obtains the row use set of the column. Then, during block 705, the RPU 22 obtains the row use vector of the row use set RUS(j) which corresponds to the unique value i of the bit vector X. The row use vector is referred to as Vj. During block 706, the number of binary bits set to "1" in the row use vector Vj is determined. The number of binary bits set to "1" in Vj corresponds to the number of occurrences of the particular value i in the current column over which the JOIN is performed. The number of occurrences calculated for this particular row use vector Vj is placed in the variable Cj. Cj is used by the PRODUCTS routine (FIG. 22E), to be discussed. Then, during block 707, the RPU 22 determines whether there are any more input columns. Assuming that there are still more input row use vectors, processing continues at block 709, during which the variable j is incremented by 1. Processing continues at blocks 705, 706 and 707 until all of the input columns are processed. Assuming that all of the input columns have been processed, then during block 711 processing returns to the calling program, the CONSTRUCT JOIN ROW USE VECTORS routine, FIG. 22C (at block 686).
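Counting the "1" bits of each row use vector Vj is a population count. A minimal sketch, assuming bit vectors are held as Python lists and using a hypothetical function name:

```python
def occurrence_counts(row_use_vectors):
    """Return [C1, ..., Cn]: the number of 1 bits in each row use vector Vj."""
    return [sum(v) for v in row_use_vectors]


# Row use vectors for the value London in S.CITY and in P.CITY.
v1 = [1, 0, 0, 1, 0]        # suppliers in rows 1 and 4
v2 = [1, 0, 0, 1, 0, 1]     # parts in rows 1, 4 and 6

print(occurrence_counts([v1, v2]))
```

With these inputs London occurs twice among the suppliers and three times among the parts.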
FIG. 22E is a flow diagram for the routine PRODUCTS. As stated earlier, the purpose of this routine is to calculate a series of product terms which characterize the formation of bit patterns in each of the row use vectors of the JOIN relation. During block 710, an array called PRODS is set equal to the series
PRODS = { (1, C1 * C2 * C3 * ... * Cn),
          (2, C2 * C3 * ... * Cn),
          (3, C3 * ... * Cn),
          ...,
          (N - 1, C(n-1) * Cn),
          (N, Cn),
          (N + 1, 1) }
where Cj is equal to the number of occurrences of value i in column j participating in the JOIN operation. For example, suppose the JOIN operation is performed over the CITY columns for two relations and the value London is found to be present in both columns. Assume that the CITY column in the first relation contains two occurrences of the value London and the CITY column in the second relation contains three occurrences of the value London. Then C1 is set equal to 2 and C2 is set equal to 3. Therefore, PRODS(1) is equal to C1 * C2, which is equal to 6. This number is used by the GENERATE BIT STRING routine (FIG. 22G) to determine the characteristics of the bit patterns in the row use vectors of the JOIN relation. When processing of block 710 is completed, processing continues at block 712, to return processing to the calling program, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C).
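The PRODS series is a table of suffix products of the occurrence counts, so it can be built in one pass from the right. The sketch below is illustrative only: the function name `prods` and the use of a dictionary keyed by the 1-based index are assumptions.

```python
def prods(counts):
    """Build PRODS for counts [C1, ..., Cn] as a dict keyed 1..n+1.

    PRODS(j) = Cj * C(j+1) * ... * Cn, and PRODS(n + 1) = 1.
    """
    n = len(counts)
    table = {n + 1: 1}                        # the final (N + 1, 1) entry
    for j in range(n, 0, -1):                 # fill from the right
        table[j] = counts[j - 1] * table[j + 1]
    return table


# London example from the text: C1 = 2, C2 = 3.
london = prods([2, 3])
print(london[1], london[2], london[3])
```

For the London example this gives PRODS(1) = 6, PRODS(2) = 3 and PRODS(3) = 1.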
Referring to FIG. 22F, a detailed description of the NUMS routine is now discussed. The NUMS routine is for determining the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. Specifically, during block 716 the following series is calculated:
NUMS = { (1, PRODS(1)/PRODS(1)), (2, PRODS(1)/PRODS(2)), (3, PRODS(1)/PRODS(3)), (4, PRODS(1)/PRODS(4)), ..., (N, PRODS(1)/PRODS(N)) }
Then, during block 718, processing returns to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C).
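As a hedged illustration of the NUMS series, the sketch below derives it from a PRODS table supplied as a dictionary. The name `nums` and the dictionary layout are hypothetical; only the division PRODS(1)/PRODS(k) comes from the text.

```python
def nums(prods_table):
    """Return {k: PRODS(1) / PRODS(k)} for k = 1..N (hypothetical helper).

    prods_table maps 1..N+1 to the PRODS values, so N is one less
    than its largest key. Integer division is exact here because
    PRODS(k) always divides PRODS(1).
    """
    n = max(prods_table) - 1
    total = prods_table[1]
    return {k: total // prods_table[k] for k in range(1, n + 1)}


# PRODS for the London example (C1 = 2, C2 = 3).
print(nums({1: 6, 2: 3, 3: 1}))  # {1: 1, 2: 2}
```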
Referring now to FIG. 22G, a detailed description of the GENERATE BIT STRING routine is now discussed. As stated above, this routine determines the characteristics of a bit pattern in a JOIN row use vector associated with a particular value i. An OFFSET value is determined for specifying the number of binary "0" bits in the row use vector ahead of the first binary "1" bit, in addition to the zero bits specified by START-ROW. During block 722 the OFFSET value is set equal to an initial value of zero.
During block 724, RPU 22 obtains the input bit vector associated with the variable Vj. During block 726, RPU 22 obtains the first binary "1" bit in the Vj bit vector. At block 728, a variable K is set equal to the ordinal position of the selected (726, FIG. 22G) bit of Vj. Processing continues at block 730, during which the output vector W is generated. The output vector W, at a position in the row use set, corresponds to the bit position K in the entity select vector (the destination entity select vector corresponding to column j). The characteristics of the output bit vector W are determined by calculating NUMS(j), PRODS(j + 1) and PRODS(j). NUMS(j) indicates the number of repetitions of a bit pattern having PRODS(j + 1) "1" bits. PRODS(j) indicates the total number of bits in the bit pattern associated with the output bit vector W. The position of the first bit set to "1" in the output bit vector is determined by the equation: POSITION = START ROW + OFFSET, where START ROW is the first bit position of the bit string indicating a particular value and OFFSET specifies the number of binary bits set to "0" in the row use vector before the series of bits set to "1" occurs.
Processing continues in block 732, during which the RPU 22 determines if there are any more bits set to "1" in the bit vector Vj. Assuming that there are more binary bits set to "1" in the bit vector Vj, processing continues at block 736, during which OFFSET is set to OFFSET + PRODS(j + 1). Then, during block 734, the next bit of the bit vector Vj is obtained. Processing continues at blocks 728, 730, 732, 734 and 736 until all of the binary "1" bits in the bit vector Vj have been evaluated and output bit vectors W have been built. When all the binary bits in the bit vector Vj have been evaluated, processing returns to the calling program during block 738.
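The loop of blocks 722 through 738 might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: bit vectors are modelled as strings of "0" and "1" read left to right, the function name `generate_bit_strings` is hypothetical, and START-ROW is modelled simply as a run of leading zero bits.

```python
def generate_bit_strings(v_j, j, prods, nums, start_row=0):
    """Return [(K, W), ...] for every "1" bit of the input vector v_j.

    prods and nums are the PRODS and NUMS series as dictionaries.
    Each W repeats, NUMS(j) times, a pattern of PRODS(j) bits that
    holds PRODS(j + 1) consecutive "1" bits starting at OFFSET.
    """
    ones = prods[j + 1]   # "1" bits per pattern
    period = prods[j]     # total bits per pattern
    reps = nums[j]        # repetitions of the pattern
    offset = 0            # block 722
    out = []
    for k, bit in enumerate(v_j, start=1):   # K is the ordinal position
        if bit != "1":
            continue
        pattern = "0" * offset + "1" * ones + "0" * (period - offset - ones)
        out.append((k, "0" * start_row + pattern * reps))   # block 730
        offset += ones                                      # block 736
    return out


PRODS = {1: 6, 2: 3, 3: 1}
NUMS = {1: 1, 2: 2}
print(generate_bit_strings("10010", 1, PRODS, NUMS))
# [(1, '111000'), (4, '000111')]
print(generate_bit_strings("100101", 2, PRODS, NUMS))
# [(1, '100100'), (4, '010010'), (6, '001001')]
```

The two calls use the London row use vectors "10010" and "100101" that appear in the specification's own example.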
3. Detailed Example For Constructing a Binary Representation of the JOIN Relation
Referring to FIGS. 19, 22A, 22B, 22C, 22D, 22E, 22F, 22G, 23A, 23B, and 23C, a detailed example for constructing a binary representation of the JOIN relation as shown in FIG. 19 is now described. As discussed earlier, FIG. 19 depicts a JOIN operation for the following query:
SELECT S.ID#, S.STATUS, S.CITY, P.ID#
FROM S, P
WHERE S.CITY = P.CITY;
The result of this JOIN operation for columns CITY, SUPPLIERS ID#s, STATUS and PART ID#s is shown at 628 of FIG. 19. The binary representation for the new JOIN relation is shown at 604 and 606. More particularly, the SUPPLIERS relation portion of the JOIN relation 628 is shown at 604 and the PARTS relation portion of the JOIN relation is shown at 606. The purpose of this discussion is to construct the binary representation of the JOIN relation for the query above.
FIG. 23 is a Results Table depicting the JOIN operation performed on the SUPPLIERS relation and the PARTS relation for the specific query above. Each row of the Results Table, indicated at 850-878, depicts a different step of the creation of the binary representation of the JOIN relation, as shown by the flow diagram of FIG. 22. There are ten columns in the Results Table of FIG. 23. From left to right, the first column is the value set over which the JOIN operation is performed. The second column is the entity select vector associated with the column of the first relation (or SUPPLIERS relation) over which the JOIN operation is performed. The third column is the entity select vector for the column in the second relation (or PARTS relation) over which the JOIN operation is performed. The fourth column is the resultant bit vector generated by Boolean ANDing the entity select vectors for the two columns over which the JOIN operation is being performed. The fifth column is the selected input row use vectors of the SUPPLIERS relation. The sixth column is the selected input row use vectors for the PARTS relation. The seventh column is the entity select vector 1, corresponding to the first column of the JOIN relation, which also refers to the SUPPLIERS relation portion of the JOIN relation. The eighth column is the entity select vector 2, corresponding to the second column of the JOIN relation, which also refers to the PARTS relation portion of the JOIN relation. The ninth column is the first JOIN row use set corresponding to entity select vector 1, and the tenth column is the second JOIN row use set corresponding to entity select vector 2.
Referring to FIG. 22A, a more detailed discussion for performing the EQUIJOIN operation is now presented. During block 652, the RPU 22 obtains all entity select vectors for the SUPPLIERS relation and the PARTS relation, for the columns which contain values from the value set over which the JOIN operation is performed (e.g., the CITY value set corresponding to the WHERE clause, S.CITY = P.CITY). Thus the entity select vectors for the CITY columns of the SUPPLIERS relation (63, FIG. 19) and the PARTS relation (65, FIG. 19) are obtained (850, FIG. 23A). The second column of the Results Table of FIG. 23 depicts the entity select vector for the CITY column of the SUPPLIERS relation, and the third column of the Results Table of FIG. 23 depicts the entity select vector associated with the CITY column of the PARTS relation. Then during block 654, the RPU 22 performs a Boolean AND operation on the entity select vectors for the SUPPLIERS and PARTS relations to obtain a resultant binary bit vector x (852, FIG. 23A). The resultant binary bit vector, shown in the fourth column of FIG. 23A, indicates which values of the CITY value set are common to the entity select vectors for the CITY columns of the SUPPLIERS and PARTS relations. During block 658, the RPU 22 determines which binary bits of each entity select vector correspond to the binary bits set to "1" in the resultant vector x. For the entity select vector corresponding to the CITY column in the SUPPLIERS relation, the row use vectors corresponding to the values London (L) and Paris (P) are obtained (854, FIG. 23A). Additionally, the row use vectors corresponding to the values London and Paris in the CITY column for the PARTS relation are obtained (856, FIG. 23). Then, during block 660, for each row use set associated with an entity select vector (i.e., 854 and 856 of FIG. 23A), the Boolean OR operation is performed. The resultant vectors of the Boolean OR operations are entity select vectors for the output or JOIN row use sets of the JOIN relation being formed. Specifically, the entity select vector 1, for the Boolean OR of the bit vectors for London and Paris, is shown at 858 of FIG. 23A. The entity select vector 2, for the Boolean OR of the bit vectors for London and Paris, is shown at 860 of FIG. 23A. In block 664, the routine BUILD ROW USE SETS is called for constructing the row use sets associated with the JOIN entity select vectors.
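Blocks 654 through 660 can be sketched as below. Only the London row use vectors "10010" and "100101" are given in this section; the ordering of the CITY value set, the two entity select vectors, and the row use vectors for Paris are assumed values chosen for illustration, and the helper names are hypothetical.

```python
def bit_and(a, b):
    """Boolean AND of two equal-length bit strings."""
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))


def bit_or(a, b):
    """Boolean OR of two equal-length bit strings."""
    return "".join("1" if "1" in (x, y) else "0" for x, y in zip(a, b))


# Assumed CITY value set and entity select vectors (illustrative only).
city_values = ["Athens", "London", "Paris", "Rome"]
esv_suppliers = "1110"   # assumed: Athens, London, Paris occur in S.CITY
esv_parts = "0111"       # assumed: London, Paris, Rome occur in P.CITY

# Row use vectors; the Paris entries are assumptions.
s_rows = {"London": "10010", "Paris": "01100"}
p_rows = {"London": "100101", "Paris": "010010"}

# Block 654: values common to both columns.
resultant = bit_and(esv_suppliers, esv_parts)
common = [v for v, bit in zip(city_values, resultant) if bit == "1"]

# Block 660: OR the selected row use vectors of each relation to get
# the entity select vectors of the JOIN row use sets.
esv1 = "0" * 5
for value in common:
    esv1 = bit_or(esv1, s_rows[value])
esv2 = "0" * 6
for value in common:
    esv2 = bit_or(esv2, p_rows[value])

print(resultant, common, esv1, esv2)
```

Under these assumed inputs, a "1" in `esv1` or `esv2` marks a row of the source relation that contributes at least one row to the JOIN relation.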
Referring to FIG. 22B, the BUILD ROW USE SETS routine for constructing the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation is shown.
During block 672 (FIG. 22B), the RPU 22 selects the first value represented to be present by the first occurrence of a bit set to "1" in the resultant bit vector. The first unique value of the resultant relation which corresponds to a bit set to "1" is London. During block 673, the variable START-ROW is set equal to zero. In block 674, the CONSTRUCT JOIN ROW USE VECTORS routine is called. Processing continues at block 684 of the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C).
During block 684, the EVALUATE ROW USE VECTORS routine (FIG. 22D) is called. Referring to FIG. 22D at block 704, the RPU 22 obtains the first row use set associated with the S.CITY column over which the JOIN operation is performed. Particularly, the RPU 22 obtains the row use set corresponding to the S.CITY column of the SUPPLIERS relation. The variable j corresponding to this particular column is set equal to 1. Processing continues in block 705, during which the row use vector of the row use set for the S.CITY column is obtained for the unique value London, which corresponds to the first binary bit set to "1" in the resultant binary bit vector. The variable V1 is set equal to the row use vector "10010" (854, FIG. 23A). Then, during block 706, the RPU 22 sets the variable C1 equal to 2, the number of "1" bits in the bit vector V1.
During block 707, the RPU 22 determines if there are any more input columns to be processed. There is a second input column corresponding to the PARTS relation. Thus, processing continues at block 709, during which the variable j is incremented by 1 (j = 2). Processing continues at block 705, during which the row use vector "100101" (856, FIG. 23B), corresponding to London in the row use set for the P.CITY column, is obtained. V2 is set equal to this vector. Then, during block 706, the RPU 22 sets the variable C2 equal to 3, the number of "1" bits in the bit vector V2. During block 707, the RPU 22 determines that there are no more input columns over which the JOIN operation is to be performed. Thus, processing returns during block 711 of FIG. 22D to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 686.
During block 686, the PRODUCTS routine (FIG. 22E) is called. Referring to FIG. 22E, during block 710, a series of numerical products which characterize the formation of the bit patterns in each of the row use vectors of the JOIN row use set is constructed. Specifically, the function called PRODS is:
PRODS = [ (1, 6) (2, 3) (3, 1) ].
Processing returns during block 712 to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 688. During block 688, the NUMS routine (FIG. 22F) is called to determine the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. The NUMS determination is a series of divisions for calculating the number of repetitions of a particular pattern of "1" bits in a row use vector of the JOIN relation. Specifically, NUMS is equal to the following series:
NUMS = (1, 1) (2, 2).
Processing returns during block 718 to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 690.
During block 690, the RPU 22 sets j = 1 and selects the first input column, S.CITY, over which the JOIN operation is being performed. Processing continues in block 692, during which the row use set associated with the S.CITY column is obtained. Then, in block 694, the GENERATE BIT STRING routine (FIG. 22G) is called.
Referring to FIG. 22G at block 722, the variable OFFSET is set equal to an initial value of zero. During block 724, the RPU 22 obtains the bit vector associated with the variable Vj, where j = 1. Then, during block 726, the RPU 22 performs a count function on the first bit set to "1" in the binary bit vector V1. Then, at block 728, the variable K is set equal to the ordinal position of the first binary bit set to "1" in V1; the ordinal position is 1. Processing continues at block 730, during which the output vector W is generated. The output vector W corresponds to the bit position 1 in the entity select vector 1 (the destination or JOIN entity select vector corresponding to the S.CITY column). The characteristics of the output bit vector W are determined by calculating NUMS(1), PRODS(2) and PRODS(1). NUMS(1) indicates the number of repetitions of the bit string of the output vector W corresponding to the first occurrence of London in the bit vector V1. PRODS(2) indicates that there are three "1" bits in the bit pattern of the output vector W. PRODS(1) indicates the total number of bits, six, in the bit pattern associated with the output bit vector W. The position of the first bit set to "1" in the output bit vector W is determined by the equation: POSITION = START ROW + OFFSET, where START ROW = zero and OFFSET = zero. Thus, the position of the first binary bit set to "1" in the output bit vector W is zero. The output vector W is "111000" (862, FIG. 23). Processing continues in block 732, during which the RPU 22 determines that there is a second "1" bit in the bit vector V1. Thus, processing continues at block 736, during which the variable OFFSET is determined to be 3 (i.e., OFFSET = OFFSET + PRODS(2)). Then, in block 734, the next bit of V1 is obtained. The position of the selected bit of V1 is 4. Thus, during block 728, K is set to 4. During block 730, the characteristics of the output vector W associated with the bit position 4 in V1 are determined by calculating NUMS(1), PRODS(2) and PRODS(1). The calculation for these formulas was determined above. The position of the first bit set to "1" in the output vector W is determined to be 3 (POSITION = 0 + 3). Thus, the resultant output vector W is "000111" (864, FIG. 23). Processing continues in block 732, during which the RPU 22 determines that there are no more "1" bits in V1 to evaluate. Thus, processing returns during block 738 to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 696.
During block 696 (FIG. 22C) the RPU 22 determines that there is another column, P.CITY, over which the JOIN operation is performed. Processing continues at block 698, during which the variable j is incremented by 1 (j = 2). Then in block 692, the row use set associated with the column P.CITY is obtained. Processing continues in block 694, during which the GENERATE BIT STRING routine (FIG. 22G) is called.
During block 722 (FIG. 22G) the variable OFFSET is set equal to the initial value of zero. During block 724, the bit vector V2 is obtained. This is the row use vector corresponding to London in the P.CITY row use set. In block 726, the first bit set to "1" of the bit vector V2 is selected. During block 728, the position of this selected bit of V2 is determined to be 1. During block 730, the output vector W corresponding to the bit position 1 of the entity select vector 2 (destination entity select vector corresponding to the P.CITY column) is determined. The characteristics of the output vector W are determined by calculating NUMS(2), PRODS(3) and PRODS(2). NUMS(2) is the number of repetitions of the generated bit string, which is equal to 2. The bit string consists of PRODS(2) bits, which is equal to three bits. The number of "1" bits in the bit pattern is given by PRODS(3), which is equal to one "1" bit. This series, commencing with the first bit set to "1", begins at position zero as indicated by START ROW + OFFSET, where START ROW and OFFSET are both zero. Thus, the bit pattern of output vector W is "100100" (866, FIG. 23). Processing continues at block 732, during which the RPU 22 determines that there are still "1" bits in the bit vector V2 to be evaluated. Specifically, bits at positions 4 and 6 of the bit vector V2 need to be evaluated. Thus, processing continues at block 736, during which the variable OFFSET is determined to be equal to 1. Processing continues at block 734, during which the next bit of the bit vector V2 is obtained and the position of the newly selected bit of bit vector V2 is determined to be 4. This value is set equal to variable K during block 728. During block 730, the output bit vector W corresponding to the bit position 4 of the entity select vector 2 (destination entity select vector corresponding to column P.CITY) is determined. Specifically, the output bit vector W is determined by calculating NUMS(2), PRODS(3), and PRODS(2). The calculations for these formulas have been determined above, and the series of bits corresponding to PRODS(2) begins at the first bit position, corresponding to an offset of 1 (POSITION = START ROW + OFFSET, where START ROW = zero and OFFSET = 1). Thus, the output bit vector W is "010010" (868, FIG. 23). Processing continues at block 732, during which the RPU 22 determines that there is still one "1" bit in the bit vector V2 to be evaluated. Thus, processing continues at block 736, during which OFFSET is incremented by the value PRODS(3). PRODS(3) is equal to 1 and thus the value of OFFSET is equal to 2. During block 734 the next bit of the bit vector V2 is obtained and processing continues at block 728. The position of the newly selected bit of the bit vector V2 is 6 and the variable K is set to 6. During block 730, the output vector W corresponding to bit position 6 in the entity select vector 2 is determined. The characteristics of the output vector W have been determined above, and the series of bits PRODS(2) begins at the second position, which corresponds to an offset equal to 2. Thus, the output vector W is "001001" (870, FIG. 23). Processing continues at block 732, during which the RPU 22 determines that there are no more "1" bits to evaluate in the bit vector V2. Thus, processing continues at block 738, during which processing returns to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 696.
During block 696, the RPU 22 determines that there are no more input columns for processing. Thus, processing continues at block 700, during which the START-ROW variable is calculated to be 6: START-ROW plus PRODS(1) = 6. Processing returns at block 702 to block 676 of the BUILD ROW USE SETS routine (FIG. 22B). During block 676, the RPU 22 determines that there is one more unique value indicated to be present in the resultant bit vector. The bit set to "1" in the resultant bit vector corresponds to the unique value Paris. During block 678, the unique value i = 2 for Paris is obtained. Then, during block 674, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) is called. In a similar fashion, the row use vectors corresponding to the value Paris are formed.
4. Constructing a Binary Representation of a GREATER THAN JOIN
Referring to FIGS. 2 and 24, a detailed discussion for performing the GREATER THAN JOIN operation is now presented. Specifically, in this JOIN operation, the WHERE clause contains the predicate "greater than" ( > ). For example, referring to FIG. 2, suppose that a user wishes to combine the SUPPLIERS relation 63 and the PARTS relation 65 such that the SUPPLIER CITY follows the PARTS CITY in alphabetical order. The command for this query is:
SELECT S.*, P.*
FROM S, P
WHERE S.CITY > P.CITY
The result of this query on the SUPPLIERS relation 63 and the PARTS relation 65 (FIG. 2) is the following relation:
TABLE A
S#   SNAME   STATUS   S.CITY   P#   PNAME   COLOR   WEIGHT   P.CITY
S2   Jones   10       Paris    P1   Nut     Red     12       London
S2   Jones   10       Paris    P4   Screw   Red     14       London
S2   Jones   10       Paris    P6   Cog     Red     19       London
S3   Blake   30       Paris    P1   Nut     Red     12       London
S3   Blake   30       Paris    P4   Screw   Red     14       London
S3   Blake   30       Paris    P6   Cog     Red     19       London
As in the case of the EQUIJOIN examples discussed above, the command for the GREATER THAN JOIN operation requires that the data come from two relations, namely, the SUPPLIERS relation (63, FIG. 2) and the PARTS relation (65, FIG. 2). As shown above, both relations are named in the FROM clause, and the express connection between the tables is the CITY column in the WHERE clause. The result of the GREATER THAN JOIN displays all the columns of the SUPPLIERS relation with all the columns of the PARTS relation. The process for creating the binary representation of the GREATER THAN JOIN relation is the same as for the EQUIJOIN operation. Namely, the BUILD ROW USE SETS routine (FIG. 22B) is called to physically construct the columns of the JOIN relation. The major difference in performing a GREATER THAN JOIN rather than an EQUIJOIN is in selecting and preparing the data prior to the construction of the row use sets for the JOIN relation. Specifically, FIG. 24 is a flow diagram which depicts the steps for evaluating and preparing the data prior to performing the BUILD ROW USE SETS routine (FIG. 22B), which constructs the binary representation of the JOIN relation.
Note in Table A for the GREATER THAN JOIN that the CITY column of the SUPPLIERS portion of the relation contains the value Paris, which is in all instances lexically greater than the value London in the PARTS relation. By contrast, the value London is not in the CITY column for the SUPPLIERS portion of the JOIN relation because London is not greater than the lexical values for Rome, Paris or London. Also note that the GREATER THAN JOIN operation can be performed for any particular characteristic. The results table above relied on the lexical ordering of the cities. However, any other kind of ordering could have been used to perform a comparison of greater than, for example, numerical ordering of values.
Referring to FIG. 24, a generic routine for performing the GREATER THAN JOIN operation is shown. Specifically, the JOIN operation is constructed for all the values of column A (i.e., all the values of the CITY column of the SUPPLIERS relation) which are greater than all the values of column B (i.e., all the values of the CITY column for the PARTS relation). It should be noted that this routine could be modified for any other of the predicate JOINs such as LESS THAN, LESS THAN OR EQUAL TO, and GREATER THAN OR EQUAL TO.
During block 1004, a sort operation is performed on all the values of column A to order the values from the greatest value to the least value. Then, in block 1006, a binary bit vector is generated. This binary bit vector contains a number of binary bits equal to the number of values in the value set over which the GREATER THAN JOIN operation is performed. It has its binary bits set to "1" at all the ordinal positions corresponding to the values of the value set which are less than the value y of column A.
During block 1008, the Boolean AND operation is performed on the binary bit vector generated in block 1006 with the entity select vector corresponding to column B. The resultant vector contains binary bits set to "1" at the ordinal positions corresponding to the values of column B which are less than the value y of column A. In block 1010, the resultant vector is checked to see if it is "0". If the resultant vector is "0", then processing returns to the calling routine during block 1012 because none of the values of B is less than the value of A. Therefore, the GREATER THAN JOIN operation cannot be satisfied, and thus, the user is notified. Assuming that the resultant vector is not "0", then processing continues at block 1014. During block 1014, the ordinal positions of the binary bits set to "1" in the result vector calculated in block 1008 are determined. These ordinal positions correspond to the ordinal positions of the entity select vector which correspond to the values of column B which are less than the value of column A. Then during block 1016, all of the row use vectors associated with the ordinal positions of column B which are associated with the binary bits set to "1" in the result vector are obtained. Additionally, the row use vector for value y of column A is also obtained. During block 1018, the row use vectors associated with column B are placed into a temporary area, along with the row use vector for column A. During block 1020, the Boolean OR operation is performed between the row use vectors associated with column B to generate the entity select vector for the column B portion of the JOIN relation. The entity select vector for the column A portion of the JOIN relation is the row use vector associated with the value y of column A. Stated differently, the entity select vector for the input column B depicts all of the values of B which are less than the value y of column A. Thus, the BUILD ROW USE SETS routine (FIG. 22B) will construct the first several rows of the JOIN relation, which will consist of the value y of column A and all of the values of column B which are less than value y. Then, during block 1024, a determination is made of whether any more values of column A need to be evaluated. If there are no more values to be evaluated by the GREATER THAN JOIN operation (FIG. 24), then processing returns to the calling program at 1026. However, if there are more values to be evaluated by the routine, then processing continues at block 1028. During block 1028, the next value y of column A, which is less than the last value of column A evaluated, is obtained.
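The preparation steps of FIG. 24 might be sketched as follows. This is an illustrative reading, not the patented implementation: bit vectors are modelled as strings, the function name `greater_than_join_steps` is hypothetical, the value set is assumed to be ordered lexically, and the row use vectors other than those for London are assumed values consistent with Table A.

```python
def bit_or(a, b):
    """Boolean OR of two equal-length bit strings."""
    return "".join("1" if "1" in (x, y) else "0" for x, y in zip(a, b))


def greater_than_join_steps(value_set, rows_a, rows_b):
    """Return one (y, ESV for A portion, smaller B values, ESV for B portion)
    tuple per value y of column A that exceeds some value of column B."""
    esv_b = "".join("1" if v in rows_b else "0" for v in value_set)
    steps = []
    for y in sorted(rows_a, reverse=True):                          # block 1004
        mask = "".join("1" if v < y else "0" for v in value_set)    # block 1006
        result = "".join("1" if m == e == "1" else "0"
                         for m, e in zip(mask, esv_b))               # block 1008
        if "1" not in result:                                        # blocks 1010, 1012
            break
        smaller = [v for v, bit in zip(value_set, result) if bit == "1"]  # block 1014
        esv_join_b = "0" * len(rows_b[smaller[0]])
        for v in smaller:                                            # blocks 1016 to 1020
            esv_join_b = bit_or(esv_join_b, rows_b[v])
        steps.append((y, rows_a[y], smaller, esv_join_b))
    return steps


cities = ["Athens", "London", "Paris", "Rome"]                       # assumed ordering
s_city = {"London": "10010", "Paris": "01100", "Athens": "00001"}    # partly assumed
p_city = {"London": "100101", "Paris": "010010", "Rome": "001000"}   # partly assumed
print(greater_than_join_steps(cities, s_city, p_city))
# [('Paris', '01100', ['London'], '100101')]
```

With these inputs only Paris in column A exceeds a value of column B (London), which agrees with Table A: supplier rows 2 and 3 are paired with part rows 1, 4 and 6.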
F. DISPLAY/RECONSTRUCT For JOIN Operation
Typically, the JOIN operation is performed over two relations and a resultant JOIN relation is generated. Recall that the resultant JOIN relation (FIGS. 19, 20 and 21) was a binary representation. The system can automatically perform the DISPLAY/RECONSTRUCT FOR A JOIN operation routine (FIG. 25A) to generate the particular columns of the JOIN relation for the user to ascertain.
The DISPLAY/RECONSTRUCT FOR JOIN operation (FIG. 25A) will be referenced by the PROJECT operation at block 566 (FIG. 17A), instead of the DISPLAY/RECONSTRUCT routine (FIG. 17B). Referring now to FIG. 25A, a more detailed discussion for reconstructing and displaying the columns of a JOIN relation is now presented. More particularly, during block 1102, a row use vector corresponding to the rows of the JOIN relation to be constructed is created. Then, during block 1104, an index vector associated with the JOIN relation is created. The index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the JOIN row use set. Each bit indicates whether the unique value associated with the row use vector exists in the JOIN column which is to be displayed. If a binary bit in the index vector is set to "0", the unique value associated with the row use vector does not exist in the JOIN column to be displayed. If the binary bit is set to "1" in the index vector, then the unique value associated with the row use vector exists one or more times in the column. At this time, the index vector contains a quantity of binary bits equal to the number of row use vectors of the JOIN row use set.
During block 1106, the row use set for the JOIN column to be displayed is obtained. During block 1108, the first row use vector associated with the JOIN column row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18 (FIG. 1A), is transferred via bus 48 to the BBVP 14 (FIG. 1A). Then, during block 1108, the RPU 22 via BBVP 14 performs a Boolean AND operation on the row use vector with the row select vector. As discussed earlier in Part V.(D.), the row select vector is a binary bit vector, and each binary bit corresponds to a row of the column or columns to be displayed by the system. A binary bit set to "1" in the row select vector indicates that the corresponding row needs to be displayed by the system. The result of the AND operation is a binary bit vector Z, which depicts the rows of the column which contain the particular value associated with the current row use vector. The results of the AND operation are sent via bus 48 back to memory 18 (FIG. 1A) for future processing. Then, in block 1110, the RPU 22 determines whether the resultant vector is "0". More particularly, the RPU 22 determines whether the resultant vector Z contains only binary bits set to "0". If the binary bit vector is "0", then during block 1112, a "0" is placed in the first binary bit position of the index vector to indicate that the unique value associated with the row use vector does not exist in the JOIN column to be displayed. However, if the resultant bit vector Z is not equal to "0", then processing continues to block 1114. The RPU 22 via BBVP 14 sets the binary bit of the index vector which is associated with the current row use vector to "1". In block 1116, the resultant bit vector Z is stored in memory 18. The resultant bit vector Z is identified for the particular row use vector presently being processed, and it will later be used in the reconstruction process.
During block 1118, the RPU 22 via BBVP 14 (FIG. 1A) clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. The purpose of this step is to "shortcut" the processing of the row use vectors in the row use set. Stated differently, when the JOIN row select vector is cleared, all of the values in the rows to be displayed for the column can be determined. During block 1120, the RPU 22 (FIG. 1A) determines whether the JOIN row select vector has had all its binary bits set to "0". If the JOIN row select vector contains only binary bits set to "0", then processing continues at block 1126. However, if all of the binary bits in the row use vector are not set to "0", then processing continues at block 1122. During block 1122, the RPU 22 (FIG. 1A) gets the next JOIN row use vector in the JOIN row use set currently being processed. During block 1124, the RPU 22 (FIG. 1A) determines whether the end of the JOIN row use vector has been reached. If the end of the JOIN row use vector has been reached, then processing continues at block 1126. However, assuming that the end of the row use vector has not been reached, then processing continues at blocks 1108, 1110, 1114, 1116, 1118, 1120, 1122 and 1124 until all of the JOIN row use vectors of the JOIN row use set have been completely processed.
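The scan of blocks 1108 through 1124 can be sketched as a short loop. The sketch below is a simplified illustration under stated assumptions: bit vectors are strings, the function name `scan_row_use_set` is hypothetical, index bits for row use vectors that are never visited are assumed to stay "0", and the sample vectors are illustrative rather than taken from the figures.

```python
def scan_row_use_set(row_use_set, row_select):
    """Return (index_vector, {value: Z}) for one column.

    row_use_set is an ordered list of (unique value, row use vector).
    row_select marks the rows still to be resolved; matched rows are
    cleared so the scan can stop early (the "shortcut" of block 1118).
    """
    index = ["0"] * len(row_use_set)      # one index bit per row use vector
    hits = {}
    remaining = row_select
    for pos, (value, vector) in enumerate(row_use_set):
        z = "".join("1" if a == b == "1" else "0"
                    for a, b in zip(vector, remaining))     # block 1108
        if "1" not in z:                                     # blocks 1110, 1112
            continue
        index[pos] = "1"                                     # block 1114
        hits[value] = z                                      # block 1116
        remaining = "".join("0" if zb == "1" else rb
                            for zb, rb in zip(z, remaining))  # block 1118
        if "1" not in remaining:                             # block 1120
            break
    return "".join(index), hits


column = [("London", "10010"), ("Paris", "01100"), ("Athens", "00001")]
print(scan_row_use_set(column, "11100"))
# ('110', {'London': '10000', 'Paris': '01100'})
```

In the sample call the scan stops after Paris because every selected row has been resolved, so the Athens row use vector is never examined.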
Assuming that all of the JOIN row use vectors in the JOIN row use set have been processed, or that the JOIN row select vector contains only binary bits set to "0", then processing continues at block 1126. During block 1126, the RPU 22 (FIG. 1A) determines whether the present column for display references a value set or a relation. The DISPLAY/RECONSTRUCT routine for the JOIN operation (FIG. 25A) starts with the row use set of the JOIN column and then references the relation corresponding to the row use set in the JOIN relation. Thus, the column to be displayed is a JOIN column, and it does not reference a value set. Therefore, processing continues at block 1112, during which the REFERENCE RELATION routine (FIG. 25B) is called.
During block 1136 of the REFERENCE RELATION routine (FIG. 25B), the RPU 22 obtains the entity select vector associated with the JOIN column. The entity select vector for the JOIN column is used as a row select vector for obtaining values from the referenced relation. Processing returns to the DISPLAY/RECONSTRUCT FOR JOIN operation routine (FIG. 25A) at block 1104. During block 1104, an index vector for the referenced relation is created. More particularly, the index vector contains a number of binary bits equal to the number of row use vectors of the referenced relation. Then, during block 1106, the RPU 22 (FIG. 1A) obtains a row use set for a column of the relation to be displayed. The first row use vector associated with the row use set is obtained. During block 1108, the RPU 22 instructs the BBVP 14 to perform a Boolean AND operation on the row use vector obtained in block 1106 with the row select vector obtained during the REFERENCE RELATION routine (FIG. 25B). The result of the AND operation is a vector Z which depicts the rows of the column which contain the particular value associated with the row use vector. Then, during block 1110, the RPU 22 (FIG. 1A) determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 1112, a "0" is set in the first binary bit position of the index vector. Processing continues at block 1122, during which the next row use vector associated with the column in the referenced relation is obtained. Returning to block 1110, if the result of the Boolean operation performed in block 1108 is not "0", then processing continues at block 1114. During block 1114, the RPU 22 via BBVP 14 (FIG. 1A) sets the binary bit in the index vector which is associated with the current row use vector to "1". During block 1116, the resultant bit vector Z is stored in memory 18 (FIG. 1A).
During block 1118, the RPU 22 instructs the BBVP 14 (FIG. 1A) to clear the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. Again, the purpose of this step is to shortcut the processing of the row use vectors of the row use set associated with the referenced relation.
During block 1120, the RPU 22 (FIG. 1A) instructs the BBVP 14 to determine whether the row select vector has been completely cleared; stated differently, whether all the binary bits have been set to "0". If the row select vector contains only binary bits set to "0", then processing continues at decision block 1126. Otherwise, assuming that not all of the binary bits in the row select vector are set to "0", then processing continues at block 1122. During block 1122, the RPU 22 (FIG. 1A) acquires the next row use vector in the row use set of the referenced relation currently being processed.
Then, during block 1124, the RPU 22 instructs the BBVP 14 to determine whether the end of the row use set has been reached. If the end of the row use set has been reached, then processing continues at block 1126. Otherwise, assuming that the end of the row use set has not been reached, then processing continues at blocks 1108, 1110, 1114, 1116, 1118, 1120, 1122, and 1124 until all the row use vectors of the row use set of the referenced relation have been processed.
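By way of illustration only, the loop of blocks 1104 through 1124 may be sketched as follows. The sketch assumes that each bit vector is held as a Python list of 0/1 integers; the function names and the sample data are hypothetical and do not come from the figures.

```python
def and_vectors(a, b):
    """Bitwise AND of two equal-length bit vectors (lists of 0/1)."""
    return [x & y for x, y in zip(a, b)]


def scan_row_use_set(row_use_set, row_select):
    """Walk a row use set in the manner of blocks 1104-1124.

    Returns (index_vector, z_vectors). index_vector[i] is 1 when row use
    vector i selects at least one row; z_vectors[i] holds that result Z.
    """
    remaining = list(row_select)             # working copy of the row select vector
    index_vector = [0] * len(row_use_set)    # block 1104: one bit per row use vector
    z_vectors = {}
    for i, row_use in enumerate(row_use_set):
        z = and_vectors(row_use, remaining)  # block 1108
        if any(z):                           # block 1110: Z is not "0"
            index_vector[i] = 1              # block 1114
            z_vectors[i] = z                 # block 1116
            # block 1118: clear the rows that are now accounted for
            remaining = [r & (1 - b) for r, b in zip(remaining, z)]
            if not any(remaining):           # block 1120: nothing left to find
                break
    return index_vector, z_vectors


# Hypothetical data: three row use vectors over a five-row column.
row_use_set = [[1, 0, 0, 1, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 1]]
row_select = [1, 1, 0, 1, 0]
index_vector, z_vectors = scan_row_use_set(row_use_set, row_select)
print(index_vector)
print(z_vectors)
```

In this sample the row select vector is exhausted after the second row use vector, so the third is never examined; that early exit is the shortcut attributed to block 1118.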
Assuming that all of the row use vectors of the row use set have been processed, or in the event that the row select vector corresponding to the referenced relation contains only binary bits set to "0", then processing continues at block 1126. During block 1126, the RPU 22 (FIG. 1A) determines whether the column currently being evaluated references a relation or a value set. The column presently being evaluated references a value set, and thus, processing continues at block 1130. During block 1130, the REFERENCE VALUE SET routine (FIG. 25C) is called.
The REFERENCE VALUE SET routine (FIG. 25C) obtains the values from the value set and places the values in the proper rows of the JOIN relation column. More particularly, referring to block 1142, the RPU 22 instructs BBVP 14 (FIG. 1A) to determine the ordinal positions of the binary bits set to "1" in the index vector for the referenced column. Each binary bit set to "1" indicates which row use vectors reference unique values to be displayed in the referenced column. In addition, for each row use vector which has a corresponding binary bit set to "1" in the index vector, the ordinal position of the corresponding binary bit set to "1" in the entity select vector associated with the value set is determined. This process is performed for each of the row use vectors which has a corresponding binary bit set to "1" in the index vector for the referenced column. Then, during block 1144, the values associated with the ordinal positions obtained in block 1142 are retrieved from the referenced value set.
During block 1146, the RPU 22 (FIG. 1A) finds the appropriate location in the index vector associated with each of the values retrieved in block 1144. Then, during block 1148, the appropriate resultant Z vector associated with each of the row use vectors is retrieved. The resultant Z vector indicates which rows of the referenced column contain the unique value associated with the row use vector. Then, during block 1150, the RPU 22 (FIG. 1A) determines whether any more values are left for processing. Assuming that there are more values, then processing continues at blocks 1146 and 1148 until all the values from the value set have been placed in the proper rows of the referenced column.
Assuming that all of the values have been placed in the proper rows of the column, then processing continues at block 1152. During block 1152, for each row of the referenced column, the corresponding JOIN row use set is determined. More particularly, the corresponding binary bit set to "1" in the index vector for the JOIN column is determined. Then, during block 1153, the appropriate vector Z temporarily stored at an earlier step is obtained. The resultant vector Z indicates the row positions of the JOIN column where the value from the referenced column should be placed. During block 1155, the RPU 22 determines whether any more values from the referenced column need to be processed. If more values need to be processed, then processing continues at blocks 1152, 1153 and 1155 until all the values of the referenced column have been processed. Assuming that all the values have been processed, then processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A). During block 1132, processing returns to the calling program.
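The value placement of blocks 1142 through 1150 may be illustrated with a short sketch. It is an illustration under stated assumptions rather than the routine of FIG. 25C itself: bit vectors are lists of 0/1, ordinal positions are counted from 0 for convenience, the i-th row use vector is taken to correspond to the i-th "1" bit of the entity select vector, and the function names and sample values (including "Berlin") are hypothetical.

```python
def ordinals_of_set_bits(bits):
    """Ordinal positions (0-based here) of the bits set to 1."""
    return [pos for pos, bit in enumerate(bits) if bit]


def place_values(index_vector, z_vectors, entity_select, value_set, n_rows):
    """Fill a column from a value set using the stored Z vectors.

    Assumption: the i-th row use vector corresponds to the i-th "1" bit
    of the entity select vector, whose position indexes the value set.
    """
    value_positions = ordinals_of_set_bits(entity_select)
    column = [None] * n_rows                             # rows not selected stay empty
    for i in ordinals_of_set_bits(index_vector):         # block 1142
        value = value_set[value_positions[i]]            # block 1144
        for row in ordinals_of_set_bits(z_vectors[i]):   # blocks 1146-1148
            column[row] = value
    return column


# Hypothetical inputs; "Berlin" is a value set entry the column never uses.
value_set = ["Athens", "Berlin", "London", "Paris"]
entity_select = [1, 0, 1, 1]
index_vector = [1, 1, 0]
z_vectors = {0: [1, 0, 0, 1, 0], 1: [0, 1, 0, 0, 0]}
column = place_values(index_vector, z_vectors, entity_select, value_set, 5)
print(column)
```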
1. Example of the DISPLAY/RECONSTRUCT Operation For A JOIN Relation
Referring to FIGS. 20, 25A, 25B, 25C, 26A, 26B, 26C, 26D and 26E, a detailed example for performing the DISPLAY/RECONSTRUCT program on the JOIN column for Suppliers IDs in the JOIN relation (604, FIG. 29) is now discussed. More particularly, it is assumed that only a binary representation of the JOIN column for Suppliers IDs exists in the RDMS 10. The binary representation of the JOIN relation is a result of the JOIN operation, or it may have been previously stored after processing of the BBVP 14 (FIG. 1A). In either case, the binary representation of the JOIN column exists in memory 18 and now the user or applications program needs to display the actual values of the JOIN column. Although this example is for reconstructing and displaying only one column of the JOIN relation (604, FIG. 20), the remaining columns of the JOIN relation could be reconstructed by sequentially performing the PROJECT operation. Recall that the PROJECT routine (FIG. 17A) calls the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 20A, 20B and 20C). Referring to FIGS. 21A, 21B, 21C, 21D and 21E, a results table for depicting the results of the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 25A, 25B and 25C) for reconstructing and supplying the Suppliers ID column of the JOIN relation (604, 626 of FIG. 20) is shown. Each row of the results table depicts a result of the routines shown in FIGS. 25A, 25B and 25C. The results table is separated into 12 columns from left to right. The first column of the results table shows the row select vector associated with the JOIN column to be displayed. The second column shows the row use set associated with the JOIN column to be displayed. The third column is the row use vector of the row use set which is currently being processed, and the fourth column is the index vector associated with the row use set corresponding to the JOIN column. The fifth column is a resultant binary bit vector Z. The resultant binary bit vector Z is determined by ANDing a JOIN relation row use vector and the row select vector. The sixth column depicts the entity select vector associated with the referenced relation. The seventh column is an index vector associated with the row use set corresponding to the referenced relational column. The eighth column depicts the row use set corresponding to the referenced relational column. The ninth column is for the index vector associated with the row use set for the referenced relational column. The tenth column is for the resultant bit vector Z. The resultant bit vector Z is determined by ANDing a row use vector of the row use set with the entity select vector. The eleventh column is a depiction of the creation of the referenced relational column. The last column depicts the JOIN relational column, which is to be displayed.
Referring to FIG. 25B, the REFERENCE RELATION routine obtains the entity select vector (600, FIG. 20) associated with the JOIN column. The entity select vector is used as a row select vector for specifying the rows of the referenced relation which have values in the JOIN column (1184, FIG. 26C). Then, during block 1138, processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) at block 1104.
Block 1104 creates an index vector for the row use sets corresponding to the Supplier ID column (1181, FIG. 20). During block 1106, the row use set associated with the Supplier ID column is retrieved (1186, FIG. 26C). Then, during block 1108, the first binary bit vector of the row use set (1186, FIG. 26C) is ANDed with the entity select vector to form a resultant vector Z (1192, FIG. 26C). During block 1110, the resultant vector is evaluated to determine if all the binary bits are set to "0". One bit in the resultant vector Z is set to "1", and thus, processing continues at block 1114. During block 1114, the binary bit in the index vector associated with the current row use vector is set to "1" (1194, FIG. 26C). Then, during block 1116, the resultant vector Z associated with the current row use vector is stored in memory 18 (FIG. 1A). During block 1118, the binary bits set to "1" in the row select vector that match the binary bits set to "1" in the resultant vector Z are cleared (1196, FIG. 26D). Processing continues at block 1120, during which the row select vector is evaluated to determine if all the binary bits are set to "0". Not all the binary bits of the row select vector are "0" (1196, FIG. 26D), and thus, processing continues at block 1122. During block 1122, the next row use vector of the row use set for the Supplier ID column is obtained (1198, FIG. 26D). During block 1124, the RPU 22 (FIG. 1A) determines if the end of the row use set has been reached. The end of the row use set has not been reached, and thus, processing continues at block 1108.
VI. ENTITY USE VECTORS
To this point, the use of binary bit vectors in the setting of a relational database has been extensively discussed. The purpose of the binary bit vectors, as described in the Binary Bit Vector Technology (Part III) section of this specification, is to characterize a subset of an ordered set. In contrast, the purpose of an entity use vector is to define a relationship between the elements of two sets.
More particularly, referring to FIG. 27, sets S 1234 and T 1236 define the set of all ordered pairs (s, t) such that s is an element of S and t is an element of T. A "mapping" exists from S 1234 to T 1236. More than one element of set S 1234 may map onto the same element of T 1236; however, no element of S ever maps onto more than one element of T. In mathematical terminology, the set S 1234 is the "domain", and the set T 1236 is the "range". FIG. 27 shows the many-to-one mapping. Specifically, s1 and s2 map to the value t1. The purpose of the entity use vector is to express this relationship between the ordinal positions of elements in one set and those in another set. Stated differently, the entity use vector is a vector whose elements are values expressed in ordinal positions of elements within another set.
In the context of the current invention, the entity use vector constitutes a function (or a mapping) between either a value in a value set or a relational row in another or the same relation. The mapping may apply specifically to a relational column, whether it be a single physical column or a physical column consisting of many conceptual columns. The implied ordinal position within the entity use vector corresponds to a row number in the "domain". The associated value of the element is the entity location (ordinal position of the value in the value set or row position of the relational row being referenced) of the "range".
Referring to FIGS. 4, 20, 28, and 29, a more detailed discussion regarding the entity use vector and its use in the relational database is now presented.
Referring to FIG. 4, a binary representation of the Supplier relation of the Supplier and Parts relational database is shown. In this database, the mapping between the domains and the columns was achieved by entity select vectors and row use vector sets. The added feature, entity use vectors, has been added to the Supplier relation depicted in FIG. 4, as shown in FIG. 28. Note that the entity use vectors 1238, 1240, 1242 and 1244 correspond to the columns 168, 170, 172 and 174. Recall that the columns do not actually exist in the RDMS 10; instead, the binary representations of the columns 260, 262, 264 and 266, respectively, are stored in the memory 18 area of the RDMS. If the entity use vectors are employed in the relational database setting, the values associated with each row of a column in the relation can be determined more efficiently. For example, referring to the entity use vector 1244, which corresponds to the city column 174 of the Supplier relation, note that in the first ordinal position 1237 of the entity use vector, the value 5 resides. Essentially, the value 5 maps the value in the first row of the city column 174 to the fifth row of the domain 166. Referring to the dotted line 1249, the mapping of the first row of the city column 174 to the fifth row of the value set is shown. Likewise, the fourth row of the city column 174 is also mapped to the fifth row position of the value set 166. Thus, the first row and the fourth row of the city column 174 both map to the same row of the value set 166. In this way, the many-to-one mapping can be achieved via the entity use vector. Likewise, the second row and the third row of the city column 174 map to the same row of the value set 166, as shown by the implied mapping at 1247.
Lastly, the fifth row of the city column 174 maps back to the first row of the value set 166. With all the elements in the entity use vector, the city column 174 can be easily reconstructed with the values of the value set 166. More particularly, the first row of the city column 174 corresponds to the value 5 in the entity use vector at 1237, which corresponds to the value London. The second row of the city column 174 corresponds to the entity use vector value 8, which corresponds to the value Paris. Likewise, the third row of the city column 174 corresponds to the value 8 of the entity use vector and to the value Paris. The fourth row corresponds to the value 5 in the entity use vector and to the value London. Lastly, the fifth row of the city column 174 corresponds to value 1 in the entity use vector, which corresponds to the value Athens. The entity use vector will be used specifically during the DISPLAY/RECONSTRUCT operations for the RDMS 10. The entity use vector facilitates the steps for determining the values in the particular columns of a relation and hence facilitates the DISPLAY/RECONSTRUCT process.
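The reconstruction of the city column just described can be sketched in a few lines. The entity use vector values 5, 8, 8, 5, 1 and the three city names come from the passage above; the dictionary representation and the names `value_set_166`, `city_entity_use` and `reconstruct_column` are assumptions made for illustration, and the value set entries not named in the text are simply omitted.

```python
# 1-based ordinal position in value set 166 -> value. Only the positions
# named in the text are filled in; the remaining entries are not given here.
value_set_166 = {1: "Athens", 5: "London", 8: "Paris"}

# Entity use vector for city column 174: one value set ordinal per row.
city_entity_use = [5, 8, 8, 5, 1]


def reconstruct_column(entity_use_vector, value_set):
    """Look up each row's value directly by its value set ordinal."""
    return [value_set[ordinal] for ordinal in entity_use_vector]


city_column = reconstruct_column(city_entity_use, value_set_166)
print(city_column)
```

Each row costs a single lookup, which is the efficiency the entity use vector is said to provide over scanning a row use set.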
Referring to FIG. 29, a depiction of how entity use vectors may be used in the setting of a JOIN relation is shown. More particularly, the entity use vectors 1238, 1240, 1242, 1244, and 1246 have been added to the binary representation of the JOIN relation of FIG. 20. The entity use vectors 1238, 1240, 1242 and 1244 are identical to the entity use vectors of the Supplier relation in FIG. 28. The new entity use vector of FIG. 29 is at 1246. This entity use vector maps the rows of the JOIN columns to the rows of the input relations. More particularly, the values of the entity use vector correspond to the row numbers of the referenced relation. In particular, the first three rows of the entity use vector, 1250, 1251, 1252, respectively, correspond to row 1 of the Supplier relation. Thus, the first three rows of the JOIN relation will contain values from the first row of the city column. The city column has associated with it an entity use vector 1244 which has a value 5 in the first row corresponding to the first row of the city column. The value 5 maps back to the fifth row of the value set 166. The fifth row of the value set 166 contains the value London. Therefore, the first three rows of the entity use vector, 1250, 1251, and 1252, correspond to the value London as shown by the physical representation of the S.CITY column of the JOIN relation. The entity use vectors associated with the JOIN relation provide a mapping between the columns of the Supplier relation and the columns of the JOIN relation. As stated earlier, the columns are mapped via the entity use vectors to the value sets of the relational database. Once again, the entity use vector will be used during the DISPLAY/RECONSTRUCT process of the JOIN relation. The entity use vector will substantially improve the processing time for generating the actual representation of the JOIN relation by not having to perform the multitude of steps associated with the DISPLAY/RECONSTRUCT operations as discussed earlier. Again, a more detailed discussion on the DISPLAY/RECONSTRUCT will be presented shortly.
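The two-level mapping for the JOIN relation (JOIN row to Supplier row to value set ordinal) may be sketched as follows. Only the first three entries of entity use vector 1246 are stated in the text, so the sketch is limited to those rows; the function name `resolve_join_column` and the dictionary form of the value set are hypothetical.

```python
value_set_166 = {1: "Athens", 5: "London", 8: "Paris"}  # positions named in the text
supplier_city_entity_use = [5, 8, 8, 5, 1]               # entity use vector 1244
join_entity_use_first_rows = [1, 1, 1]                   # elements 1250, 1251, 1252 of vector 1246


def resolve_join_column(join_entity_use, base_entity_use, value_set):
    """Follow JOIN row -> referenced relation row -> value set ordinal."""
    column = []
    for referenced_row in join_entity_use:            # 1-based row of the Supplier relation
        ordinal = base_entity_use[referenced_row - 1]  # 1-based ordinal in the value set
        column.append(value_set[ordinal])
    return column


s_city_first_rows = resolve_join_column(
    join_entity_use_first_rows, supplier_city_entity_use, value_set_166
)
print(s_city_first_rows)
```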
Referring to FIG. 1A, typically, the entity use vectors will be stored in byte or multi-byte form in memory 18. When a binary representation of a JOIN relation, or any other type of relation for that matter, needs to be displayed to the user, the RPU 22 causes the appropriate entity use vectors to move from the memory 18 via bus 46 to MVP 15. There, the RPU 22 evaluates the entity use vectors via the DISPLAY/RECONSTRUCT routines (FIGS. 30 and 31). When the relation has been reconstructed, it is ready to be sent to the external device 12 for storage or to display 3 (FIG. 1) for the user to view.
Referring now to FIG. 7, the entity use vectors are constructed during the load operation performed by the BINARY REPRESENTATION routine at block 356. More particularly, when the creation of the relational database has been completed, the system loads the file representation of the relations into external device 12, where they reside until summoned by the RDMS system 10. A particular column is retrieved by referring to its system identifier as discussed earlier and in more detail in Part VII. The first value of the particular column is brought to the RPU 22 and, as discussed earlier, a row use vector associated with the value is built by the BBVP 14. Additionally, a bit is set to "1" in the first position of the row use vector to indicate that the value occupies the first row of the first column of the relation. At the same time the first value is brought to the RPU 22, it is also evaluated by the MVP 15. When the first value is evaluated by the MVP 15, an entity use vector associated with the column is built. Specifically, the first position of the entity use vector is assigned a numerical value to indicate the row of the value set corresponding to the input value. As each value is input from the column, the MVP 15 generates a corresponding value associated with the row position in the value set which contains the particular value. Once the column has been completed and the next column is entered from the external device to the MVP 15, a new entity use vector is created. This process continues until all of the columns input from the external device have associated entity use vectors. As the entity use vectors are constructed, they are sent via bus 46 to memory 18, where they reside until they are called for processing via the DISPLAY/RECONSTRUCT routine at FIGS. 25A, 25B and 25C.
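The parallel construction of row use vectors and an entity use vector during loading may be sketched as follows. The sketch assumes, for simplicity, that the value set is already fully populated before the column is read, which the text does not state; the three-entry value set, the function name `load_column` and the list-based bit vectors are likewise illustrative assumptions.

```python
def load_column(values, value_set):
    """Build an entity use vector and row use vectors for one column.

    Assumption: value_set already holds every value that occurs in the
    column, so ordinals never shift while the column is being read.
    """
    position = {v: i + 1 for i, v in enumerate(value_set)}  # 1-based ordinals
    entity_use = []
    row_use = {}                                 # value set ordinal -> bit vector
    for row, value in enumerate(values):
        ordinal = position[value]
        entity_use.append(ordinal)               # one entry per row of the column
        bits = row_use.setdefault(ordinal, [0] * len(values))
        bits[row] = 1                            # mark the row that holds this value
    return entity_use, row_use


# City column values from the Supplier relation; the value set is shortened
# to the three cities that actually occur (hypothetical ordering).
city_values = ["London", "Paris", "Paris", "London", "Athens"]
entity_use, row_use = load_column(city_values, ["Athens", "London", "Paris"])
print(entity_use)
print(row_use)
```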
Referring to FIG. 30A, a flow diagram for the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine is shown. The purpose of this routine is to efficiently reconstruct and display the values of a specified column or columns. More particularly, referring to block 1270, the RPU 22 (FIG. 1A) causes the entity use vector associated with the particular column to be displayed to move from memory 18 via bus 46 to MVP 15. Then, during block 1272, the RPU 22 determines whether the entity use vector associated with the column references a value set or another relation. Assuming that the entity use vector references another relation, then processing continues at block 1273, during which the REFERENCE RELATION routine (FIG. 30B) is called. However, if the entity use vector associated with the column to be displayed references a value set, then the REFERENCE VALUE SET routine (FIG. 30C) is called. The REFERENCE RELATION routine (FIG. 30B) is called when the column to be displayed is, for example, a column of the JOIN relation. In contrast, the REFERENCE VALUE SET routine (FIG. 30C) is called when the column to be displayed is a column of a base relation (one which references one or more value sets directly). In either case, when the column has been reconstructed and is ready for display, processing returns to the calling program at block 1276.
Referring to FIG. 30B, a flow diagram for the REFERENCE RELATION routine is shown. Specifically, during block 1280, the entity select vector associated with the column is obtained. This entity select vector is used as a row select vector for obtaining the values from a REFERENCE RELATION. For example, the entity select vector 600 (FIG. 29) which is associated with the JOIN column depicting the row use set of the JOIN relation 604 (FIG. 29) could be such a REFERENCE RELATION. Processing continues at block 1282, during which processing returns to the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine (FIG. 30A).
Referring to FIG. 30C, a flow diagram for the REFERENCE VALUE SET routine is now discussed. More particularly, during block 1286, the values from the value set are obtained corresponding to the ordinal positions expressed by the entity use vector elements. Then, during block 1288, a value is placed in the appropriate row of the column. The ordinal position of the value in the column corresponds to the ordinal position of the corresponding element in the entity use vector. During block 1290, the next value of the value set is obtained and in block 1292, the RPU 22 (FIG. 1A) determines whether all the values have been processed. Assuming that not all of the values have been processed, then processing continues at blocks 1288, 1290, 1292, until all of the values are placed in the appropriate row positions of the column. Once all the values have been placed in the column, processing continues at block 1294. During block 1294, it is determined whether there is another column to go through; more particularly, whether there is a JOIN column which needs to be reconstructed, for example, the CITY column 638 (FIG. 29) of the JOIN relation. Assuming that a JOIN relation needs to be reconstructed, then processing continues at blocks 1288, 1290, 1292, 1294 until all the values associated with the JOIN column have been placed into the proper rows of the column. Assuming that there are no more levels of columns to reconstruct, then processing continues at block 1296, during which processing returns to the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine (FIG. 30A) at block 1276. During block 1276, processing returns to the calling program and the display of the column has been completed.
VII. Database Identification
Referring to FIGS. 31, 32, 33, 34, 35, 36, 37A, 37B, 37C and 37D, a detailed description of the structure for maintaining identification of the elements of the relational database is now presented. Referring to FIG. 31, a representation of a relational database is shown. The top portion of FIG. 31 depicts seven domains of unique values referenced by two relations (Supplier and Parts). The third relation, labeled JOIN relation, references both the Supplier and Parts relations as shown. All three relations depicted in FIG. 31 were thoroughly discussed in the previous sections of the specification. Specifically, the Supplier relation is detailed in FIG. 4, the Parts relation is detailed in FIG. 5, and the JOIN relation is detailed in FIG. 19. The three relations are assumed to be in their binary representations and stored in memory 18 (FIG. 1A). The RDMS 10 keeps track of all of the columns of the relational database (FIG. 31) via an identification scheme, called the System relation. The System relation logically connects all of the necessary information for describing a column of any of the relations.
Referring to FIG. 32, the System relation for the Supplier/Parts relational database (FIG. 31) is shown. The System relation is broken up into four columns. From left to right, these are the column identifier (CID), the relation identifier (RID), the attribute identifier (AID), and the domain identifier (DID). The CID is a number which distinguishes a column from all of the other columns of the relational database. The RID identifies the relation in which a column logically resides. For example, column 80 (FIG. 2) resides in relation 63 (FIG. 2). The AID defines the order or position in which the column resides in a particular relation. For example, column 80 (FIG. 2) is the first column of the relation 63 (FIG. 2), and thus, has an AID number of 1. The DID identifies the particular domain associated with the column. All four identifiers together characterize each column of the relational database. Stated differently, all four identifiers characterize the relationship of a column in the relational database.
Referring to FIG. 31, each element of the Supplier/Parts relational database is identified by a CID number. Specifically, the Supplier Identifier domain is characterized by a CID number "1" and an RID number "1". Domains do not have AID or DID numbers because the relation only has a single column, and it is itself a domain. Normally, a single column in the relational database would have associated with it an AID number of 1; however, in the preferred embodiment, the domain is always considered to have an AID of 0. Likewise, the Parts Identifier domain is characterized by a CID number "2" and an RID number "2". The domains are more fully depicted in FIG. 33 with their associated CID and RID numbers. The CID and RID numbers for the domains are shown in the first seven rows of the System relation (FIG. 32). With the CID and the RID number, any of the domains for the Supplier/Parts relation (FIG. 31) can be referenced and obtained.
Now, referring to the Supplier relation of FIG. 31, the CID, RID, AID and DID numbers associated with each of the columns and the relation are now discussed. The Supplier relation is identified by a special row of the System relation, which generates a virtual column (as seen by the user) for identifying the Supplier relation. Each relation of the relational database has a virtual column. The user never sees this column. This column has a CID number of 8 and the Supplier relation has an RID number of 8. The virtual column contains row numbers associated with the rows of the relation. This column by convention does not have associated with it an AID or a DID number. Thus, in the eighth row of the System relation (FIG. 32), the AID and DID numbers are set to 0. The first column (which the user sees) of the Supplier relation is the Supplier ID column and it has associated with it a CID number 9. The Supplier ID column is part of the Supplier relation, and thus, its RID number is 8. Likewise, the Person Name column of the Supplier relation has a CID number 10 and it, too, is part of the Supplier relation, and thus, has an RID number of 8. The Status column of the Supplier relation has a CID number 11 and it is part of the Supplier relation, and thus, has an RID number 8. The last column of the Supplier relation is the City column and it has a CID number of 12 and an RID number 8.
Note in FIG. 32 that the Supplier ID column has an AID number of 1, which identifies the Supplier ID column as the first column of the Supplier relation. The Supplier ID column is associated with the Supplier Identifier domain, and thus, the DID number associated with the Supplier ID column is 1. Recall that the CID number for the Supplier Identifier domain is equal to 1, and this corresponds to the DID number for the Supplier ID column. Likewise, the Person Name column of the Supplier relation has an AID number of 2 and a DID number of 3. More particularly, the AID number 2 means that the Person Name column is the second column of the Supplier relation and the DID number 3 means that the Person Name column is associated with the Person Names domain. The Status column of the Supplier relation has an AID number 3 and a DID number 7. The AID number 3 means that the Status column is the third column of the Supplier relation and the DID number 7 means that the Numbers domain is associated with the Status column. Lastly, the City column has AID number 4 and DID number 5 associated with it. The AID number 4 means that the City column is the fourth column of the Supplier relation and the DID number 5 means that the City domain is associated with the City column of the Supplier relation.
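For illustration, the Supplier rows of the System relation described above can be held as simple records and queried. The CID, RID, AID and DID values are those given in the text for CIDs 8 through 12; the `SystemRow` record type and the helper names are hypothetical, and the seven domain rows are omitted because not all of their identifiers are stated here.

```python
from collections import namedtuple

SystemRow = namedtuple("SystemRow", "cid rid aid did")

# Rows 8-12 of the System relation as described for the Supplier relation.
SYSTEM_RELATION = [
    SystemRow(8, 8, 0, 0),    # virtual column identifying the Supplier relation
    SystemRow(9, 8, 1, 1),    # Supplier ID  -> Supplier Identifier domain
    SystemRow(10, 8, 2, 3),   # Person Name  -> Person Names domain
    SystemRow(11, 8, 3, 7),   # Status       -> Numbers domain
    SystemRow(12, 8, 4, 5),   # City         -> City domain
]


def columns_of(rid):
    """User-visible columns of a relation, ordered by AID (virtual column excluded)."""
    rows = [r for r in SYSTEM_RELATION if r.rid == rid and r.aid > 0]
    return sorted(rows, key=lambda r: r.aid)


def domain_of(cid):
    """DID of the column with the given CID."""
    for row in SYSTEM_RELATION:
        if row.cid == cid:
            return row.did
    raise KeyError(cid)


print([row.cid for row in columns_of(8)])
print(domain_of(12))
```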
The Parts relation also has a virtual column which is identified by the CID number 13. Additionally, the JOIN relation is identified by a virtual column having a CID number 19. The columns of the Parts relation are identified by the CID, RID, AID and DID numbers in the same way the columns are identified for the Supplier relation. The CID number 20 corresponds to a JOIN column which references the Supplier relation having RID
represented database is formed as shown in FIGS. 4, 5 and 6. This example has been designed to emphasize the steps of the BINARY REPRESENTATION routine for constructing the binary representation of a relational database in FIGS. 4, 5 and 6. In this example, the assumption is made that the reader understands instruction formats. For background in this area, please refer to Date, Introduction to Database Systems, Vol. 1, 4th Ed., 100-107 (1986).
Referring to FIG. 2, the system of FIG. 1 creates a relational database as shown in the tables 63, 65 and 67 (FIG. 2). These tables are constructed as inputs to the system. Once the input process is complete, the system calls the BINARY REPRESENTATION routine (FIG. 7). The system, during block 344, creates the domains referenced by the relational database. Input instructions specify the following domains: supplier identifiers 66, parts identifiers 68, person names 70, part names 72, city names 74, colors 76, and numbers 78 (FIG. 2). The input instructions in pseudo code look like:
(A) Create Domain Supplier Identifiers;
(B) Create Domain Parts Identifiers;
(C) Create Domain Person Names;
(D) Create Domain Part Names;
(E) Create Domain City;
(F) Create Domain Colors;
(G) Create Domain Numbers.
As the system reads each of the instructions above, an empty value set is created for each of the domains listed. The creation of the empty value sets for the various domains specified in the above instructions is depicted in the results table of FIGS. 8A and 8B. Each row of the results table represents a new value set created as a result of one of the input commands.
When the command interpreter 28 of the RDMS 10 (FIG. 1A) interprets the instruction (A) in block 344, RPU 22 creates an empty value set (364, FIG. 8A). Next, in block 346, the system determines whether there are any other domains to be identified by the system. In fact, there are other domains to be identified as specified by the instructions listed above; thus, block 344 is called. During block 344, the command interpreter 28 evaluates statement B and RPU 22 creates an empty value set to reference the domain of part identifiers (366, FIG. 8A). In block 346, the system determines that there are still more domains to be identified, and thus, block 344 is called. During block 344, the system interprets the next instruction (C), which is for the "person names" domain. The RPU 22 (FIG. 1A) generates an empty value set (368, FIG. 8A). BBVP sets all the vectors to "0" to indicate the empty entity select vector (368, FIG. 8A). In block 346, the system determines that there are still more instructions for identifying domains. Processing continues at block 344, and an empty value set for part names is generated (370, FIG. 8A). Block 346 determines that there are still more domain instructions to be processed and processing continues at block 344. During block 344, an empty value set corresponding to the city domain is created (372, FIG. 8~). During block 346, the system determines that there are still more commands for identifying domains, so processing returns to block 344.
During block 344, the system generates an empty value set corresponding to the domain ~colors~ (374, FIG. 8B).
Block 346 determines that there is one more instruction left (G) for creating a domain, and thus, block 344 is called. During block 344, the ~PU 22 creates an empty value set for the ~numbers" domain (376, FIG. 8B).
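By way of illustration only, the loop formed by blocks 344 and 346 may be sketched in Python as follows. The function name create_domains and the use of a dictionary keyed by domain name are assumptions made for this sketch; they do not appear in the specification.

```python
# Hypothetical sketch: one empty value set is created per domain instruction.
DOMAIN_INSTRUCTIONS = [
    "Supplier Identifiers", "Parts Identifiers", "Person Names",
    "Part Names", "City", "Colors", "Numbers",
]

def create_domains(names):
    """Return a mapping from each domain name to its (still empty) value set."""
    value_sets = {}
    for name in names:          # block 346: is there another domain instruction?
        value_sets[name] = []   # block 344: create an empty value set
    return value_sets

domains = create_domains(DOMAIN_INSTRUCTIONS)
```

After the loop, each of the seven domains of instructions (A) through (G) has an empty value set, which corresponds to the seven rows of the results table of FIGS. 8A and 8B.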
When processing is completed for all of the commands illustrated above, processing continues at block 348, during which a table to be created in the relational database is identified. Specifically, the system is presented with the following command:
CREATE TABLE SUPPLIER
(SUPPLIERS ID; PERSON NAME; STATUS; CITY);
which indicates that a table for suppliers is to be identified; specifically, an identifier for the suppliers table is stored in memory 18 of the RDMS 10.
Then, during block 350, the system generates an identifier for the first column of the relation, which is also stored in memory 18. Specifically, it interprets the command above, and a suppliers ID identifier is stored in memory. (Identifiers are discussed in PART VI.) During block 352, the system determines that the command requires that other columns be identified for the relation, so processing continues at block 350. The system identifies the column for "person names" and stores an identifier indicating such in the memory 18. Additionally, an empty entity select vector is created, which is associated with the "person names" domain. In block 352, the system determines that there are still other columns to be identified, and thus, processing continues in block 350. In block 350, the system stores an identifier for the next column, associated with "status", into the memory 18. Also, an empty entity select vector associated with the "status" domain is created. In block 352, the system determines that there is still a column remaining to be identified; specifically, during block 350, the system stores an identifier for the column associated with "city". During block 350, an empty entity select vector associated with the "city" domain is created. During block 352, the system determines that no more columns need be identified, and thus, processing continues at block 354, which determines whether there are any more tables to be identified for the relational database.
The following instructions can be specified:
CREATE TABLE PARTS
(PART ID; PART NAME; COLOR; WEIGHT; CITY);
CREATE TABLE SHIPMENT
(SUPPLIER; PART ID; QUANTITY);
which are for identifying the relations and their columns for both the PARTS and SHIPMENTS tables. For purposes of this example, however, we will assume that the system processes blocks 348, 350, 352 and 354 to properly store the identifiers in the memory 18 for both the PARTS and SHIPMENTS relations, and to construct empty entity select vectors for the columns listed in the instructions above.
Assuming that all the identifiers for the tables, domains and columns have been specified, the RPU 22, during block 356, instructs the external device 12 to download the files associated with each table of the relational database via bus 30 to the RDMS 10 (FIG. 1A).
Specifically, during block 358, the byte values of each column of the relation are sent to the RPU 22 and the appropriate vectors of the row use sets are built via BBVP 14, and the binary bits of the entity select vectors are set there as well. The following section is a detailed discussion on the construction of the row use vectors, entity select vectors, and value sets of the relational database in FIG. 2.
B. Example of Building a Binary Represented Relation
FIGS. 9A, B and C are results tables depicting the formation of the binary representation of the suppliers relation of FIG. 4. Each row of FIGS. 9A, B and C depicts an additional row use vector associated with one column in the relation.
Referring to row 378 of the results table (FIG. 9A), the first value S1 (69, 80, FIG. 2) is inserted into the first ordinal position of the value set for suppliers identifiers. Additionally, the first binary bit of the entity select vector 417 associated with the "suppliers identifier" domain is inserted and set to "1" to indicate that the unique value S1 is referenced in the suppliers ID column. The column for the suppliers identifier requires a new row use vector 381 to be generated, and the first bit of the row use vector 381 is set to "1" to indicate that S1 occupies the first row of the column. The process of inserting bits into a row use vector, creating row use vectors, and setting bits in an entity select vector is controlled by a routine called INSERT, which will be discussed more thoroughly in part V of this specification. This process is repeated for all the values S1 to S10 in the value set for supplier identifiers, as indicated in rows 380-386 of FIG. 9A. A new row use vector is added for each value present in the subset, as indicated at 383-389, and a new bit is added to the entity select vector 417.
Additionally, five more bits, set to "0", are added to the ordinal positions of the entity select vector corresponding to the new values of the value set.
Referring to FIG. 4, the row use set 260 and the entity select vector 176 have now been generated by the BBVP 14 via the BINARY REPRESENTATION routine (FIG. 7).
The next column of the suppliers table is evaluated by the BBVP 14. The next column in the relation is the "person names" column (82, FIG. 2), and the first value of the "person names" column is Smith (69, 82, FIG. 2).
The value Smith is inserted into the first position of the "person names" value set. Additionally, the entity select vector 419 associated with "person names" has a first binary bit inserted and set to "1" to indicate that the unique value Smith is referenced in the column for person names. A row use vector for Smith has not been created, and thus, a new row use vector 401 is generated (388, FIG. 9A). This process is repeated, as indicated in rows 390-396, for the next four names added to the "names" value set, with a new row use vector added for each name, as indicated at 403-409, and a "1" bit added to the entity select vector 419.
The order of the row use vectors 409, 405, 407, 403 and 401, corresponding to Adams, Blake, Clark, Jones and Smith, is the order of occurrence of the binary bits set to "1" in the entity select vector 419. The values Baker, Fabel, Rahn, Ross and Young are added to the "person names" value set. Specifically, the values are added in an order corresponding to the lexical ordering (396, FIG. 9B). Binary bits set to "0" are also inserted into entity select vector 419 in the ordinal positions corresponding to the newly added unique values. The new binary bits are set to "0" to indicate that the remaining values do not occupy the column. The row use vectors 409, 407, 405, 403 and 401 correspond to the row use set 262 of FIG. 4, and row use set 262 corresponds to entity select vector 178 of FIG. 4, which is the same as the entity select vector 419.
The next column of the input suppliers relation is for the "status" (84, FIG. 2), and the first value of the column is "20" (64, 84 FIG. 2). The value "20" is inserted into the first position of the value set for status numbers 421. Additionally, a first bit is inserted into entity select vector 421 and set to "1", indicating that unique value "20" is referenced in the relational column. A new row use vector 411 is created, and the first binary bit of the row use vector is set to "1", indicating the presence of value "20" in the first row of the column (398, FIG. 9B). The process is repeated, as indicated in rows 400-406 of FIG. 9B, for each value in the "status" set, resulting in row use vectors 411-415.
Note that at row 404 the next value in the column (73A, 84, FIG. 2) is 20. The value 20 already exists in the value set for numbers, and thus, it need not be added again. Additionally, the entity select vector 421, for numbers, need not be set because a binary bit corresponding to unique value 20 has already been set to "1" in a previous step. This value also corresponds to an already existing row use vector 411. Binary bits set to "0" are added to row use vectors 413 and 415 to indicate that the values 10 and 30 do not occupy the fourth row of the relational column for status. A binary bit set to "1" is added to the row use vector 411 to indicate that the unique value 20 is also in the fourth position of the relational column for status.
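The handling of a repeated value such as "20" can be illustrated with a minimal Python sketch. The helper append_row and the dictionary keyed by value are assumptions made for this illustration; the order of the values 10 and 30 in the second and third rows is likewise assumed, since the text above states only that the first and fourth rows hold 20.

```python
def append_row(row_use_vectors, value):
    """Append one row to a column held as {unique value: row use vector}."""
    if value not in row_use_vectors:
        # New unique value: start its vector with a 0 for every row already loaded.
        n_rows = len(next(iter(row_use_vectors.values()), []))
        row_use_vectors[value] = [0] * n_rows
    # Every vector grows by one bit; only the vector of `value` receives a 1.
    for v, bits in row_use_vectors.items():
        bits.append(1 if v == value else 0)

status = {}
for s in [20, 10, 30, 20]:   # assumed row order of the first four status values
    append_row(status, s)

# status[20] is now [1, 0, 0, 1]; status[10] is [0, 1, 0, 0]; status[30] is [0, 0, 1, 0]
```

When the fourth value arrives, no new vector is created; the existing vector for 20 receives a "1" and the vectors for 10 and 30 each receive a "0", as described for row 404.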
The last column of the relation for suppliers is the "cities" column. The first value of the "cities" column is London (69, 89, FIG. 2). The value London is placed into the first position of the value set for "cities". Additionally, the entity select vector 423 associated with the names of cities has a first binary bit inserted and set to "1" to indicate that the city London is referenced in the relational column for cities. A row use vector 417 is created and a binary bit set to "1" is added to the row use vector to indicate that London occupies the first row of the column for cities (408, FIG. 9C). "Paris" is added (row 410) in the same way.
Since "Paris" and "London" occur twice, the row use vectors have bits added, as indicated, at rows 412 and 414.
The last value in the column for cities is Athens (75, 86, FIG. 2). The value Athens does not exist in the cities value set, and so it is added. Specifically, Athens is placed into the first position of the value set, corresponding to the lexical ordering of the city names. The entity select vector 423 for cities has a third binary bit inserted and set to "1" in the first ordinal position of the entity select vector to indicate that Athens is referenced in the column for cities. A new row use vector 421 is added to the row use set.
Binary bits set to "0" are added to the row use vectors 417 and 419 to indicate that the unique values London and Paris do not occupy the fifth row of the column.
The new row use vector contains five bits and the fifth bit is set to "1" to indicate that the fifth row of the column contains the value Athens. The values Cleveland, Fresno, Harrisburg, Los Angeles, New York, Rome, and San Francisco are added to the value set from the input column. The values are arranged in the value set according to a lexical ordering. For each additional value, a corresponding binary bit set to "0" is inserted at a corresponding ordinal position of the entity select vector. The binary bits set to "0" indicate that the values are not referenced by the column (418, FIG. 9C).
The row use vectors 421, 417 and 419, corresponding to Athens, London and Paris, are in the order of occurrence corresponding to the binary bits set in the entity select vector 423. Additionally, the row use vectors 421, 417 and 419 correspond to the row use set 266 of FIG. 4, and entity select vector 423 is associated with the entity select vector 182 (FIG. 4).
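The end state of the cities column can be reproduced with a short Python sketch. The sketch builds the three structures in one pass rather than row by row as the specification does, and the function name build_column is an assumption. The order of Paris and London in the third and fourth rows is also assumed; the text states only that each occurs twice.

```python
def build_column(domain_values, column):
    """Return (value set, entity select vector, row use set) for one column."""
    # Value set: every unique value of the domain, in lexical order.
    value_set = sorted(set(domain_values) | set(column))
    # Entity select vector: 1 where the column references the value.
    entity_select = [1 if v in column else 0 for v in value_set]
    # Row use set: one vector per referenced value, in entity-select order.
    row_use_set = [[1 if cell == v else 0 for cell in column]
                   for v, bit in zip(value_set, entity_select) if bit]
    return value_set, entity_select, row_use_set

extra_cities = ["Cleveland", "Fresno", "Harrisburg", "Los Angeles",
                "New York", "Rome", "San Francisco"]
cities_column = ["London", "Paris", "Paris", "London", "Athens"]  # assumed row order
value_set, entity_select, row_use_set = build_column(extra_cities, cities_column)

# entity_select is [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
# row_use_set is [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]
```

The three row use vectors come out in the order Athens, London, Paris, which matches the ordering of vectors 421, 417 and 419 given above.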
Referring to FIG. 7, when all of the relations of the relational database have been generated, processing continues at decision block 360, in which the RPU 22 determines whether there are more files associated with the PARTS and the SHIPMENTS tables of the relational database. Processing continues at blocks 356 and 358 until the binary representations for the PARTS and SHIPMENTS tables are constructed (418, FIG. 9C). The binary representations for the PARTS and SHIPMENTS tables are generated in the same fashion as the SUPPLIERS table (FIGS. 9A, 9B and 9C) discussed above.
Processing returns during block 362 to the calling routine of the BINARY REPRESENTATION routine (FIG. 7).
V. Operations Performed on Binary Representations of Relations
FIGS. 10A, 10B, 10C, 10D, 11A, 11B, 11C, 11D, 12, 13, 14, 15A and 15B, 16, 17, 18, 19, 20, 21, 22A, 22B, 22C, 22D, 22E, 22F, and 22G depict flowcharts of operations performed on relations in their binary represented form. Specifically, FIGS. 10A, B, C and D are flow diagrams of the utility function called INSERT.
FIGS. 11A, B, C and D are flow diagrams of the function DELETE. FI~ is a flow diagram of the relational operation called SELECT. FIG. 16 is a flow diagram of the relational operation called PROJECT, and FIGS. 22A, 22B, 22C, 22D, 22E, 22F, and 22G are flow block diagrams of the operation JOIN. The functions INSERT and DELETE are basically for maintaining and manipulating data within the relations. The relational operations SELECT, PROJECT and JOIN are for generating resultant relations, in the preferred embodiment in a binary represented form. Only the three relational operations SELECT, PROJECT and JOIN are discussed, in order to simplify this disclosure and to provide a basic understanding of how relational operations are performed on binary representations of relations. For example, the operations PRODUCT, UNION, INTERSECTION, DIFFERENCE and DIVIDE, which are described in Date, "An Introduction To Database Systems," Vol. 1 (4th ed., 1986), are not discussed in this specification, due to their complexity. However, one skilled in the art will readily understand their operation on binary representations of relations after reading the sections on SELECT, PROJECT and JOIN. This part of the specification is broken up into five subsections, each dealing with a separate operation as discussed above.
For purposes of this example, it is assumed that one or more relations have been loaded into the RDMS 10.
Here the relations are encoded by the RPU 22 via BBVP 14 into the binary representations of the relations. Then the binary represented relations are either sent directly to memory 18 for storage, or they are sent to the BVE 16, where the bit strings of the binary representations are encoded into compressed impulse formats as discussed in Glaser et al. The resulting compressed bit strings are then stored to memory 18.
They stay in memory 18 until a request to perform a relational operation (i.e., INSERT, DELETE, SELECT, PROJECT, JOIN) is initiated and is interpreted at the command interpreter 28. In their uncompressed form, the binary representations of the relations of the relational database stored in memory 18 are shown in FIGS. 4, 5 and 6. It is also assumed that the bit vectors of the binary represented relation could be in compressed impulse formats. However, for ease of understanding, the bit vectors are processed in the uncompressed form. When the RPU 22 is ready for processing, the binary represented relation(s) are brought to the RPU 22 via buses 48 and 31. Here the specified relational operation is performed on the relation(s). If any Boolean operations need be performed by the relational operation, then the steps for processing compressed bit strings as set forth in Glaser et al. are performed by the BLU 24. Once processing is completed at the RPU 22, the RPU 22 outputs a new binary represented relation. The new output relation is sent to memory 18 via buses 31 and 48, where the resultant relation resides until it is sent back to RPU 22 for further processing.
A. INSERT
INSERT is a function which adds one value at a time to a relation. The necessity for performing an INSERT operation occurs in three categories of cases. First, a unique value needs to be added to a domain or value set, and it needs to be added to a column of a relation.
Second, a unique value already exists in the value set, and it needs to be added into a column of a relation.
Third, a value already exists in a column and it needs to be added again to the column. Multiple values may be added to a value set or to a column; however, the INSERT subroutines must be processed once for each value that is added. In all of the situations above, an assumption is made that the binary representation of the one or more relations on which the INSERT is to be performed is in RPU 22, ready for processing.
Additionally, the function INSERT can be used to add values to more than one column of a relation. The INSERT function is separately performed each time a value is added to the relation.
The flow diagrams in FIGS. 10A, B, C and D depict routines for processing any one of the three situations discussed above. Specifically, the flow diagram in FIG. 10A is a routine for adding a unique value to a value set, and adding the unique value to a column of a relation; the routine is called INSERT. FIG. 10B is a flow diagram of a subroutine for adding a unique value to the value set, and this routine is called INSERT VALUE INTO VALUE SET. FIG. 10C is a routine for updating an entity select vector to reflect the addition of a unique value into a particular subset; this routine is called UPDATE SUBSET. FIG. 10D is a routine for updating a column of a relation with a new occurrence of a value; this routine is called ADD VALUE TO COLUMN.
The routine in FIG. 10D would by itself be used to insert a value into a column of a relation when the value already existed in the column. The flow diagrams of FIGS. 10C and 10D are combined for inserting a value already existing in the value set into a subset of values of the value set and adding the value to a column of a relation.
Referring to FIG. 10A, a more detailed description of the INSERT routine is now discussed. At block 422, the system calls the INSERT VALUE INTO VALUE SET routine (FIG. 10B) to add a unique value to the value set. Once the new unique value has been added to the value set, block 424 calls the subroutine UPDATE SUBSET (FIG. 10C). This routine updates an entity select vector corresponding to the value set so that the unique value is represented in an associated subset. During block 426, the system performs the ADD VALUE TO COLUMN subroutine (FIG. 10D) to add the value into a specified column of the relation. Processing is then complete: a value has been added to the value set and to a column, and the system returns at block 428 to the calling program.
Referring to FIG. 10B, a more detailed description of the INSERT VALUE INTO VALUE SET routine is now discussed. In block 430, the RPU 22 via the BBVP 14 determines the ordinal position in the value set at which the new, unique value is to be inserted.
Essentially, the value set is stored in a structure which is traversed to find the new value. The structure contains all of the unique values presently stored in the value set, and is built so as to minimize access time for finding values. The system locates a node of the structure which corresponds to the ordinal position at which the new, unique value should be placed. During block 432, the system determines whether the value already exists. If the value already exists, then during block 434, the system returns to the calling program. Assuming that the value is not in the structure, then during block 436 the new value is added to the value set, either by adding a node or by assigning an already existing node in the structure to incorporate the unique value. During block 438, processing returns to the calling program.
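A minimal Python sketch of blocks 430 through 438 follows. It substitutes a sorted list searched by bisection for the node structure of the specification, which is not detailed here; that substitution, the function name, and the shortened list of names are all assumptions made for illustration.

```python
from bisect import bisect_left

def insert_value_into_value_set(value_set, value):
    """Return (ordinal position, True if the value was newly added)."""
    pos = bisect_left(value_set, value)                    # block 430: find the position
    if pos < len(value_set) and value_set[pos] == value:   # block 432: already present?
        return pos, False                                  # block 434: nothing to add
    value_set.insert(pos, value)                           # block 436: add the value
    return pos, True                                       # block 438: return to caller

# Shortened, illustrative value set of person names, kept in lexical order.
names = ["Adams", "Baker", "Blake", "Clark", "Fabel", "Jones", "Rahn", "Smith", "Young"]
pos, added = insert_value_into_value_set(names, "Zeus")
# pos is 9 (0-based, the last position) and added is True
```

Positions in the sketch are 0-based, whereas the specification counts ordinal positions from one.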
Referring to FIG. 10C, a more detailed discussion of the UPDATE SUBSET routine is now presented. In block 440, the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the ordinal position of the unique value in the value set. In decision block 441, the RPU 22 determines whether the unique value has been added to the value set. If the unique value has been added, block 442 is called. During block 442, the system inserts a bit into the entity select vector at the ordinal position corresponding to the new unique value. Processing continues at block 444. During block 444, the new bit added to the entity select vector is set to "1" to indicate the unique value in the subset. Returning to block 441, if the unique value already exists, block 443 is called. During block 443, the RPU 22 via the BBVP 14 determines whether the bit in the entity select vector has been set to "1". The binary bit set to "1" indicates that the column contains this unique value.
If the bit has been set to "1", the processing returns to the calling program during block 445. However, if the bit is not set, then processing continues in block 444. In block 444, the bit in the entity select vector is set to "1" to indicate that the unique value is in the subset. During block 446, processing returns to the calling program.
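The branches of FIG. 10C may be sketched in Python as follows. The function signature, the 0-based position argument, and the boolean return value are assumptions of this sketch and are not taken from the specification.

```python
def update_subset(entity_select, pos, newly_added):
    """Mark the value at `pos` as referenced; return True if the bit changed."""
    if newly_added:                     # block 441: value was just added to the value set
        entity_select.insert(pos, 1)    # blocks 442 and 444: insert a bit and set it to 1
        return True                     # block 446
    if entity_select[pos] == 1:         # block 443: is the bit already set?
        return False                    # block 445: the subset already holds the value
    entity_select[pos] = 1              # block 444: set the existing bit
    return True                         # block 446

esv = [1, 0, 1]
update_subset(esv, 3, True)    # new value at the end: esv becomes [1, 0, 1, 1]
update_subset(esv, 1, False)   # existing value, not yet referenced: [1, 1, 1, 1]
```

A return value of True tells the caller that the value has just entered the subset, which is one possible way to decide later whether a new row use vector is required.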
Referring to FIG. 10D, a more detailed description of the ADD VALUE TO COLUMN routine is now discussed.
During block 448, the RPU 22 via BBVP 14 counts the number of binary bits set to "1" in the entity select vector up to and including the bit at the ordinal position corresponding to the unique value inserted.
This number is called "count". The "count" of binary bits set to "1" corresponds to the location of the row use vector in the row use set. During block 450, the system inserts the new row use vector at the position in the row use set corresponding to "count". During block 452, the system appends a binary bit set to "0" to all of the row use vectors of the row use set. In block 454, the system sets the last bit of the new or selected row use vector to "1" to indicate that the new value is added to the last row of the column. During block 456, processing returns to the calling program. Returning to block 449, if a new row use vector is not required to be built by the RPU 22, then processing continues at block 452. During block 452, bits set to "0" are appended to the existing row use vectors, and during block 454, the last bit of the selected row use vector is set to "1".
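The "count" computation and the placement of the row use vector may be sketched in Python as follows. The flag needs_new_vector stands in for the decision of block 449 and is an assumption of this sketch, as are the three-city value set and the row order used in the usage lines.

```python
def add_value_to_column(entity_select, row_use_set, pos, needs_new_vector):
    """Append one row holding the value at 0-based ordinal position `pos`."""
    # Block 448: count the 1 bits up to and including the value's position.
    count = sum(entity_select[:pos + 1])
    if needs_new_vector:
        # Block 450: place an all-zero vector at position `count` (1-based).
        n_rows = len(row_use_set[0]) if row_use_set else 0
        row_use_set.insert(count - 1, [0] * n_rows)
    for vector in row_use_set:
        vector.append(0)                  # block 452: every vector gains a 0 bit
    row_use_set[count - 1][-1] = 1        # block 454: mark the new last row
    return count

# Illustrative state: value set [Athens, London, Paris], all three referenced.
entity_select = [1, 1, 1]
row_use_set = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]
count = add_value_to_column(entity_select, row_use_set, 1, False)  # London again
# count is 2 and the London vector becomes [1, 0, 0, 1, 0, 1]
```

Because the row use vectors are kept in the order of the "1" bits of the entity select vector, the count alone identifies which vector belongs to the value.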
1. Detailed Example for the INSERT Function
Referring to the suppliers relation depicted in FIG. 4, a detailed example for adding a name to the value set of names (162, FIG. 4) and to a column (170, FIG. 4) corresponding to names in the relation is now discussed. In this example, the requirement exists to insert the name "Zeus" into the value set depicted at 162 (FIG. 4), and at the end of the column at 170 (FIG. 4).
This example has been constructed to illustrate all of the routines for inserting a value into a value set and into a binary representation of a relation. FIG. 12 is a detailed results table depicting the various steps performed by the INSERT routine (FIG. 10A). The results table of FIG. 12 is broken up into three columns. From left to right, the first column depicts the existing value set, the second column is the entity select vector characterizing a subset of the value set, and the third column is the row use set representing the names column of the relation. Each row of the results table (FIG. 12) depicts a change in either the value set, entity select vector, or row use set as the INSERT routine (FIG. 10A) is performed. The subroutines depicted in FIGS. 10A, B, C and D are transparent to the application. Only the following type of instruction is required:
INSERT
Into (Relation-Suppliers) Value (Zeus)
This instruction is interpreted by the command interpreter 28 (FIG. 1A) to add a new and unique value Zeus to the value set for names and to add the word Zeus to the binary representation of the relation for suppliers, currently stored in memory 18. Pursuant to the instruction, the RPU 22 brings the binary representation of the suppliers relation from memory 18 to the RPU 22. Additionally, the value set is brought from the external device 12 via bus 30 to the RPU 22.
Because the unique value Zeus is not part of the value set of names, the RPU 22 calls the INSERT routine (FIG. 10A) to insert the value Zeus into the column of names in the relation and also to add the unique value Zeus to the value set of names. During block 422, the RPU 22 calls the subroutine INSERT VALUE INTO VALUE SET (FIG. 10B).
Referring to FIG. 10B, at block 430 of the INSERT VALUE INTO VALUE SET routine, the system determines the ordinal position in which the value Zeus is to be inserted. During block 432, the RPU 22 determines that Zeus does not exist in the value set, and in block 436, the RPU 22 adds Zeus to the proper node in the structure for the value set (494, FIG. 12). If a node did not exist in the structure, then a new node would be added and set to Zeus. At block 438, the RPU 22 returns to block 424 of the INSERT routine (FIG. 10A).
During block 424 (FIG. 10A), the RPU 22 calls the subroutine UPDATE SUBSET (FIG. 10C). Referring to FIG. 10C, at block 440, the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the unique value Zeus in the value set.
The ordinal position is the last position of the entity select vector. In block 441, RPU 22 determines that the value Zeus had been previously added to the value set of names, and processing continues at block 442. During block 442, the system inserts a bit into the entity select vector (494, FIG. 12) to indicate that a value exists in the last ordinal position of the value set. During block 444, the RPU 22 sets the new bit to "1" in order to indicate that the value Zeus is added to the subset (496, FIG. 12). Then, in block 446, the RPU 22 returns to block 426 of the INSERT routine (FIG. 10A).
Referring to FIG. 10A, during block 426, the RPU 22 calls the ADD VALUE TO COLUMN routine (FIG. 10D). During block 448 of the ADD VALUE TO COLUMN routine (FIG. 10D), the RPU 22 determines the "count" of binary bits which are set to "1", up to and including the bit at the ordinal position corresponding to the new value Zeus.
"Count" is equal to six because there are six binary bits set to "1" in the entity select vector; the bit corresponding to the value Zeus is the sixth binary bit set to "1". With the "count", the RPU 22 determines whether there presently resides a row use vector which corresponds to the value Zeus. A row use vector does not exist at position six, as determined previously at block 440 (FIG. 10C), and during block 450, the RPU 22 inserts a new row use vector at the sixth position of the row use set. During this step, the RPU 22 counts, from left to right, six row use vector positions in the row use set. The RPU 22 adds a new row use vector having binary bits set to "0". During block 452, the RPU 22 appends binary bits set to "0" to the end of all of the row use vectors in the row use set (500, FIG. 12). Then, during block 454, the last bit of the newest, sixth-position row use vector is set to "1" to indicate that Zeus now occupies the last row of the column for names in the relation for suppliers (502, FIG. 12).
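The Zeus example can be traced end to end with a compact Python sketch. It folds the three subroutines into one hypothetical function, uses a sorted list in place of the node structure, and works on a shortened list of names; these choices are assumptions made for illustration and are not the routines of FIGS. 10A through 10D themselves.

```python
from bisect import bisect_left

def insert(value_set, entity_select, row_use_set, value):
    """Add `value` to the value set if needed and append it as a new last row."""
    # INSERT VALUE INTO VALUE SET (FIG. 10B): find or create the value's position.
    pos = bisect_left(value_set, value)
    if pos == len(value_set) or value_set[pos] != value:
        value_set.insert(pos, value)
        entity_select.insert(pos, 0)
    # UPDATE SUBSET (FIG. 10C): make sure the entity select bit is 1.
    needs_vector = entity_select[pos] == 0
    entity_select[pos] = 1
    # ADD VALUE TO COLUMN (FIG. 10D): count the 1 bits, then extend the row use set.
    count = sum(entity_select[:pos + 1])
    if needs_vector:
        n_rows = len(row_use_set[0]) if row_use_set else 0
        row_use_set.insert(count - 1, [0] * n_rows)
    for vector in row_use_set:
        vector.append(0)
    row_use_set[count - 1][-1] = 1
    return count

names = ["Adams", "Baker", "Blake", "Clark", "Fabel", "Jones", "Rahn", "Smith", "Young"]
esv = [1, 0, 1, 1, 0, 1, 0, 1, 0]
rus = [
    [0, 0, 0, 0, 1],  # Adams occupies row five
    [0, 0, 1, 0, 0],  # Blake occupies row three
    [0, 0, 0, 1, 0],  # Clark occupies row four
    [0, 1, 0, 0, 0],  # Jones occupies row two
    [1, 0, 0, 0, 0],  # Smith occupies row one
]
count = insert(names, esv, rus, "Zeus")
# count is 6, and the new sixth vector is [0, 0, 0, 0, 0, 1]
```

The returned count of six agrees with the value of "count" stated in the example, and the sixth row use vector carries a single "1" in the new last row.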
B. DELETE
DELETE is an operation which removes one value at a time from a binary representation of a relation and possibly from a value set. The DELETE operation occurs in three categories of cases. First, a unique value need not be removed from a subset; however, the unique value needs to be removed from the column. Second, a unique value exists in a column, and it needs to be deleted from the relation and from a subset; however, it does not need to be removed from the value set. Third, a unique value already exists in a relation and it needs to be deleted from the column, from a corresponding subset and from a value set. Multiple values may be removed from a value set or a column; however, the DELETE routines must be processed more than once, according to the number of times a value or values need to be deleted. As in the case of INSERT, the binary representation of the relation and the value set are in RPU 22, ready for processing.
The flow diagrams in FIGS. 11A, B, C and D depict routines for processing any one of the three situations discussed above. Specifically, the flow diagram in FIG. 11A is a routine for removing a unique value from a value set and for removing the unique value from a column of a relation; this routine is called DELETE.
FIG. 11B is a flow diagram of a routine for removing a value only from a column. This routine is called DELETE VALUE FROM COLUMN. This routine by itself could be used to DELETE a value, one or more times, from a particular column. The flow diagram of FIG. 11C is a routine for updating an entity select vector to reflect the removal of a unique value from a particular subset; this routine is called DELETE VALUE FROM SUBSET. The flow diagram of FIG. 11D is a routine for removing a unique value from a value set; this routine is called DELETE VALUE FROM VALUE SET. The flow diagrams in FIGS. 11B and 11C can be used together to remove a value already existing in a column, and to remove the value from a subset of a value set without removing the value from the value set.
Referring to FIG. 11A, a more detailed description of the DELETE routine is now discussed. During block 460, the RPU 22 calls the DELETE VALUE FROM COLUMN routine (FIG. 11B) to DELETE a value from a column of a relation. Once all of the occurrences associated with a unique value have been removed from a column, the row use vector associated with the unique value contains only binary bits set to "0". If not all of the bits of the row use vector are set to "0", then during block 461, the RPU 22 determines that processing is complete and returns processing to the calling routine at block 461.
On the other hand, if all the bits are set to "0", then block 461 determines that processing is incomplete and processing continues at block 462. Because the unique value is no longer referenced in the column of the relation, it can be removed from the subset (or entity select vector) which depicts the column. Specifically, the entity select vector associated with the unique value of the column can be updated via a call to the DELETE VALUE FROM SUBSET routine (FIG. 11C) during block 462.
Once the entity select vector has been updated, the RPU 22 calls block 462(a) to determine whether there is a "1" bit set in the ordinal position corresponding to the value being deleted in any of the entity select vectors for the relational database. If there is, then the value is presently being used in other relations and processing returns to the calling program at block 462(b). However, if there are no "1" bits set at the ordinal position corresponding to the unique value being deleted, then processing continues at block 464. During block 464, the DELETE VALUE FROM VALUE SET routine (FIG. 11D) is called to remove the unique value from the value set.
The value is removed from the value set only when none of the other relations in the relational database reference the unique value. When processing is completed, the system returns at block 466 to the calling program. The DELETE routine (FIG. 11A) can be called successively to DELETE one or more values of a row of one relation.
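The three-stage decision of FIG. 11A may be sketched in Python as follows. The function name, the string return values, and the point at which the emptied row use vector is dropped are assumptions of this sketch; the second entity select vector in the usage lines is hypothetical and stands for another column over the same value set.

```python
def delete(value_set, entity_select_vectors, column, row_use_set, row):
    """Remove the value stored in `row` of one column; report how far the delete went."""
    esv = entity_select_vectors[column]
    # Block 460: locate the row use vector holding a 1 in `row` and clear that bit.
    k = next(i for i, vector in enumerate(row_use_set) if vector[row] == 1)
    row_use_set[k][row] = 0
    if any(row_use_set[k]):                          # block 461: value still occurs
        return "column"
    # Block 462: the k-th 1 bit of the entity select vector belongs to the value.
    pos = [i for i, bit in enumerate(esv) if bit][k]
    esv[pos] = 0
    del row_use_set[k]                               # assumption: drop the empty vector here
    if any(v[pos] for v in entity_select_vectors):   # block 462(a): used elsewhere?
        return "subset"                              # block 462(b)
    # Block 464: no column references the value, so remove it from the value set.
    del value_set[pos]
    for v in entity_select_vectors:
        del v[pos]
    return "value set"

cities = ["Athens", "London", "Paris"]
vectors = [[1, 1, 1], [0, 1, 1]]       # suppliers column, hypothetical second column
rus = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]
result = delete(cities, vectors, 0, rus, 4)   # remove Athens from the fifth row
# result is "value set"; cities is now ["London", "Paris"]
```

In the usage lines, Athens is referenced by no other column, so the delete proceeds through all three stages; deleting one of the two occurrences of London would stop after the first stage.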
Referring to FIG. 11B, a more detailed description of the DELETE VALUE FROM COLUMN routine is now discussed. During block 470, the RPU 22 determines which row use vector of the row use set is associated with the particular value to be removed from one row of the column. This operation can be conducted by performing successive Boolean AND operations on the row use vectors and a binary bit vector having only a single bit set to one, at the ordinal position corresponding to the row position of the column. The bit is changed from "1" to "0", in the appropriate row, indicating the absence of the value in the particular row of the column.
During block 472, the RPU 22 via the BBVP 14 determines whether all of the bits of the row use vector have been set to "0". If not all of the bits of the row use vector are set to "0", then a "DONE" signal is returned to the calling routine at block 474. If all of the binary bits of the row use vector are set to "0", then processing returns at block 478 to the calling routine.
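The search by successive Boolean AND operations may be sketched in Python as follows. Bit vectors are modeled as lists of 0 and 1, and the function name and return convention are assumptions of this sketch.

```python
def delete_value_from_column(row_use_set, row):
    """Clear the bit for `row`; return (vector index, True if that vector is now all 0s)."""
    n_rows = len(row_use_set[0])
    # Mask with a single 1 at the ordinal position of the row being deleted.
    mask = [1 if i == row else 0 for i in range(n_rows)]
    for index, vector in enumerate(row_use_set):          # block 470
        if any(a & b for a, b in zip(vector, mask)):      # Boolean AND with the mask
            vector[row] = 0                               # change the bit from 1 to 0
            return index, not any(vector)                 # block 472
    raise ValueError("no value is stored in that row")

rus = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]
found = delete_value_from_column(rus, 0)
# found is (1, False): the second vector held the row, and it still has a 1 bit
```

A second element of False plays the role of the "DONE" signal of block 474; True corresponds to the return at block 478, after which the subset can be updated.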
Referring to FIG. 11C, a more detailed discussion of the DELETE VALUE FROM SUBSET routine is now presented. During block 480, the RPU via BBVP 14 determines the ordinal position in the corresponding entity select vector associated with the unique value whose row use vector has been deleted from the row use set. During block 482, the RPU 22 via the BBVP 14 sets the binary bit in the entity select vector associated with the unique value to "0" to indicate the absence of the unique value in the subset. In block 485, processing returns to the calling program.
Referring to FIG. 11D, a more detailed discussion of the DELETE VALUE FROM VALUE SET routine is now presented. During block 488, the system removes the value from the value set by removing the value in the value set structure. Additionally, the binary bit corresponding to the deleted value in all entity select vectors is also removed to account for the reduced size of the value set.
1. Detailed Example for the DELETE Operation
Referring to FIG. 13, the names of the suppliers of the suppliers relation depicted in FIG. 4 are shown along with the row use set corresponding to the suppliers name column in the relation. Essentially, this example starts where the "Insert Operation Example" left off.
Zeus had been added to the names value set, and Zeus had been added to the column of names in the suppliers relation. In this example, it is required to DELETE the name Zeus from the value set and to DELETE the name Zeus from the column (504, FIG. 13). FIG. 13 is a detailed results table depicting the various tasks performed by the DELETE routine (FIG. 11A). This example has been chosen to illustrate all of the routines for deleting a value from a value set and from a binary representation of a relation.
The results table of FIG. 13 is broken up into three parts. From left to right, the first column depicts the existing value set, including the unique value Zeus; the second column is the entity select vector representing the subset of the value set for names corresponding to the column for names, including the name Zeus; and the third column is a row use set which represents the "names" column of the suppliers relation.
Each row of the results table (FIG. 13) depicts a change in either the value set, entity select vector, or row use set as the DELETE routine (FIG. 11A) is performed.
The routines depicted in FIGS. 11A, B, C and D are all transparent to the application. The application provides only the following instruction to the system:
DELETE
From (Relation-Suppliers) Where Name = Zeus
This instruction is interpreted by the command interpreter 28 (FIG. 1A) to DELETE a value from the column of names in the suppliers relation. Because the value Zeus only appears once in the column for names in the relation and in the database, the system will call the routine DELETE (FIG. 11A) to remove the value Zeus from the column of names and also from the value set of names. Specifically, during block 460 (FIG. 11A), the RPU 22 calls the subroutine DELETE VALUE FROM COLUMN (FIG. 11B) to remove the value Zeus from the names column of the relation. Referring to FIG. 11B, during block 470, the RPU 22 via the BBVP 14 changes the binary bit set to "1" to "0" in the row use vector to indicate that the value Zeus is no longer in the column for names (506, FIG. 13). In block 472, the system determines whether all of the bits of the row use vector have been set to "0". Zeus only appeared in the column once, and thus, by changing the one binary bit to "0", Zeus is no longer represented in the column; all of the binary bits of the row use vector are set to "0". Thus, in block 478, processing returns to block 462 of the DELETE routine (FIG. 11A).
In block 462 (FIG. 11A), the DELETE VALUE FROM SUBSET routine (FIG. 11C) is called to remove the unique value Zeus from the subset depicted by the entity select vector. Specifically, during block 463, the row use vector associated with the unique value Zeus is removed from the row use set. Processing continues at block 480, during which the ordinal position of the binary bit associated with the value Zeus in the entity select vector is determined. Specifically, the RPU 22 via the BBVP 14, during block 480, determines that the ordinal position of the value Zeus in the entity select vector is the tenth position. Then, in block 482, the system sets the tenth binary bit from "1" to "0" to indicate the absence of the value Zeus from the subset (510, FIG. 13). During block 487, processing returns to the DELETE routine (FIG. 11A) at block 462(a). In block 462(a), the RPU 22 determines whether there are any "1" bits set at the ordinal position corresponding to Zeus in any entity select vector. The entity select vectors corresponding to the value set of names for the entire relational database are evaluated to determine if the unique value is referenced in any other subset of the relational database. Because the other tables (i.e., parts (FIG. 5) and shipments (FIG. 6)) do not reference the "names" value set, there is only one entity select vector and the bit corresponding to the ordinal position of Zeus has been set to "0". Processing continues in block 464, which calls the DELETE VALUE FROM VALUE SET routine (FIG. 11D).
Referring to FIG. 11D, during block 488, the RPU 22 removes Zeus from the value set and the corresponding bit in the entity select vector is also removed (512, FIG. 13). Processing continues at block 490, during which the RPU 22 returns to the DELETE routine (FIG. 11A) at block 466, where processing returns to the calling program.
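The DELETE walk-through can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation. Bit vectors are modelled as plain lists of 0/1. The names `Column` and `delete_value` are hypothetical. The value to delete is passed in directly rather than located with successive AND operations. Zeus is assumed to occupy a sixth row, and every name in the value set other than the five suppliers and Zeus is a made-up placeholder chosen so that Zeus falls in the tenth ordinal position.

```python
from dataclasses import dataclass

@dataclass
class Column:
    entity_select: list   # one bit per value-set entry; 1 = value occurs in this column
    row_use_set: list     # one row use vector per 1 bit above, in value-set order

def delete_value(value_set, columns, col, value, row):
    """Remove `value` from 0-based `row` of `col`; drop it from the value set if unused."""
    pos = value_set.index(value)              # ordinal position in the value set
    k = sum(col.entity_select[:pos])          # index of the value's row use vector
    row_use = col.row_use_set[k]
    row_use[row] = 0                          # blocks 470-472: clear the row bit
    if any(row_use):
        return "DONE"                         # block 474: value still used in the column
    del col.row_use_set[k]                    # block 463: drop the empty row use vector
    col.entity_select[pos] = 0                # blocks 480-482: clear the subset bit
    if any(c.entity_select[pos] for c in columns):
        return "STILL REFERENCED"             # blocks 462(a)-462(b)
    del value_set[pos]                        # block 488: shrink the value set...
    for c in columns:
        del c.entity_select[pos]              # ...and every entity select vector
    return "REMOVED"

# Hypothetical names value set; only the suppliers' names and Zeus come from the example.
names = ["Adams", "Baker", "Blake", "Clark", "Evans",
         "Jones", "King", "Lewis", "Smith", "Zeus"]
sname = Column(
    entity_select=[1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    row_use_set=[
        [0, 0, 0, 0, 1, 0],  # Adams, row 5
        [0, 0, 1, 0, 0, 0],  # Blake, row 3
        [0, 0, 0, 1, 0, 0],  # Clark, row 4
        [0, 1, 0, 0, 0, 0],  # Jones, row 2
        [1, 0, 0, 0, 0, 0],  # Smith, row 1
        [0, 0, 0, 0, 0, 1],  # Zeus, row 6 (assumed)
    ],
)
status = delete_value(names, [sname], sname, "Zeus", row=5)
print(status)  # REMOVED
```

The return strings are placeholders for the three exits of the flow diagram: the early "DONE" return, the return when another subset still references the value, and the full removal from the value set.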
C. SELECT
To this point, we have discussed the operations INSERT and DELETE, which are basically functions for updating binary representations of relations. The next operations are query functions for finding relevant information about a relation or groups of relations.
The operations include SELECT, JOIN and PROJECT, and are principally used for determining a resultant binary relation. The resultant binary relations can then be converted into their byte value form for users to understand. This section concentrates on the operation SELECT, which generates a resultant binary relation for depicting which row or rows of a relation contain selected values. Stated differently, given one or more value sets and a relation, SELECT determines the rows of the relation which correspond to a particular value or values in one or more columns of a relation, and the result is depicted in a binary representation: a binary bit vector called a "select vector". A typical example of a SELECT operation might be for determining which suppliers (i.e., Smith, Jones, Blake, Clark and Adams) are located in Athens (63, FIG. 2). A more detailed discussion on this query will be presented shortly.
Referring to FIG. 14, a flow diagram of the SELECT operation is depicted. During block 516, the RPU 22 via the BBVP 14 (it is assumed after this point that any time processing needs to be done on a bit vector, the RPU calls the BBVP 14 for processing) determines the ordinal positions of one or more selected unique values, which are in one column of the relation, in a particular value set. In block 518, the RPU 22 determines whether the selected unique values are found in the value set.
If the selected unique values are not found in the value set, then processing returns to the calling program during block 520. Assuming that the selected unique values are found in the value set, then in block 522 a binary bit vector displaying the ordinal positions of the selected values within the value set is generated.
Specifically, the binary bit vector contains bits set to "1" at the ordinal positions corresponding to those of the selected values, and the remaining bits of the bit vector are set to "0". In block 524, the bit vector generated in block 522 is "ANDed" with the entity select vector corresponding to the column in which the values reside, to determine whether the selected unique values are referenced in the column. In block 526, the RPU 22 determines whether the resultant bit vector has all bits set to "0", i.e., whether the corresponding set is empty. If the resultant bit vector is all zeros, then the selected unique values are not in the column, and thus no select vector can be generated and processing returns at block 528 to the calling program.
Assuming that the resultant bit vector from the AND operation step is not empty (i.e., some bits are set to "1" in the bit vector), processing continues at block 530, in which the RPU 22 determines "count": the number of binary bits in the entity select vector which are set to "1" up to and including the ordinal position of each selected value. For each unique value, the count is then used to determine which row use vectors of the row use set correspond to the selected unique values. During block 532, the row use vectors corresponding to "count" are retrieved. Processing continues at block 536, in which the RPU 22 determines whether one or more unique values were selected from the particular column over which this part of SELECT is processed. Assuming that only one unique value from a particular column was selected by the application, the RPU 22 returns the one row use vector, corresponding to the unique value, retrieved at block 532. Processing returns to the calling program at block 542. However, if more than one unique value from a particular column was selected by the application for this operation, then during block 537 the Boolean OR operation is performed on the selected row use vectors to determine a resultant relation. The Boolean OR operation is performed by the BLU 24 (FIG. 1A). The steps for performing Boolean operations on compressed bit strings are fully discussed in Glaser et al., which was referenced earlier.
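As a rough illustration of blocks 516 through 537, the following Python sketch derives the select vector for a single column. It is a sketch under stated assumptions rather than the patent's implementation. Bit vectors are plain lists of 0/1 and the function name `select_column` is hypothetical. Values missing from the value set are simply skipped, with `None` returned only when nothing is found (the text does not spell out this case). The city names other than Athens, London and Paris are placeholders.

```python
def select_column(value_set, entity_select, row_use_set, wanted):
    """Return the select vector (list of 0/1, one bit per row) for the values in `wanted`,
    or None when none of them occurs in the column."""
    # Blocks 516-520: find the ordinal positions of the selected values in the value set.
    positions = {value_set.index(v) for v in wanted if v in value_set}
    if not positions:
        return None
    # Block 522: bit vector with a 1 at each selected ordinal position.
    ordinal = [1 if i in positions else 0 for i in range(len(value_set))]
    # Blocks 524-528: AND with the entity select vector; all zeros means "not in column".
    hits = [o & e for o, e in zip(ordinal, entity_select)]
    if not any(hits):
        return None
    select = [0] * len(row_use_set[0])
    for pos, bit in enumerate(hits):
        if bit:
            # Block 530: count the 1 bits up to and including this ordinal position.
            count = sum(entity_select[:pos + 1])
            # Blocks 532-537: fetch the count-th row use vector and OR it in.
            row_use = row_use_set[count - 1]
            select = [s | r for s, r in zip(select, row_use)]
    return select

# Hypothetical value set for cities; only Athens, London and Paris come from the relation.
cities = ["Athens", "Berlin", "Cairo", "Dublin", "London", "Madrid", "Oslo", "Paris"]
city_entity_select = [1, 0, 0, 0, 1, 0, 0, 1]
city_row_use_set = [
    [0, 0, 0, 0, 1],  # Athens: row 5
    [1, 0, 0, 1, 0],  # London: rows 1 and 4
    [0, 1, 1, 0, 0],  # Paris: rows 2 and 3
]
print(select_column(cities, city_entity_select, city_row_use_set, ["London", "Paris"]))
# [1, 1, 1, 1, 0]
```

The "count" step is what lets the row use set stay dense: a value's row use vector is found by counting set bits in the entity select vector, so no row use vector is stored for values the column never uses.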
During block 538, RPU 22 determines whether the values selected are from only one column of the relation. If all of the selected values are from one column, then the system returns the resultant select bit vector to the calling routine at block 542. If, during block 538, it is determined that other values are selected from other columns of the relation, then processing continues at block 539. During block 539, RPU 22 determines whether any more row use vectors for values need to be selected from other columns. If more values need to be processed, then processing continues at blocks 516, 518, 522, 524, 526, 530, 532, 536, 537, 538 and 539 until all of the row use vectors are processed and the entity select vector for each resultant column is generated. When there are no more columns from which to select values, then during block 540, a Boolean operation specified by the SELECT instruction is performed on the entity select vectors.
For example, the SELECT instruction might require the determination of whether one value, in one column of the relation, is associated with another value in a different column of the relation. The select vectors for the two values would be ANDed together to determine whether both values reside in the same row of the relation. It should be noted that any of the Boolean operations (i.e., OR, XOR, etc.) could be used to calculate the desired results. For purposes of discussion, however, the operation is assumed to be AND.
(For a more detailed discussion, refer to the detailed examples.) In summary, the flow diagram of FIG. 14 depicts the SELECT operation for returning a resultant entity select vector (e.g., binary bit vector) for depicting which rows of a relation contain one or more selected values.
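The block 540 step, in which per-column select vectors are folded together with the Boolean operation named by the query, might look like the following in Python. This is only a sketch. The function name `combine` and the two sample vectors are hypothetical, and the operator defaults to AND as assumed in the discussion above.

```python
from functools import reduce
import operator

def combine(select_vectors, op=operator.and_):
    """Fold per-column select vectors (lists of 0/1) bit by bit with a Boolean operator."""
    return reduce(lambda a, b: [op(x, y) for x, y in zip(a, b)], select_vectors)

col_a = [1, 0, 1, 0, 0]   # rows matching the condition on one column
col_b = [1, 1, 1, 1, 0]   # rows matching the condition on another column
print(combine([col_a, col_b]))                # [1, 0, 1, 0, 0]
print(combine([col_a, col_b], operator.xor))  # [0, 1, 0, 1, 0]
```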
Once the resultant select vector has been determined, the rows of the relation corresponding to the selected values can be displayed to the user. By having the row positions of each column of the relation which contain the selected values, the RPU 22 determines which row use vector of the corresponding row use set contains a binary bit set to "1" in the ordinal position corresponding to the selected row position. An indexing function is performed, which determines the position of the selected row use vector in the corresponding row use set. Specifically, RPU 22 counts the number of row use vectors in the row use set up to and including the selected row use vector. This number corresponds to the ordinal position of the binary bit set to "1" in the entity select vector which references the unique value of the relation. The unique value is retrieved from the value set. For each column of the relation, the value in the selected row is determined and displayed for the user.
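The trace-back from a selected row to its values can be sketched as follows. This is a hedged illustration in Python, not the patent's code. The helper names `value_at_row` and `rows_for` are hypothetical, and vectors are lists of 0/1. The supplier ID value set is assumed to hold exactly S1 through S5, and the filler city names are placeholders.

```python
def value_at_row(value_set, entity_select, row_use_set, row):
    """Return the value stored in 0-based `row` of a column, or None if the row is empty."""
    # Find the row use vector with a 1 in this row; k is its 1-based index in the set.
    k = next((i for i, row_use in enumerate(row_use_set, start=1) if row_use[row]), None)
    if k is None:
        return None
    # Locate the k-th 1 bit of the entity select vector; its position indexes the value set.
    seen = 0
    for pos, bit in enumerate(entity_select):
        seen += bit
        if bit and seen == k:
            return value_set[pos]
    return None

def rows_for(select_vector, columns):
    """Rebuild every row flagged in `select_vector`; `columns` is a list of
    (value_set, entity_select, row_use_set) triples."""
    return [tuple(value_at_row(vs, es, rus, r) for vs, es, rus in columns)
            for r, flag in enumerate(select_vector) if flag]

# Supplier ID column: value set assumed to be exactly S1..S5, one row each.
id_col = (["S1", "S2", "S3", "S4", "S5"], [1, 1, 1, 1, 1],
          [[int(r == v) for r in range(5)] for v in range(5)])
# City column over a hypothetical value set.
city_col = (["Athens", "Berlin", "Cairo", "Dublin", "London", "Madrid", "Oslo", "Paris"],
            [1, 0, 0, 0, 1, 0, 0, 1],
            [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]])

print(rows_for([0, 0, 0, 0, 1], [id_col, city_col]))  # [('S5', 'Athens')]
```

Scanning the row use set for each row is linear in the number of distinct values in the column, which is presumably one motivation for the alternative mapping approach mentioned for Part VI.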
In another embodiment of the SELECT operation, a step is added for selecting the unique value according to whether the selected value is greater than, less than, equal to, not equal to, equal to or greater than, or equal to or less than a prespecified value selected by the application program or user.
In another embodiment to be discussed in PART VI, the actual values in the selected rows of the relation are determined and displayed to the user via a mapping function through a vector called the "entity use vector". For each column of the relation, an entity use vector is maintained for identifying a value in the value set which corresponds to the value at a particular row of the column.
1. Detailed Example of a Two-Column SELECT for Two Values
This example relies principally on the suppliers relation of FIG. 2. Specifically, it is assumed for purposes of this example that the suppliers relation (63, FIG. 2) is in its binary represented form as depicted in FIG. 4 and that this representation of the relation resides in memory 18 to be processed by RPU 22.
The SELECT operation in this example involves two columns, namely, the Suppliers ID column (168, FIG. 4) and the City column (174, FIG. 4). Specifically, the query entered by the user is to determine all information about a supplier whose number is S5 and whose location is Athens. The instruction for this query is:
SELECT ID#, CITY
FROM S
WHERE (ID# = S5) AND
(CITY = ATHENS)
This query is interpreted by the command interpreter 28 (FIG. 1A), and RPU 22 retrieves the row use vectors associated with the unique values S5 and Athens, and then performs a Boolean AND operation to determine the resultant relation. FIG. 15 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14). The results table of FIGS. 15A and B is broken up into six columns. From left to right, the first column depicts the value set associated with the selected unique value, the second column is a bit vector corresponding to the ordinal position of the selected unique value within the value set, the third column is the entity select vector associated with the column in which the value resides, the fourth column is a resultant bit vector determined by ANDing the ordinal position bit vector with the entity select vector, the fifth column is the row use set associated with the column in which the selected unique value resides, and the last column is the select vector determined by performing a Boolean OR operation on all of the row use vectors corresponding to the selected unique values from one column.
The query in this example is a simplified SELECT query, to minimize the explanation and steps required to perform the operation. However, generally, the query will be over several value columns of the relation (see the next example for SELECT). This example could easily be expanded to determine which of the suppliers IDs (i.e., S1 through S5) is located in Athens. The row use vector corresponding to Athens indicates the rows of the relation which contain the supplier IDs associated with Athens. For simplicity, in this example we are concerned with only one of the supplier IDs, namely S5, and whether it is associated with Athens.
Referring now to FIGS. 14, 15A and B, a detailed example of the query for determining whether supplier ID S5 is located in Athens is now discussed. Specifically, during block 516, the system determines the ordinal position of S5 in the value set for suppliers IDs.
Essentially, the system traverses a structure associated with the suppliers ID value set and determines the specific node in the structure where the S5 value resides. During block 518, the system determines whether or not the value S5 has been found in the value set. The value S5 is located in the structure, and thus, it is within the value set for the suppliers IDs.
In block 522, the system creates a binary bit vector for representing the ordinal position within the value set associated with value S5 (544, FIG. 15A). As shown at row 544 of FIG. 15A, the fifth ordinal position in the new binary bit vector is set to "1", corresponding to the ordinal position of the value S5 in the value set.
During block 524, the new binary bit vector is ANDed with the entity select vector associated with the suppliers identifiers in the suppliers relation (546, FIG. 15A). In block 526, the resultant bit vector from the AND operation is evaluated and it is determined that the bit vector is not an empty set. The resultant bit string contains a binary bit set to "1" (548, FIG. 15A).
In other words, the unique value S5 is located in the subset referenced by the entity select vector of the suppliers column, and thus, it is in the relation of suppliers. In block 530, a count is performed on the entity select vector to determine the number of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique value S5. The RPU 22 determines that there are five binary bits set to "1", and thus, the unique value S5 is associated with the fifth row use vector of the row use set associated with suppliers IDs (260, FIG. 4). In block 532, the row use vector associated with unique value S5 is retrieved from the row use set (260, FIG. 4). The row use vector for S5 is a binary bit vector containing four binary bits set to "0" and a fifth binary bit set to "1", indicating that the value S5 resides in the fifth row of the column for the suppliers IDs (550, FIG. 15A). During block 536, RPU 22 determines that there is only one value selected from the column for suppliers IDs. Processing continues in block 538, in which the RPU 22 determines that more than one column is involved in this select operation, i.e., the supplier ID and city columns. In block 539, the RPU 22 determines that the value Athens has also been selected by the user in this query, and thus, processing returns to block 516. In block 516, the RPU 22 determines the ordinal position of Athens in the value set for cities.
Essentially, the system traverses the value set for cities, and during block 518, the RPU 22 determines that Athens is in the value set for cities. During block 522, a binary bit vector is constructed to indicate which ordinal position of the value set for cities contains the city Athens (552, FIG. 15A). Specifically, the system creates the binary bit vector, which shows a binary bit set to "1" in the first ordinal position (552, FIG. 15A). During block 524, the Boolean AND operation is performed between the new binary bit vector and the entity select vector associated with cities for the suppliers relation (556, FIG. 15A). In block 526, the RPU 22 determines that the resultant vector does not contain all "0"s (558, FIG. 15B), and thus, the value Athens is determined to be in the suppliers relation.
If the value Athens were not located in the suppliers relation, then the RPU 22 would return at block 528 to alert the user that the selected value Athens, although found in the value set, is not within the suppliers relation. Processing continues at block 530, in which the RPU 22 does a count of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique value Athens. RPU 22 determines that Athens is associated with the first binary bit set to "1" in the entity select vector, and thus, the unique value Athens corresponds to the first row use vector in the row use set (266, FIG. 4). In block 532, the RPU 22 retrieves the row use vector (560, FIG. 15B) associated with Athens. The row use vector for Athens contains four binary bits set to "0", followed by a binary bit set to "1", indicating that the value Athens occupies the fifth row of the column associated with cities. During block 536, the RPU 22 determines that there are no more unique values selected in the column of cities, and thus, processing continues at block 538, in which the RPU 22 determines that more than one column, i.e., suppliers and city, was selected by the user. In block 539, the RPU 22 determines that no more values need to be selected. In block 540, the row use vectors associated with S5 and Athens are ANDed together to generate a resultant select vector, which represents the rows of the relation which satisfy the query (562, FIG. 15B).
As shown at 562 (FIG. 15B), the resultant binary bit vector contains four binary bits set to "0" followed by a binary bit set to "1", indicating that the fifth row of the relation contains the supplier ID S5 and the city Athens. The actual row of the relation can be reconstructed and displayed to the user in one of two ways. First, for each column, the system can use the entity use vectors and associated row use vectors to map the row number determined by the select vector to the ordinal position in each value set. A more detailed discussion of the entity use vector approach is presented in Part VI. Second, the RPU 22 could trace back from the row use vectors to the entity select vectors and then back to the value set to determine the unique values in the fifth row of the relation.
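The S5/Athens walk-through can be replayed step by step in Python, with each intermediate vector annotated by the results-table row it corresponds to. This is a sketch under assumptions, not the patent's implementation. The helper names `bit_and` and `fetch_row_use` are hypothetical and vectors are lists of 0/1. The supplier ID value set is assumed to hold exactly S1 through S5. The city value set is a placeholder list chosen only so that Athens sits in the first ordinal position.

```python
def bit_and(a, b):
    """Bitwise AND of two equal-length 0/1 lists."""
    return [x & y for x, y in zip(a, b)]

def fetch_row_use(entity_select, row_use_set, pos):
    """Blocks 530-532: count the 1 bits up to and including 0-based `pos`,
    then return the row use vector at that (1-based) count."""
    count = sum(entity_select[:pos + 1])
    return row_use_set[count - 1]

# Supplier ID column (value set assumed to be exactly S1..S5, one row each).
ids = ["S1", "S2", "S3", "S4", "S5"]
id_entity_select = [1, 1, 1, 1, 1]
id_row_use_set = [[int(r == v) for r in range(5)] for v in range(5)]

# City column over a hypothetical value set with Athens in the first position.
cities = ["Athens", "Berlin", "Cairo", "Dublin", "London", "Madrid", "Oslo", "Paris"]
city_entity_select = [1, 0, 0, 0, 1, 0, 0, 1]
city_row_use_set = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]

s5_ordinal = [int(v == "S5") for v in ids]                       # row 544
s5_hits = bit_and(s5_ordinal, id_entity_select)                  # rows 546-548
s5_rows = fetch_row_use(id_entity_select, id_row_use_set,
                        ids.index("S5"))                         # row 550

athens_ordinal = [int(v == "Athens") for v in cities]            # row 552
athens_hits = bit_and(athens_ordinal, city_entity_select)        # rows 556-558
athens_rows = fetch_row_use(city_entity_select, city_row_use_set,
                            cities.index("Athens"))              # row 560

result = bit_and(s5_rows, athens_rows)                           # block 540, row 562
print(result)  # [0, 0, 0, 0, 1]
```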
2. Detailed Example of Two-Column SELECT for Multiple Values
As in the previous example, this example relies principally on the suppliers relation of FIG. 2.
Specifically, it is assumed for the purposes of this example that the suppliers relation (63, FIG. 2) is in its binary represented form, as depicted in FIG. 4, and that this binary representation of the relation resides in memory, to be processed by RPU 22. Again, the SELECT operation in this example involves two columns, namely, the suppliers names column (170, FIG. 4) and the city column (174, FIG. 4). Specifically, the query entered by the user is to determine whether suppliers Smith or Blake are located in London or Paris. The standard instruction for the query is:
SELECT SNAME, CITY
FROM S
WHERE (SNAME = 'SMITH') OR (SNAME = 'BLAKE') AND
(CITY = 'LONDON') OR (CITY = 'PARIS')
This query is interpreted by the command interpreter (FIG. 1A), and RPU 22 retrieves the row use vectors associated with the unique values Smith or Blake which are, in turn, ANDed with the row use vectors for the unique values London or Paris. FIG. 16 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14). The results table of FIGS. 16A and B is broken up into seven columns. From left to right, the first column depicts the value set associated with the unique values, the second column is a bit vector corresponding to the ordinal positions of the selected unique values within the value set, the third column is the entity select vector associated with the column in which the selected values reside, the fourth column is a resultant bit vector determined by ANDing the ordinal position bit vector with the entity select vector, the fifth column is the row use set associated with the column in which the selected values reside, the sixth column is the select vector determined by performing a Boolean OR operation on all the row use vectors corresponding to selected unique values from one column, and the last column is the resultant vector determined by performing a Boolean AND operation on the select vectors determined for the selected values from more than one column.
Referring now to FIGS. 14, 16A and B, a detailed example of the query for determining whether the suppliers Smith or Blake are located in London or Paris is now discussed. Specifically, during block 516, the RPU 22 determines the ordinal positions of the suppliers names Smith and Blake in the value set for suppliers names. Essentially, the system traverses a structure associated with the supplier name value set and determines the specific nodes in the structure where the Smith and Blake values reside. During block 518, the RPU 22 determines whether or not the values Smith and Blake have been found in the value set. The values Smith and Blake are located in the structure; thus, they are within the value set for the suppliers names. In block 522, the RPU 22 creates a binary bit vector for representing the ordinal positions within the value set associated with the values. As shown in row 565 of FIG. 16A, Blake and Smith occupy the third and ninth ordinal positions, and the bits of the new binary bit vector at those ordinal positions are set to "1". During block 524, the new binary bit vector is ANDed with the entity select vector associated with the suppliers names in the suppliers relation (567, FIG. 16A). In block 526, the resultant bit vector from the AND operation is evaluated and it is determined that the resultant bit vector is not all zeros. The resultant bit string contains binary bits set to "1" (569, FIG. 16A). In block 530, a single "count", with two ordinal positions as input, is performed on the entity select vector, up to and including the binary bit associated with the unique value Smith, the last value characterized in the new entity select vector. The RPU 22 determines that there are two binary bits set to "1" for the count corresponding to Blake, and five binary bits set to "1" for the count corresponding to Smith.
Therefore, the unique values Blake and Smith are associated with the second and fifth row use vectors of the row use set that is associated with suppliers names (262, FIG. 4). In block 532, the row use vectors associated with the unique values Smith and Blake are retrieved from the row use set (262, FIG. 4). The row use vector for Blake is a binary bit vector containing five binary bits in which the third binary bit is set to "1", indicating that the value Blake resides in the third row of the column for the suppliers names (570, FIG. 16A).
Likewise, the row use vector for Smith is a binary bit vector containing five binary bits in which the first bit is set to "1", indicating that the value Smith resides in the first row of the column for suppliers names (570, FIG. 16A).
During block 536, RPU 22 determines that there is more than one value selected from the column for suppliers names; thus, processing continues at block 537. During block 537, the row use vectors associated with Smith and Blake (570, FIG. 16A) are ORed together to form the select vector, as shown at 571 of FIG. 16A. During block 538, the RPU 22 determines that more than one column is involved in this select operation, i.e., the suppliers names column and the city column. Processing continues at block 539, during which the RPU 22 determines that the values London and Paris are selected in the separate column for cities. In block 516, the RPU 22 determines the ordinal positions for London and Paris in the value set for cities. Essentially, the system traverses the value set for cities, and during block 518, the RPU determines that London and Paris are both located in the value set for cities. During block 522, a binary bit vector is constructed to indicate which ordinal positions of the value set for cities are associated with the cities London and Paris (573, FIG. 16A). Specifically, the system creates a binary bit vector which shows binary bits set to "1" in the fifth and eighth ordinal positions (573, FIG. 16A). During block 524, the Boolean AND operation is performed on the new binary bit vector with the entity select vector associated with cities for the suppliers relation (575, FIG. 16B). In block 526, the RPU 22 determines that the resultant vector does not contain all zeros (579, FIG. 16B).
London and Paris are determined to be in the suppliers relation. If the values London and Paris were not located in the suppliers relation, then the RPU would return at block 528 to alert the user that the selected values were not found in the suppliers relation.
Processing continues at block 530, in which the RPU determines the number of bits set to "1" up to and including the binary bit associated with the unique value London, and the same is done for Paris. RPU 22 determines that London is associated with the second binary bit set to "1" in the entity select vector; thus, the unique value London corresponds to the second row use vector in the row use set (266, FIG. 4). Additionally, the RPU determines that Paris is associated with the third binary bit set to "1" in the entity select vector and, thus, Paris corresponds to the third row use vector in the row use set (266, FIG. 4). In block 532, the RPU 22 retrieves the row use vectors associated with London and Paris (561, FIG. 16B). The row use vector for London contains five bits, with the first and fourth bits set to "1".
The row use vector for Paris contains five binary bits, with the second and third binary bits set to "1".
Essentially, the row use vectors for London and Paris indicate that the first through fourth rows of the column are associated with the cities London, Paris, Paris, London.
During block 536, the RPU determines that more than one unique value was selected in the column of cities and, thus, processing continues. In block 537, the RPU performs the Boolean OR operation on the row use vectors for London and Paris. A select vector is generated (563, FIG. 16B), which has five binary bits, of which bits one through four are set to "1". In block 538, the RPU determines that more than one column of the relation was involved in the SELECT, i.e., suppliers names and cities. In block 539, the RPU determines that no more values in other columns need to be selected from the relation. In block 540, the select vectors associated with the suppliers names (571, FIG. 16A) and cities (563, FIG. 16B) are ANDed together to generate a resultant select vector, which represents the rows that satisfy the query (565, FIG. 16B). As shown at 565 (FIG. 16B), the resultant entity select vector has its first and third binary bits set to "1", indicating that the first and third rows of the suppliers relation contain information on whether Smith is associated with Paris and/or London and whether Blake is associated with Paris and/or London. As in the last example, the actual rows in the relation can be reconstructed for display to the user in one of two ways.
First, the system can use the entity use vectors associated with each column of the relation to map the row numbers determined by the resultant select vector to the ordinal positions in the appropriate value sets.
Second, the RPU 22 could trace back from the various row use vectors to the entity select vectors and back to the appropriate value sets to determine the actual unique values in the first and third rows of the relation. A more detailed discussion of the entity use vector approach is presented in Part VI.
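The multi-value example can be traced the same way: an OR within each column, then an AND across columns. The Python below is a sketch under assumptions, not the patent's implementation. The name `select_vector` is hypothetical and vectors are lists of 0/1. Both value sets are padded with placeholder entries chosen only so that Blake and Smith land in the third and ninth ordinal positions and London and Paris in the fifth and eighth.

```python
def select_vector(value_set, entity_select, row_use_set, wanted):
    """Blocks 516-537 for one column: OR together the row use vectors of the wanted values."""
    result = [0] * len(row_use_set[0])
    for value in wanted:
        pos = value_set.index(value)                  # ordinal position in the value set
        if entity_select[pos]:                        # value is referenced by this column
            count = sum(entity_select[:pos + 1])      # block 530: 1 bits up to and incl. pos
            result = [r | u for r, u in zip(result, row_use_set[count - 1])]
    return result

# Hypothetical names value set: Blake is 3rd, Smith is 9th.
names = ["Adams", "Baker", "Blake", "Clark", "Evans", "Jones", "King", "Lewis", "Smith"]
name_entity_select = [1, 0, 1, 1, 0, 1, 0, 0, 1]
name_row_use_set = [
    [0, 0, 0, 0, 1],  # Adams
    [0, 0, 1, 0, 0],  # Blake
    [0, 0, 0, 1, 0],  # Clark
    [0, 1, 0, 0, 0],  # Jones
    [1, 0, 0, 0, 0],  # Smith
]

# Hypothetical cities value set: London is 5th, Paris is 8th.
cities = ["Athens", "Berlin", "Cairo", "Dublin", "London", "Madrid", "Oslo", "Paris"]
city_entity_select = [1, 0, 0, 0, 1, 0, 0, 1]
city_row_use_set = [[0, 0, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]

name_select = select_vector(names, name_entity_select, name_row_use_set,
                            ["Smith", "Blake"])                       # 571, FIG. 16A
city_select = select_vector(cities, city_entity_select, city_row_use_set,
                            ["London", "Paris"])                      # 563, FIG. 16B
result = [n & c for n, c in zip(name_select, city_select)]            # block 540
print(name_select, city_select, result)
# [1, 0, 1, 0, 0] [1, 1, 1, 1, 0] [1, 0, 1, 0, 0]
```

The final vector flags the first and third rows, matching the rows for Smith (London) and Blake (Paris) in the suppliers relation.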
D. RECONSTRUCT
The purpose of the RECONSTRUCT operation is to generate the values associated with a particular column of a relation for the user to ascertain. Typically, the binary representation of a relation is constructed and stored in memory 18. If the user of the system wishes to see the actual relation and the values depicted in the relation, then the RECONSTRUCT operation can be performed for reconstructing and displaying the relation to the user. The flow diagrams in FIGS. 17A and 17B are for reconstructing and displaying various columns specified by the user or an applications program.
Typically, the user will specify one or more columns of a relation to be displayed by the system. The user might request the Suppliers ID column of the relation for suppliers at 63 of FIG. 2, which is currently stored in its binary representation in memory 18 (FIG. 1A) (260, FIG. 4).
Referring to FIG. 17A, the first step in performing the RECONSTRUCT operation is in block 565, in which the user specifies various columns of a relation to be reconstructed. As stated above, an applications program may also specify particular columns of the relation to be pro~ected. For example, when the SE~ECT operation is performed, the resultant binary representation can then be reconstructed and displayed to the user via the RECONSTRUCT operation (FIG. 17A). In the hext step, block 566 calls the routine DISPLAY/RECONSTRUCT (FIG.
17B) to reconstruct one of the specified columns. The DISPLAY/RECONSTRUCT routine (FIG. 17B) essentially performs the necessary steps for obtaining the values and for placing the values in the proper rows in the column. In block 567, the RPU determines whether there are any more columns that need to be displayed. If there are more columns to be displayed, then processing continues at block 566. Blocks 566 and 567 are performed until all of the columns specified by the user or applications program have been reconstructed. If all o~ the columns have been reconstructed, then processing continues at block 568, in which the RPU returns to the calling program.
Referring to FIG. 17B, a flow diagram of the DISPLAY/RECONSTRUCT routine is depicted. During block 571, the RPU 22 obtains the entity select vector associated with the particular column to be displayed.
During block 575, the first row use vector associated with the row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18, is transferred to the RPU 22. Then during block 577, the RPU 22 performs a Boolean AND operation on the row use vector obtained in block 575 with a row select vector. (The row select vector is created by performing a query operation on the relation, thereby selecting which rows of the relation the user or application program wishes to display.) The row select vector is a new binary bit vector and each binary bit corresponds to a row of the column or columns to be displayed. A binary bit set to "1" indicates that the corresponding row needs to be displayed. The result of the AND operation is a new vector Z which depicts the rows of the column which contain a particular value associated with the row use vector. The results of the AND operation are sent to memory 18 for future processing. Then, in block 579, the RPU 22 determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 581 a "0" is placed in the first binary position of a new vector called the index vector. The index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the row use set. Each bit indicates whether the unique value associated with the row use vector exists in the relational column to be displayed. If a binary bit in the index vector is set to "0", the unique value associated with the row use vector does not exist in the column to be displayed. Whereas if the binary bit is set to "1" in the index vector, then the unique value associated with the row use vector exists one or more times in the column. Processing continues at block 593, during which the next row use vector of the row use set is obtained.
Returning to block 579, if the result of the Boolean operation performed in block 577 is non-zero, then processing continues at block 585. During block 585, the RPU 22 sets the binary bit in the index vector, which is associated with the current row use vector, to "1". In block 587, the resultant bit vector Z is stored in memory 18. The resultant vector Z is later used in the reconstruction process.
During block 589, the RPU 22 clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. The purpose of this step is to shortcut the processing of the row use vectors in the row use set. Stated differently, when the row select vector is cleared, all of the values in the rows of the column have been determined. The row use vectors which have been processed with the row select vector contain all of the values to be displayed in the column. Then during block 591, the RPU 22 (28, FIG. 1A) determines whether the row select vector has been completely cleared, or stated differently, whether all of the binary bits have been set to "0". If the row select vector contains only binary bits set to "0", then processing continues at blocks 597, 601, 603, 605, 607, 609 and 611 to reconstruct the column with the values associated with the row use vectors in the row use set.
However, if not all of the binary bits in the row select vector are set to "0", then processing continues at block 593. During block 593, the RPU 22 gets the next row use vector in the row use set currently being processed. During block 595, the RPU 22 determines whether the end of the row use set has been reached. If the end of the row use set has been reached, then processing continues at blocks 597, 601, 603, 605, 607, 609 and 611 to reconstruct the column. However, assuming that the end of the row use set has not been reached, then processing continues at blocks 577, 579, 585, 587, 589, 591, 593 and 595 until all of the row use vectors of the row use set have been processed.
Assuming that all of the row use vectors of the row use set have been processed or the row select vector contains only binary bits set to "0", then processing continues at block 597. During block 597, the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Each binary bit set to "1" indicates which row use vectors reference unique values which are to be displayed in the column. During block 601, for a row use vector which has a corresponding binary bit set to "1" in the index vector, the ordinal position of the binary bit set to "1" in the entity select vector is determined. Then, during block 603, the value associated with the ordinal position obtained in block 601 is obtained from the value set. During block 605, the RPU 22 finds the appropriate location in the index vector associated with the value. Then during block 607, the appropriate resultant Z vector stored during step 587 is retrieved. The resultant vector Z indicates which rows of the column contain the unique value associated with the row use vector, and the unique value is placed in the column at the appropriate row locations. During block 609, the RPU 22 determines whether any more values are left for processing.
Assuming that there are more values, then processing continues in blocks 601, 605 and 607 until all of the values have been placed in the proper rows of the column. Once all of the values have been placed into the column, then processing returns to the calling program during block 611.
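The DISPLAY/RECONSTRUCT flow of FIG. 17B can be sketched as follows. This is a minimal sketch and not the claimed implementation: the list-of-bits encoding, the function name `reconstruct_column` and its parameters are hypothetical, and it assumes that the k-th row use vector of the row use set corresponds to the k-th bit set to "1" in the entity select vector. Block numbers appear in the comments only to tie each step back to the description above.

```python
def reconstruct_column(value_set, entity_select, row_use_set, row_select):
    """Rebuild the displayed rows of one column from its bit-vector form.

    Returns (column, index_vector); rows that are not selected stay None.
    """
    remaining = list(row_select)           # working copy of the row select vector
    index_vector = [0] * len(row_use_set)  # one bit per row use vector
    z_vectors = {}                         # stored resultant vectors Z (block 587)

    for k, row_use in enumerate(row_use_set):             # blocks 575, 593, 595
        z = [u & s for u, s in zip(row_use, remaining)]   # block 577
        if not any(z):                                    # block 579
            continue                                      # block 581: bit stays 0
        index_vector[k] = 1                               # block 585
        z_vectors[k] = z                                  # block 587
        remaining = [s & (1 - b) for s, b in zip(remaining, z)]  # block 589
        if not any(remaining):                            # block 591: shortcut
            break

    # Blocks 597 to 609: map each flagged row use vector to its unique value
    # and place that value in the rows marked by the stored vector Z.
    value_positions = [i for i, bit in enumerate(entity_select) if bit]
    column = [None] * len(row_select)
    for k, flag in enumerate(index_vector):
        if flag:
            value = value_set[value_positions[k]]
            for row, bit in enumerate(z_vectors[k]):
                if bit:
                    column[row] = value
    return column, index_vector


# Assumed sample data: a Suppliers ID column with five unique values,
# and a row select vector asking for the first four rows.
ids = ["S1", "S2", "S3", "S4", "S5"]
id_row_use = [[1 if r == k else 0 for r in range(5)] for k in range(5)]
column, index_vector = reconstruct_column(ids, [1] * 5, id_row_use, [1, 1, 1, 1, 0])
print(column, index_vector)
```

Because the working copy of the row select vector becomes all zeros after the fourth row use vector, the loop stops without examining the fifth, which is the shortcut that block 589 is described as providing.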
1. Detailed Example of Performing a RECONSTRUCT Operation
Referring to FIGS. 17A, 17B, 18A, 18B, 18C and 18D, a detailed example for reconstructing the column for Suppliers IDs in the Suppliers relation (FIG. 2) is now discussed. More particularly, it is assumed that only the binary representation of the column Suppliers IDs exists in the RDMS 10. The binary representation of the column may be the result of a SELECT operation or it may have been previously stored after processing by the BBVP 14. In either case, the binary representation of the column Suppliers IDs exists in memory 18 and now the user or an applications program needs to display the actual values of the column. Although this example is for reconstructing and displaying only one column of the suppliers relation, the PROJECT operation could be sequentially performed to supply all of the columns of the suppliers relation.
Referring to FIGS. 18A, 18B, 18C and 18D, a Results Table for depicting the results of the RECONSTRUCT operation for reconstructing or supplying the Suppliers ID column (80, FIG. 2) of the suppliers relation (63, FIG. 2) is shown. Each row (800-848) of the Results Table depicts a result of the routines shown in FIGS. 17A and 17B.
The Results Table is separated into seven columns. From left to right, the first column of the Results Table shows the entity select vector for the selected column to be displayed. The second column shows the row use vector set associated with the specified column to be displayed. The third column is the row use vector currently being processed, and the fourth column is the row select vector specified by the user or application program for determining which rows of the column are to be displayed. The fifth column is the result of ANDing the row use vector and the row select vector together; the result is called vector Z. The sixth column depicts the construction of the index vector, which shows which row use vectors of the row use set have associated unique values in the column. The last column is for the reconstruction of the column to be displayed.
Referring to FIGS. 17A and 17B, the operation performed by the RPU 22 (FIG. 1A) for displaying and reconstructing the Suppliers ID column (80, FIG. 2) is now discussed. Specifically, during block 565 (FIG. 17A) the user or application program selects the column or columns which are to be reconstructed and displayed as a result of the query operation. For this example, the user has selected the Suppliers ID column (80, FIG. 2). Currently, the Suppliers ID column only exists in the secondary memory (18, FIG. 1A) in the form of a binary representation or row use set. During block 566, the DISPLAY/RECONSTRUCT routine (FIG. 17B) is called for finding the values and reconstructing the Suppliers ID column.
Referring to FIG. 17B, during block 571, the RPU 22 (FIG. 1A) finds the entity select vector associated with the Suppliers ID column in the memory 18 (FIG. 1A). The entity select vector stored in memory 18 (FIG. 1A) is transferred via bus 30 (FIG. 1A) to RPU 22 (FIG. 1A).
Then, during block 573, the RPU 22 finds the row use set associated with the Suppliers ID column in memory 18 and transfers it via a bus to the RPU 22 (FIG. 1A). During block 575, the RPU 22 obtains the first row use vector of the row use set for the Suppliers ID column (804, FIG. 18A). For purposes of this example, the user wishes to display the first four rows of the Suppliers ID column. Constructed earlier in the system is a row select vector having four binary bits set to "1" in the first four ordinal positions, i.e., 1 1 1 1 0. During block 577, the row use vector (804, FIG. 18A) and the row select vector (806, FIG. 18A) are transferred to the RPU 22 (FIG. 1A). Here the row use vector (804, FIG. 18A) and the row select vector (806, FIG. 18A) are ANDed together to determine a resultant binary vector Z (808, FIG. 18A). The resultant binary bit vector Z depicts the rows of the Suppliers ID column in which the unique value associated with the row use vector (804, FIG. 18A) is to reside. The resultant binary vector Z contains a binary bit set to "1" in the first position, which means that the unique value associated with the row use vector (804, FIG. 18A) will be placed in only the first ordinal position of the Suppliers ID column. During block 579, the RPU 22 determines that the result of the AND operation is not "0", thus processing continues to block 585. During block 585, the RPU 22 via BBVP 14 (FIG. 1A) sets the first binary bit of the index vector to "1" (810, FIG. 18A) to indicate that the unique value associated with the row use vector (804, FIG. 18A) exists at least once in the Suppliers ID column. In block 587, the resultant binary bit vector Z is stored in memory 18 for future reconstruction of the column.
Specifically, the binary bits set in the resultant binary bit vector Z indicate the rows of the Suppliers ID column in which the unique value S1 associated with the row use vector (804, FIG. 18A) resides. Then during block 589, the binary bits set to "1" in the row select vector which match the binary bits in the resultant vector Z are set to "0" (812, FIG. 18A). During block 591, the row select vector is evaluated to determine if all of the binary bits have been set to "0". The row select vector contains three more binary bits set to "1" (812, FIG. 18A), thus processing continues at block 593. During block 593, the RPU 22 obtains the next row use vector of the row use set (814, FIG. 18B). The row use vector (814, FIG. 18B) and the row select vector (816, FIG. 18B) are ANDed together during block 577. The resultant vector Z is shown at 818 of FIG. 18B. During block 579, the resultant vector Z is evaluated to determine if all the binary bits of the resultant vector have been set to "0". Not all of the binary bits of the resultant vector are "0" (the resultant vector is "0 1 0 0 0" (818, FIG. 18B)), and thus during block 585, the second binary bit of the index vector is set to "1" (820, FIG. 18B). In block 587, the RPU 22 (FIG. 1A) stores the resultant vector Z in memory 18 for future processing. Then during block 589, the binary bits of the row select vector which were set to "1" and matched the binary bit set to "1" in the resultant vector Z are set to "0" (820, FIG. 18B). In block 591, the row select vector is evaluated to determine if all the binary bits have been set to "0". The row select vector still has two binary bits set to "1" (821, FIG. 18C), so the next row use vector is obtained and ANDed with the row select vector as before. During block 579, the resultant vector Z is evaluated to determine if all the binary bits are set to "0". The resultant vector Z contains one bit set to "1", and thus processing continues at block 585.
During block 585, the RPU 22 (FIG. 1A) sets the third binary bit of the index vector to "1" (828, FIG. 18C). During block 587, the resultant vector Z is stored in memory 18 for future processing. In block 589, the binary bit of the row select vector which matched the binary bit of the row use vector is set to "0" (830, FIG. 18C). Then during block 591, the row select vector is evaluated to determine if all the binary bits are set to "0". The row select vector still has one binary bit set to "1" (834, FIG. 18C), thus processing continues at block 593. During block 593, the RPU 22 obtains the next row use vector of the row use set (832, FIG. 18C). The row use vector (832, FIG. 18C) and the row select vector (834, FIG. 18C) are ANDed together during block 577. The resultant vector Z is shown at 836 of FIG. 18C.
During block 579, it is determined that not all of the binary bits of the resultant vector are "0" (the resultant vector is "0 0 0 1 0" (836, FIG. 18C)). Thus, during block 585, the fourth binary bit of the index vector is set to "1" (838, FIG. 18C). The resultant vector Z is stored for future processing. Then during block 589, the binary bit of the row select vector which was set to "1" and matches the binary bit set to "1" in the resultant vector Z is set to "0" (840, FIG. 18D).
Then during block 591, the row select vector is evaluated to determine if all the binary bits have been set to "0". All the binary bits of the row select vector are set to "0" (830, FIG. 18D), and thus processing continues to block 597.
During block 597, the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Specifically, the first, second, third and fourth binary bits of the index vector are set to "1". Thus the first, second, third and fourth row use vectors of the row use set are associated with unique values which exist in the Suppliers ID column. During block 601, RPU 22 (FIG. 1A) determines the ordinal positions of the entity select vector which correspond with the row use vectors having corresponding binary bits set to "1" in the index vector. Then during block 603, the values associated with each ordinal position of the entity select vector are obtained from the value set of Supplier IDs. Specifically, the values S1, S2, S3 and S4 are obtained. Then, during blocks 605 and 607, the first resultant binary vector Z is obtained from temporary storage, and the value S1 is placed in the proper ordinal position of the column for Suppliers IDs. During block 609, the RPU 22 (FIG. 1A) determines whether there are any more values left for processing. There are three more values left for processing, and thus blocks 605 and 607 are performed. During blocks 605 and 607, the second resultant vector Z is obtained and the value S2 is placed in the second ordinal position in the column (844, FIG. 18D). In block 609, it is determined that there are more values left for processing, and thus blocks 605 and 607 are performed.
During blocks 605 and 607, the third resultant vector Z, which is associated with the value S3, is obtained from memory. The value S3 is placed into the third ordinal position of the relational column for Suppliers IDs (846, FIG. 18D). In block 609, it is determined that there is still one more value to process, and thus blocks 605 and 607 are performed. During blocks 605 and 607, the fourth resultant vector Z stored in temporary memory is obtained and the value S4 is placed into the fourth ordinal position of the column for Suppliers IDs (848, FIG. 18D). During block 609, the RPU 22 (FIG. 1A) determines that there are no more values for processing and thus returns to the calling RECONSTRUCT routine (FIG. 17A) at block 567.
During block 567, the RPU 22 (FIG. 1A) determines whether there are any more columns to be displayed to the user. For this example, it is assumed that only the Suppliers ID column is to be reconstructed and displayed. However, if more columns of a particular relation were to be displayed, then processing would continue at blocks 566 and 567 until all of the columns were displayed. Also, the same row select vector would be used each time the DISPLAY/RECONSTRUCT routine (FIG. 17B) was performed. Assuming that there are no more columns to be displayed, then processing returns to the calling program during block 568.
E. JOIN
The ability to "JOIN" two or more relations is considered to be the most powerful feature of a relational system, An Introduction to Database Systems, Vol. 1, 4th Ed. (1986). Essentially, a JOIN is a SELECT over the Cartesian product of more than one relation of the relational database.
To understand the purpose of the JOIN operation, an overall view of how the operation might be implemented for a conventional system is shown. Suppose that a user needs to get all combinations of supplier and part information for the SUPPLIERS relation (63, FIG. 2) and PARTS relation (65, FIG. 2) such that the supplier and part in question are located in the same city. The user might use the following query:
SELECT S.ID#, S.NAME, STATUS, S.CITY, P.ID#, P.NAME,
COLOR, P.WEIGHT, P.CITY FROM S, P
WHERE S.CITY = P.CITY;
The result of this query produces the following table:
TABLE A
S#   SNAME   STATUS   S.CITY   P#   PNAME   COLOR   WEIGHT   P.CITY
S1   Smith   20       London   P1   Nut     Red     12       London
S1   Smith   20       London   P4   Screw   Red     14       London
S1   Smith   20       London   P6   Cog     Red     19       London
S2   Jones   10       Paris    P2   Bolt    Green   17       Paris
S2   Jones   10       Paris    P5   Cam     Blue    12       Paris
S3   Blake   30       Paris    P2   Bolt    Green   17       Paris
S3   Blake   30       Paris    P5   Cam     Blue    12       Paris
S4   Clark   20       London   P1   Nut     Red     12       London
S4   Clark   20       London   P4   Screw   Red     14       London
S4   Clark   20       London   P6   Cog     Red     19       London
The data shown in Table A above comes from the two relations, suppliers 63 and parts 65 (FIG. 2). In the query above, the names of the relations are listed in the FROM clause, and the connection between the two relations (63, 65, FIG. 2) is listed in the WHERE clause (i.e., the fact that the city values must be equal), which is called the JOIN predicate. The JOIN is used to combine relations based on equivalent values in the columns specified by the JOIN predicate. In this case, the specified columns are the "city" columns of each relation. The JOIN pairs each of the N rows of a first relation, e.g., the SUPPLIERS relation (63, FIG. 2), with each of the M rows of a second relation, e.g., the PARTS relation (65, FIG. 2), to form an N * M resultant relation. Then, the JOIN operation discards all resultant rows of the JOIN relation which do not satisfy the JOIN specification. This type of JOIN relation is generally referred to as the "EQUIJOIN" operation.
As an example of EQUIJOIN, consider any two rows from the two relations (i.e., the suppliers relation (63, FIG. 2) and the Parts relation (65, FIG. 2)). For example, the rows shown below:
S.ID#   SNAME   STATUS   CITY     P.ID#   PNAME   COLOR   WEIGHT   CITY
S1      Smith   20       London   P1      Nut     Red     12       London
These rows show that supplier S1 and part P1 are located in the same city, namely, London. Therefore, a result row is generated, since both rows satisfy the predicate in the WHERE clause, namely, S.CITY=P.CITY.
Similarly, for all other pairs of rows in the SUPPLIERS relation (63, FIG. 2) and the PARTS relation (65, FIG. 2) which satisfy the predicate clause, a resultant row is generated (see Table A). Referring to Table A and to FIG. 2, notice that the supplier S5 at 75, located in Athens, does not appear in the JOIN relation (Table A) because there are no parts associated with that city. Likewise, part P3 at i31, associated with Rome, does not appear in the resultant relation, because there are no suppliers associated with Rome.
There is no requirement that the comparison operator in a JOIN predicate be equality. The EQUIJOIN by definition produces a result containing two identical columns, as shown in Table A. If one of these two columns is eliminated, the result is called the NATURAL JOIN.
In conclusion, the JOIN operation is the restriction of the Cartesian product of two or more relations. The Cartesian product of a set of N relations is a new relation consisting of all possible rows "r", such that "r" is the concatenation of rows from the participating relations. Once the Cartesian product is generated, all rows that do not satisfy the "JOIN predicate" are eliminated from the Cartesian product. What is left is the EQUIJOIN result relation.
In this example, the complete table contains thirty rows. Now, all the rows of the Cartesian product in which S.CITY is not equal to P.CITY are eliminated, and what is left is the EQUIJOIN result as shown earlier.
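The conventional approach described above (form the Cartesian product, then discard the rows that fail the JOIN predicate) can be sketched as follows. This is an illustration only: the function name is hypothetical, and each relation is reduced to its ID and city columns, using the cities stated in the text and in Table A.

```python
from itertools import product

# Reduced relations: (ID, CITY) pairs only.
suppliers = [("S1", "London"), ("S2", "Paris"), ("S3", "Paris"),
             ("S4", "London"), ("S5", "Athens")]
parts = [("P1", "London"), ("P2", "Paris"), ("P3", "Rome"),
         ("P4", "London"), ("P5", "Paris"), ("P6", "London")]


def equijoin_on_city(left, right):
    """Return (Cartesian product, rows that satisfy S.CITY = P.CITY)."""
    cartesian = list(product(left, right))
    joined = [(s_id, s_city, p_id, p_city)
              for (s_id, s_city), (p_id, p_city) in cartesian
              if s_city == p_city]
    return cartesian, joined


cartesian, joined = equijoin_on_city(suppliers, parts)
print(len(cartesian), len(joined))
for row in joined:
    print(row)
```

With five suppliers and six parts the intermediate product holds thirty rows, of which ten survive the predicate. The cost of materializing that intermediate product is what the binary representation discussed next is intended to avoid.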
In the following sections, two aspects of the present invention are discussed. First, the JOIN relation is efficiently represented by binary bit vectors, and second, the JOIN relation is constructed without having to create a cross product relation.
1. Binary Representation of a JOIN Relation
Referring to FIGS. 2, 19, 20 and 21, a binary representation of a JOIN relation is now discussed. Specifically, FIG. 19 represents the depiction of a JOIN relation from the following query:
SELECT S.ID#, S.STATUS, S.CITY, P.ID#
FROM S, P
WHERE S.CITY = P.CITY;
This query is a projection of the JOIN because P.CITY is not mentioned in the SELECT clause of the query. Like the EQUIJOIN example discussed above, this query requires that data come from two relations, namely the suppliers relation (63, FIG. 19) and the Parts relation (65, FIG. 19). Both relations are named in the FROM clause and the connection between the tables is through the CITY columns in the WHERE clause. The result of the JOIN for displaying the columns CITY, SUPPLIERS ID#s, STATUS, and PART ID#s is shown at 628 of FIG. 19. To construct the JOIN relation of the SUPPLIERS relation and the PARTS relation, a new set of entity select vectors 600 and 602, for depicting the values in the JOIN relation columns, is created. The entity select vectors at 600 and 602 are binary bit vectors that indicate which rows of the particular relation associated with the entity select vector participate in the JOIN relation. More particularly, each binary bit has an ordinal position, which corresponds to a row of the relation. Binary bits 601, 603, 605, 607 and 609 correspond to the five rows of the suppliers relation 63. When the binary bit is set to "1", the particular row associated with the binary bit participates in the JOIN relation. Specifically, binary bit 601 indicates that the first row of the SUPPLIERS relation 63 participates in the JOIN relation.
Likewise, the entity select vector 602, having binary bits 611, 613, 617, 619 and 621, indicates that the first, second, fourth, fifth and sixth rows of the PARTS relation 65 participate in the JOIN relation. Vectors 600 and 602 act just like the entity select vectors discussed in FIGS. 4, 5 and 6. The only difference is that the ordinal positions of the bits in the entity select vectors 600 and 602 do not correspond to unique values in a value set. Instead, the binary bits of entity select vectors 600 and 602 refer to row locations in a particular relation. Like the entity select vectors in FIGS. 4, 5 and 6, an implied mapping correspondence exists between each binary bit in the entity select vector and a particular row use vector in an associated row use set.
The implied mapping scheme is illustrated by the dotted lines 608, 610, 612 and 614, which show that the binary bits of the entity select vector 600 indicate the correspondence of the rows of the suppliers relation to the row use vectors 615, 617, 619 and 621, respectively. Likewise, the binary bits of the entity select vector 602, which correspond to the rows of the Parts relation 65, are mapped in an implied manner to the row use vectors 623, 625, 627, 629 and 631, as shown by the dotted lines 616, 618, 620, 622 and 624. The row use sets of the JOIN relation perform a dual task of representing the values in the rows of the JOIN relation and of depicting more than one column of the JOIN relation. Specifically, the row use set 604 represents the columns S.CITY 638, S.ID# 636 and S.STATUS 634. Likewise, the row use set 606 represents the P.ID# column 630 of the JOIN relation. The columns 638, 636, 634 and 630 depict the result of the JOIN.
To summarize, the entity select vector 600 and the row use set 604 represent all of the binary information necessary to construct the suppliers relation portion 626 of the JOIN relation 628, namely, columns 638, 636 and 634. The following is a detailed discussion on how this representation is achieved.
Referring to row use vector 615 at FIG. 19 of the row use set 604, a dotted line 608 maps the row use vector 615 to the binary bit 601 of the entity select vector 600. The binary bits of the row use vector 615 indicate the row positions of the columns 638, 636 and 634, which contain a particular value in the first row of the suppliers relation 63. Binary bit 601 corresponds to the values London, 20, Smith and S1. To build the S.CITY column of the JOIN relation, only the value London is referenced. Thus, the three binary bits set to "1" in the row use vector 615 represent three occurrences of the value London in the S.CITY column 638 of the suppliers portion 626 of the JOIN relation. For the S.ID# column 636, the first three bits set to "1" in the row use vector 615 indicate three occurrences of the value S1. Likewise, in the STATUS column 634, the first three binary bits set to "1" in the row use vector 615 represent occurrences of the value 20. Thus, the first three rows of the suppliers portion 626 of the JOIN relation 628 are characterized by the first three bits of the row use vector 615. Likewise, the next three rows of the suppliers portion 626 are characterized by the row use vector 621 of the row use set 604. The replication of the three bits in row use vector 615 indicates that the three values [LONDON, S1, 20] of the first row of the Suppliers relation occur in rows 1, 2 and 3 of the JOIN relation. The remaining values and the columns of the suppliers portion 626 are indicated by the row use vectors 617 and 619, which between them contain binary bits set to "1" in the seventh, eighth, ninth and tenth rows. It should be noted that although the columns S.CITY, S.ID# and S.STATUS of the Suppliers relation are indicated by the query, the column SNAME in the SUPPLIERS relation could just as easily have been represented by the row use set 604.
The values of the Parts relation are mapped into the JOIN relation 628 by the row use set 606 and the entity select vector 602 in exactly the same fashion as for Suppliers.
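The representation described for FIG. 19 can be sketched for either side of the JOIN as follows. This sketch derives the vectors from an already known list of JOIN rows purely to show their shape; it is not the construction method of the invention, which avoids forming the cross product. The function name, the list-of-bits encoding and the JOIN row order are assumptions, the order being inferred from the description (London pairs in rows 1 to 6, Paris pairs in rows 7 to 10).

```python
def join_side_representation(n_rows, source_row_of_join_row):
    """Build (entity_select, row_use_set) for one relation of a JOIN.

    source_row_of_join_row[j] is the 0-based row of the source relation
    that appears in JOIN row j.  The k-th row use vector corresponds to the
    k-th bit set to 1 in the entity select vector (the implied mapping).
    """
    entity_select = [0] * n_rows
    for r in source_row_of_join_row:
        entity_select[r] = 1
    row_use_set = [
        [1 if src == r else 0 for src in source_row_of_join_row]
        for r in range(n_rows) if entity_select[r]
    ]
    return entity_select, row_use_set


# Assumed JOIN row order: S1-P1, S1-P4, S1-P6, S4-P1, S4-P4, S4-P6,
# S2-P2, S2-P5, S3-P2, S3-P5 (rows given as 0-based indices).
supplier_rows = [0, 0, 0, 3, 3, 3, 1, 1, 2, 2]
part_rows = [0, 3, 5, 0, 3, 5, 1, 4, 1, 4]

sup_select, sup_row_use = join_side_representation(5, supplier_rows)
part_select, part_row_use = join_side_representation(6, part_rows)
print(sup_select, part_select)
print(sup_row_use[0], part_row_use[0])
```

Under the assumed row order, the first supplier row use vector has its first three bits set and the first parts row use vector has its first and fourth bits set, in line with the description of vectors 615 and 623.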
In sum, the row use sets 606 and 604 together give a binary representation of the JOIN relation 628. Only the binary representation of the JOIN relation is stored in the RDMS 10 (FIG. 1A). The actual values depicted by the binary representation are retrieved when the system performs a reconstruct and display for the user. FIGS. 20 and 21 represent a more detailed view of the JOIN relation 628 (FIG. 19). More particularly, the SUPPLIERS relation 63 (FIG. 19) is depicted by its row use sets at 63 (FIG. 21), and the Parts relation 65 (FIG. 20) is depicted by its row use sets 65 (FIG. 20). In addition, the entity select vectors 600 and 602 are shown corresponding to the row use sets 630 and (FIG. 20). Also, FIGS. 20 and 21 depict the value sets referred to in the SUPPLIERS and PARTS relations with their associated entity select vectors.
Referring to FIG. 20, a detailed discussion of the SUPPLIERS relation portion 626 (FIG. 19) is now presented. As discussed above, the row use set 604 for the JOIN relation (628, FIG. 19) is mapped back to the Suppliers relation by the entity select vector 600. As shown in FIG. 19, the entity select vector 600 is repeated for each column of the Suppliers relation which has one or more values in the suppliers portion of the JOIN relation. The implied mapping from each row use vector of the row use set 604 is shown by the dotted lines 608, 610, 612 and 614. The row use sets for representing the columns of the suppliers relation are shown in FIG. 4. The first three binary bits set to "1" in the row use vector 615 of the row use set 604 are mapped to the binary bit 601 of the entity select vector 600. The entity select vector 600 corresponds to the row use set 260. The first binary bit 601 is set to "1", indicating that the value in the first row of the S.ID# column of the suppliers relation is present in the JOIN relation. To determine which suppliers ID is referenced by the binary bit 601, a Boolean AND operation is performed on each row use vector 184, 186, 188, 190 and 192 to determine which row use vector contains a corresponding "1" bit in the first position. Row use vector 184 contains a binary bit set to "1" in the first position. Row use vector 184 maps back to the first binary bit of the entity select vector 176 for the suppliers relation. The first binary bit of entity select vector 176 corresponds to the value S1 in the value set for suppliers identifiers 160. Thus, the first three binary bits set to "1" in the row use vector 615 indicate that the value S1 is in the first three rows of the S.ID# column in the JOIN relation. In the same way, the first three binary bits set to "1" in the row use vector 615 are mapped to the entity select vector 600 associated with the row use sets 264 and 266, corresponding to the S.STATUS and S.CITY columns of the suppliers relation. The row use vector 615 represents the first three values of the S.CITY column 638 and the S.STATUS column 634 of the suppliers portion of the JOIN relation.
Referring to FIG. 21, a more detailed view of the Parts relation portion of the JOIN relation is shown. Specifically, the mapping of the part ID#s to the P.ID# column 630 of the JOIN relation is shown. Row use vector 623 of the JOIN relation contains binary bits set to "1" in the first and fourth positions. These binary bits correspond to the first and fourth positions of the P.ID# column 627 of the JOIN relation. Row use vector 623 is impliedly mapped to the first binary bit 611 of the entity select vector 602. The entity select vector 602 corresponds to the row use set 304. A Boolean AND operation is performed on the entity select vector and each row use vector of the row use set 304 to determine which row use vector contains the binary bit set to "1" in the first position. The leftmost row use vector contains a binary bit set to "1", and this row use vector maps back to the first binary bit position of entity select vector 284. The first binary bit of the entity select vector 284 is set to "1", indicating that the first row of the value set 282 contains the unique value mapped into the Parts relation 65 and into the JOIN relation. Thus, binary bit 611 indicates that the value P1 is mapped into the rows marked by the row use vector 623 of the JOIN relation, specifically, the first and fourth rows. The remaining rows of the P.ID# column 627 of the JOIN relation are depicted by the row use vectors 625, 627, 629 and 631 of the row use set 606.
2. Constructing A Binary Representation of a JOIN Relation
Referring to FIGS. 22A, 22B, 22C, 22D, 22E, 22F and 22G, the operations performed by the RPU 22 (FIG. 1A) for performing a JOIN operation and constructing the JOIN relation are now discussed in detail. Specifically, FIG. 22A shows the routine EQUI-JOIN, which builds a binary representation of the JOIN relation. FIG. 22B is a flow diagram of the routine BUILD ROW USE SETS for constructing the particular row use sets associated with each column of the JOIN relation. FIG. 22C is a flow diagram for the routine CONSTRUCT JOIN ROW USE VECTORS for controlling the overall construction of each row use vector of a row use set in the JOIN relation. FIG. 22D is a routine EVALUATE ROW USE SETS for determining the number of occurrences of the unique values participating in the JOIN operation. FIG. 22E is a routine called PRODUCTS for calculating a series of product terms which characterize the formation of bit patterns in each of the row use vectors for a particular row use set of the JOIN relation. FIG. 22F is a routine called NUMS for determining the number of times a particular bit pattern repeats itself in a row use vector in the JOIN relation. FIG. 22G is a routine called GENERATE BIT STRING for building the row use vector in the JOIN relation.
When the JOIN operation is performed, the binary representations of the one or more relations to be JOINed are found in the memory 18 of the RDMS 10. Here the binary represented relations are stored until the RPU 22 is ready for processing. When the RPU 22 is ready to perform the JOIN operation, the relations are sent via bus 48 to the BBVP 14. Using the relations in the BBVP 14, the RPU 22 performs the EQUI/NATURAL JOIN operation (FIG. 22A) to create a binary representation of the resultant JOIN relation.
Referring to FIG. 22A, the overall process for performing the EQUIJOIN operation is now described in detail. Specifically, during block 652, the RPU 22 obtains all the entity select vectors, for the participating relations, of the columns which contain values from the same particular value set over which the JOIN operation is performed (e.g., CITY VALUE SET for S.CITY = P.CITY).
The JOIN operation is performed over the values of the relations which fulfill the WHERE clause of the JOIN query (e.g., S.CITY = P.CITY). For example, referring to FIG. 19, where the JOIN relation was characterized by the WHERE clause (e.g., S.CITY = P.CITY), the values London and Paris in the CITY columns of relations 63 and 65 were common to both relations, and the JOIN operation was performed with respect to these values.
During block 654, the RPU 22 performs a Boolean AND operation on the entity select vectors obtained to determine a resultant binary bit vector x. The resultant binary bit vector indicates which values of the particular value set are common to all of the entity select vectors involved in the JOIN operation. Specifically, binary bits set to "1" in the resultant bit vector "x" indicate the values of the value set which are common to all the columns represented by the obtained entity select vectors.
During block 658, the RPU 22 determines which binary bits of each entity select vector correspond to the binary bits set to "1" in the resultant bit vector x. For each binary bit set to "1" in the entity select vector that corresponds to a binary bit set to "1" in the resultant bit vector x, the RPU 22 obtains the row use vector in the associated row use set during block 658. Then, during block 660, for each row use set associated with each entity select vector, the Boolean operation OR is performed on the selected row use vectors of the associated row use set. The resultant vectors are referred to as JOIN entity select vectors, which characterize values belonging to one or more JOIN columns in the JOIN relation. In block 664, a row use set corresponding to each JOIN entity select vector is constructed. Specifically, during block 664, the BUILD ROW USE SETS routine (FIG. 22B) is called to construct each row use set corresponding to each JOIN entity select vector of the JOIN relation. When all of the row use sets for the JOIN relation have been constructed, processing returns to the calling routine in block 668.
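The AND and OR steps of blocks 654 through 660 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: bit vectors are modeled as strings of "0" and "1", the helper names (bit_and, bit_or, join_entity_select_vectors) are hypothetical, and the four-value CITY value set, its entity select vectors, and every row use vector other than the two London vectors are assumed sample data rather than values taken from the figures.

```python
def bit_and(vectors):
    """Bitwise AND of equal-length bit strings."""
    return "".join("1" if all(b == "1" for b in col) else "0" for col in zip(*vectors))


def bit_or(vectors):
    """Bitwise OR of equal-length bit strings."""
    return "".join("1" if any(b == "1" for b in col) else "0" for col in zip(*vectors))


def join_entity_select_vectors(columns):
    """Blocks 652-660: return (resultant vector x, one JOIN entity select vector per column).

    Each column is a dict with an "entity_select" bit string over the shared
    value set and a "row_use" list holding one row use vector per "1" bit of
    that entity select vector, in value-set order (an assumed layout).
    """
    x = bit_and([c["entity_select"] for c in columns])            # block 654
    join_vectors = []
    for c in columns:
        present = [k for k, b in enumerate(c["entity_select"]) if b == "1"]
        # Block 658: keep the row use vectors whose value is flagged in x.
        selected = [c["row_use"][n] for n, k in enumerate(present) if x[k] == "1"]
        join_vectors.append(bit_or(selected))                      # block 660
    return x, join_vectors


# Hypothetical value set, in order: Athens, London, Paris, Rome.
suppliers_city = {"entity_select": "1110",
                  "row_use": ["00001", "10010", "01100"]}        # Athens, London, Paris
parts_city = {"entity_select": "0111",
              "row_use": ["100101", "010010", "001000"]}        # London, Paris, Rome

x, (esv1, esv2) = join_entity_select_vectors([suppliers_city, parts_city])
print(x, esv1, esv2)  # 0110 11110 110111
```

With this sample data, x flags London and Paris as the common values, and each JOIN entity select vector marks the rows of its source relation that contribute to the JOIN relation.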
Referring to FIG. 22B, the routine BUILD ROW USE SETS is now discussed in detail. The purpose of this routine is to construct the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation.
Specifically, during block 672, the RPU 22 selects the first unique value i in the resultant bit vector x. In other words, the RPU 22 selects the first value represented by the occurrence of a "1" bit in the resultant bit vector x. Then, during block 673, a variable "START ROW" is set equal to zero. This variable indicates the start position of the first row in the JOIN row use vector in the JOIN row use set being generated and will be discussed in more detail along with the description of FIG. 22G. Then, during block 674, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) is called. This routine is performed by the RPU 22 to determine the characteristics of a particular row use vector of the JOIN relation corresponding to the value i and to build the row use vector or vectors associated with the value in the JOIN relation. During block 676, the RPU 22 determines if there are any more unique values which are indicated in the resultant vector x. If there are, then the next unique value, i, is obtained at block 678. Processing continues at the CONSTRUCT JOIN ROW USE VECTORS routine to build the JOIN row use vector(s) associated with the next unique value. Assuming that there are no more unique values represented in the resultant bit vector x, then processing returns to the calling program (EQUI/NATURAL JOIN, FIG. 22A) at block 680.
Referring now to FIG. 22C, the CONSTRUCT JOIN ROW USE VECTORS routine is now described in detail. As stated above, the purpose of this routine is to determine the characteristics of the row use vectors of the JOIN relation and to build the row use vectors for a particular row use set in the JOIN relation. During block 684, the routine EVALUATE ROW USE VECTORS is called. The purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation. A more detailed discussion of this routine will be presented shortly with reference to FIG. 22D. Then, during block 686, the PRODUCTS routine (FIG. 22E) is called. The PRODUCTS routine calculates a series of product terms which characterize the formation of bit patterns in each of the row use vectors of a particular row use set of the JOIN relation. A more detailed discussion of this routine will be presented with reference to FIG. 22E. Then, during block 688, the NUMS routine (FIG. 22F) is called. The NUMS routine determines the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. A more detailed discussion will be presented shortly with reference to FIG. 22F.
Assuming that all the calculations for determining the characteristics of a row use set have occurred, processing continues at block 690, during which the first input column "j" (where "j" is set equal to 1) is obtained. Then, in block 692, the row use set associated with the first input column j is obtained. Then, during block 694, the GENERATE BIT STRING routine (FIG. 22G) is called for constructing the row use vectors associated with a particular value in the row use set of the JOIN relation. The GENERATE BIT STRING routine (FIG. 22G) evaluates the calculations of the PRODUCTS (FIG. 22E) and NUMS (FIG. 22F) routines to determine the characteristics of the bit patterns in the row use vectors and constructs these bit patterns in the row use vectors of the JOIN relation. Then, during block 696, the RPU 22 determines if there are any more input columns which need to be processed. Assuming that there are still other input columns to be processed, block 698 increments the variable "j" by 1. Processing continues at blocks 692, 694 and 696 until all of the input columns have been processed. Assuming that there are no more input columns for processing, block 700 is entered to calculate a new value for the variable "START ROW." More particularly, the equation START ROW = START ROW + PRODS(1) is determined. The new value for START ROW will indicate the starting position of the bit pattern in the row use vector for the next value i of the resultant bit vector x. Processing returns at block 702 to the calling program, the BUILD ROW USE SETS routine, FIG. 22B.
FIG. 22D is a flow diagram of the EVALUATE ROW USE VECTORS routine, discussed below in more detail. As stated above, the purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation. During block 704, the RPU 22 selects the first column and obtains the row use set of the column. Then, during block 705, the RPU 22 obtains the row use vector of the RUS(j) which corresponds to the unique value i of the bit vector x. The row use vector is referred to as Vj. During block 706, the number of binary bits set to "1" in the row use vector Vj is determined. The number of binary bits set to "1" in Vj corresponds to the number of occurrences of the particular value i in the current column over which the JOIN is performed. The number of occurrences calculated for this particular row use vector Vj is placed in the variable Cj. Cj is used by the PRODUCTS routine (FIG. 22E), to be discussed. Then, during block 707, the RPU 22 determines whether there are any more input columns. Assuming that there are still more input row use vectors, processing continues at block 709, during which the variable j is incremented by 1. Processing continues at blocks 705, 706 and 707 until all of the input columns are processed. Assuming that all of the input columns have been processed, then during block 711 processing returns to the calling program, the CONSTRUCT JOIN ROW USE VECTORS routine, FIG. 22C (at block 686).
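The counting step of blocks 704 through 711 amounts to a population count per input column. A minimal Python sketch, assuming bit vectors are modeled as strings of "0" and "1" and using a hypothetical function name, is shown below; the two sample vectors are the London row use vectors used in the detailed example of this description.

```python
def evaluate_row_use_vectors(row_use_vectors):
    """Blocks 704-711: return [C1, C2, ..., Cn], the number of "1" bits in each Vj."""
    return [vj.count("1") for vj in row_use_vectors]


v1 = "10010"    # row use vector for London in the S.CITY column
v2 = "100101"   # row use vector for London in the P.CITY column
counts = evaluate_row_use_vectors([v1, v2])
print(counts)  # [2, 3]
```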
FIG. 22E is a flow diagram for the routine PRODUCTS. As stated earlier, the purpose of this routine is to calculate a series of product terms which characterize the formation of bit patterns in each of the row use vectors of the JOIN relation. During block 710, an array called PRODS is set equal to the series
PRODS = { (1, C1 * C2 * C3 * ... * Cn),
          (2, C2 * C3 * ... * Cn),
          (3, C3 * ... * Cn),
          ...,
          (N - 1, Cn-1 * Cn),
          (N, Cn),
          (N + 1, 1) }
where Cj is equal to the number of occurrences of value i in column j participating in the JOIN operation. For example, suppose the JOIN operation is performed over the CITY columns for two relations and the value London is found to be present in both columns. Assume that the CITY column in the first relation contains two occurrences of the value London and the CITY column in the second relation contains three occurrences of the value London. Then C1 is set equal to 2 and C2 is set equal to 3. Therefore, PRODS(1) is equal to C1 * C2, which is equal to 6. This number is used by the GENERATE BIT STRING routine (FIG. 22G) to determine the characteristics of the bit patterns in the row use vectors of the JOIN relation. When processing of block 710 is completed, processing continues at block 712, to return processing to the calling program, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C).
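The PRODS series is a table of suffix products of the counts C1 through Cn, terminated by 1. A short Python sketch of that calculation follows; the function name and the choice of a dictionary keyed by the 1-based index are illustrative assumptions, not part of the described apparatus.

```python
def products(counts):
    """Block 710: return PRODS as a dict keyed 1..n+1.

    PRODS[j] = Cj * C(j+1) * ... * Cn, and PRODS[n + 1] = 1.
    """
    n = len(counts)
    prods = {n + 1: 1}
    for j in range(n, 0, -1):            # build the suffix products right to left
        prods[j] = counts[j - 1] * prods[j + 1]
    return prods


prods = products([2, 3])                 # C1 = 2, C2 = 3 (the London example)
print(sorted(prods.items()))  # [(1, 6), (2, 3), (3, 1)]
```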
Referring to FIG. 22F, the NUMS routine is now described in detail. The NUMS routine determines the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. Specifically, during block 716 the following series is calculated:
NUMS = { (1, PRODS(1)/PRODS(1)),
         (2, PRODS(1)/PRODS(2)),
         (3, PRODS(1)/PRODS(3)),
         (4, PRODS(1)/PRODS(4)),
         ...,
         (N, PRODS(1)/PRODS(N)) }
Then, during block 718, processing returns to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C).
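The NUMS series can be sketched in Python as a set of divisions of PRODS(1) by each PRODS(j). The function name and dictionary layout are assumptions made for illustration. Integer division is used on the understanding that each PRODS(j) is a factor of PRODS(1), and that every Cj is at least 1 because only values common to all columns are processed.

```python
def nums(prods):
    """Block 716: NUMS[j] = PRODS[1] / PRODS[j] for j = 1..n.

    prods is a dict keyed 1..n+1, as in the PRODS series above.
    """
    n = len(prods) - 1
    return {j: prods[1] // prods[j] for j in range(1, n + 1)}


prods = {1: 6, 2: 3, 3: 1}               # PRODS for C1 = 2, C2 = 3
print(nums(prods))  # {1: 1, 2: 2}
```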
Referring now to FIG. 22G, the GENERATE BIT STRING routine is now described in detail. As stated above, this routine determines the characteristics of a bit pattern in a JOIN row use vector associated with a particular value i. An OFFSET value is determined for specifying the number of binary "0" bits in the row use vector ahead of the first binary "1" bit in addition to the zero-bits specified by START-ROW. During block 722 the offset value is set equal to an initial value of zero.
During block 724, RPU 22 obtains the input bit vector associated with the variable Vj. During block 726, RPU 22 obtains the first binary "1" bit in the Vj bit vector. At block 728, a variable K is set equal to the ordinal position of the selected (726, FIG. 22G) bit of Vj. Processing continues at block 730, during which the output vector "W" is created. The output vector "W" at a position in the row use set corresponds to the bit position k in the entity select vector (the destination entity select vector corresponding to column j). The characteristics of the output bit vector W are determined by calculating NUMS(j), PRODS(j + 1) and PRODS(j). NUMS(j) indicates the number of repetitions of a bit pattern having PRODS(j + 1) "1" bits. PRODS(j) indicates the total number of bits in the bit pattern associated with the output bit vector W. The position of the first bit set to 1 in the output bit vector is determined by the equation: POSITION = START ROW + OFFSET; where START ROW is the first bit position of the bit string indicating a particular value and OFFSET specifies the number of binary bits set to "0" in the row use vector before the series of bits set to "1" occurs.
Processing continues in block 732, during which the RPU 22 determines if there are any more bits set to "1" in the bit vector Vj. Assuming that there are more binary bits set to "1" in the bit vector Vj, processing continues at block 736, during which OFFSET is set to OFFSET + PRODS(j + 1). Then, during block 734, the next bit of the bit vector Vj is obtained. Processing continues at blocks 728, 730, 732, 734 and 736 until all of the binary "1" bits in the bit vector Vj have been evaluated and output bit vectors W have been built. When all the binary bits in the bit vector Vj have been evaluated, processing returns to the calling program during block 738.
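One possible reading of the GENERATE BIT STRING routine is sketched below in Python. It is a sketch under stated assumptions rather than the routine of FIG. 22G itself: bit vectors are strings of "0" and "1", positions in the output vector are counted from zero, each output vector W is padded with "0" bits to an assumed total length, and the function name and parameter names are hypothetical.

```python
def generate_bit_strings(vj, j, prods, nums, start_row=0, total_rows=None):
    """Blocks 722-738: build one output vector W per "1" bit of Vj.

    Returns a list of (k, W) pairs, where k is the 1-based position of the
    bit in Vj and W is the JOIN row use vector built for that bit.
    prods and nums are dicts keyed by the 1-based column index.
    """
    if total_rows is None:
        total_rows = start_row + prods[1]     # assumed length: rows built so far
    outputs = []
    offset = 0                                         # block 722
    for k, bit in enumerate(vj, start=1):              # blocks 726 and 734
        if bit != "1":
            continue
        w = ["0"] * total_rows
        for rep in range(nums[j]):                     # NUMS(j) repetitions
            first = start_row + offset + rep * prods[j]    # POSITION of first "1"
            for p in range(first, first + prods[j + 1]):   # PRODS(j + 1) "1" bits
                w[p] = "1"
        outputs.append((k, "".join(w)))                # block 730
        offset += prods[j + 1]                         # block 736
    return outputs


prods = {1: 6, 2: 3, 3: 1}       # C1 = 2, C2 = 3
nums = {1: 1, 2: 2}
print(generate_bit_strings("10010", 1, prods, nums))
print(generate_bit_strings("100101", 2, prods, nums))
```

For column 1 each pattern is six bits long with three "1" bits and occurs once; for column 2 each pattern is three bits long with one "1" bit and occurs twice, which interleaves the rows of the second column against each row of the first.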
3. Detailed Example For Constructing a Binary Representation of the JOIN Relation
Referring to FIGS. 19, 22A, 22B, 22C, 22D, 22E, 22F, 22G, 23A, 23B, and 23C, a detailed example for constructing a binary representation of the JOIN relation as shown in FIG. 19 is now described. As discussed earlier, FIG. 19 depicts a JOIN operation for the following query:
SELECT S.ID#,S.STATUS,S.CITY,P.ID#
FROM S,P
WHERE S.CITY=P.CITY;
The result of this JOIN operation for columns CITY, SUPPLIERS ID#s, STATUS and PART ID#s is shown at 628 of FIG. 19. The binary representation for the new JOIN relation is shown at 604 and 606. More particularly, the SUPPLIERS relation portion of the JOIN relation 628 is shown at 604 and the PARTS relation portion of the JOIN relation is shown at 606. The purpose of this discussion is to construct the binary representation of the JOIN relation for the query above.
FIG. 23 is a Results Table depicting the JOIN operation performed on the SUPPLIERS relation and the PARTS relation for the specific query above. Each row of the Results Table, indicated at 850-878, depicts a different step of the creation of the binary representation of the JOIN relation, as shown by the flow diagram of FIG. 22. There are ten columns in the Results Table of FIG. 23. From left to right, the first column is the value set over which the JOIN operation is performed. The second column is the entity select vector associated with the column of the first relation (or SUPPLIERS relation) over which the JOIN operation is performed. The third column is the entity select vector for the column in the second relation (or PARTS relation) over which the JOIN operation is performed. The fourth column is the resultant bit vector generated by Boolean ANDing the entity select vectors for the two columns over which the JOIN operation is being performed. The fifth column is the selected input row use vectors of the SUPPLIERS relation. The sixth column is the selected input row use vectors for the PARTS relation. The seventh column is the entity select vector 1, corresponding to the first column of the JOIN relation, which also refers to the SUPPLIERS relation portion of the JOIN relation. The eighth column is the entity select vector 2, corresponding to the second column of the JOIN relation, which also refers to the PARTS relation portion of the JOIN relation. The ninth column is the first JOIN row use set corresponding to entity select vector 1, and the tenth column is the second JOIN row use set corresponding to entity select vector 2.
Referring to FIG. 22A, the EQUIJOIN operation is now discussed in more detail. During block 652, the RPU 22 obtains all entity select vectors for the SUPPLIERS relation and the PARTS relation, for the columns which contain values from the value set over which the JOIN operation is performed (e.g., the CITY value set corresponding to the WHERE clause, S.CITY = P.CITY). Thus the entity select vectors for the CITY columns of the SUPPLIERS relation (63, FIG. 19) and the PARTS relation (65, FIG. 19) are obtained (850, FIG. 23A). The second column of the Results Table of FIG. 23 depicts the entity select vector for the CITY column of the SUPPLIERS relation, and the third column of the Results Table of FIG. 23 depicts the entity select vector associated with the CITY column of the PARTS relation. Then, during block 654, the RPU 22 performs a Boolean AND operation on the entity select vectors for the SUPPLIERS and PARTS relations to obtain a resultant binary bit vector x (852, FIG. 23A). The resultant binary bit vector, shown in the fourth column of FIG. 23A, indicates which values of the CITY value set are common to the entity select vectors for the CITY columns of the SUPPLIERS and PARTS relations. During block 658, the RPU 22 determines which binary bits of each entity select vector correspond to the binary bits set to "1" in the resultant vector x. For the entity select vector corresponding to the CITY column in the SUPPLIERS relation, the row use vectors corresponding to the values London (L) and Paris (P) are obtained (854, FIG. 23A). Additionally, the row use vectors corresponding to the values London and Paris in the CITY column for the PARTS relation are obtained (856, FIG. 23). Then, during block 660, for each row use set associated with an entity select vector (i.e., 854 and 856 of FIG. 23A), the Boolean OR operation is performed. The resultant vectors of the Boolean OR operations are entity select vectors for the output or JOIN row use sets of the JOIN relation being formed. Specifically, the entity select vector 1, for the Boolean OR of the bit vectors for London and Paris, is shown at 858 of FIG. 23A. The entity select vector 2, for the Boolean OR of the bit vectors for London and Paris, is shown at 860 of FIG. 23A. In block 664, the routine BUILD ROW USE SETS is called for constructing the row use sets associated with the JOIN entity select vectors.
Referring to FIG. 22B, the BUILD ROW USE SETS routine for constructing the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation is shown.
During block 672 (FIG. 22B), the RPU 22 selects the first value represented to be present by the first occurrence of a bit set to "1" in the resultant bit vector x. The first unique value of the resultant relation which corresponds to a bit set to "1" is London. During block 673, the variable START-ROW is set equal to zero. In block 674, the CONSTRUCT JOIN ROW USE VECTORS routine is called. Processing continues at block 684 of the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C).
During block 684, the EVALUATE ROW USE VECTORS routine (FIG. 22D) is called. Referring to FIG. 22D at block 704, the RPU 22 obtains the first row use set associated with the S.CITY column over which the JOIN operation is performed. Particularly, the RPU 22 obtains the row use set corresponding to the S.CITY column of the Supplier relation. The variable j corresponding to this particular column is set equal to 1. Processing continues in block 705, during which the row use vector of the row use set for the S.CITY column is obtained for the unique value London, which corresponds to the first binary bit set to "1" in the resultant binary bit vector x. The variable V1 is set equal to the row use vector "10010" (854, FIG. 23A). Then, during block 706, the RPU 22 sets the variable C1 equal to 2, the number of "1" bits in the bit vector V1. During block 707, the RPU 22 determines if there are any more input columns to be processed. There is a second input column corresponding to the PARTS relation. Thus, processing continues at block 709, during which the variable j is incremented by 1 (j = 2). Processing continues at block 705, during which the row use vector "100101" (856, FIG. 23B), corresponding to London in the row use set for the P.CITY column, is obtained. V2 is set equal to this vector. Then, during block 706, the RPU 22 sets the variable C2 equal to 3, the number of "1" bits in the bit vector V2. During block 707, the RPU 22 determines that there are no more input columns over which the JOIN operation is to be performed. Thus, processing returns during block 711 of FIG. 22D to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 686.
During block 686, the PRODUCTS routine (FIG. 22E) is called. Referring to FIG. 22E during block 710, a series of numerical products which characterize the formation of the bit patterns in each of the row use vectors of the JOIN row use set is constructed. Specifically, the function called PRODS is:
PRODS = [ (1, 6) (2, 3) (3, 1) ].
Processing returns during block 712 to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 688. During block 688, the NUMS routine (FIG. 22F) is called to determine the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. The NUMS determination is a series of divisions for calculating the number of repetitions of a particular pattern of "1" bits in a row use vector of the JOIN relation. Specifically, NUMS is equal to the following series:
NUMS = (1, 1) (2, 2).
Processing returns during block 718 to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 690.
During block 690, the RPU 22 sets j = 1 and selects the first input column, S.CITY, over which the JOIN operation is being performed. Processing continues in block 692, during which the row use set associated with the S.CITY column is obtained. Then, in block 694, the GENERATE BIT STRING routine (FIG. 22G) is called.
Referring to FIG. 22G at block 722, the variable OFFSET is set equal to an initial value of zero. During block 724, the RPU 22 obtains the bit vector associated with the variable Vj where j = 1. Then, during block 726, the RPU 22 performs a count function on the first bit set to "1" in the binary bit vector V1. Then, at block 728, the variable K is set equal to the ordinal position of the first binary bit set to "1" in V1; the ordinal position is 1. Processing continues at block 730, during which the output vector W is created. The output vector W corresponds to the bit position 1 in the entity select vector 1 (the Destination or JOIN entity select vector corresponding to the S.CITY column). The characteristics of the output bit vector W are determined by calculating NUMS(1), PRODS(2) and PRODS(1). NUMS(1) indicates the number of repetitions of the bit string of the output vector W corresponding to the first occurrence of London in the bit vector V1. PRODS(2) indicates that there are three "1" bits in the bit pattern of the output vector W. PRODS(1) indicates the total number of bits, six, in the bit pattern associated with the output bit vector W. The position of the first bit set to "1" in the output bit vector W is determined by the equation: POSITION = START ROW + OFFSET; where START ROW = zero and OFFSET = zero. Thus, the position of the first binary bit set to "1" in the output bit vector W is zero. The output vector W is "111000" (862, FIG. 23). Processing continues in block 732, during which the RPU 22 determines that there is a second "1" bit in the bit vector V1. Thus, processing continues at block 736, during which the variable OFFSET is determined to be 3 (i.e., OFFSET = OFFSET + PRODS(2)). Then, in block 734, the next bit of V1 is obtained. The position of the selected bit of V1 is 4. Thus, during block 728, K is set to 4. During block 730, the characteristics of the output vector W associated with the bit position 4 in V1 are determined by calculating NUMS(1), PRODS(2) and PRODS(1). The calculation for these formulas was determined above. The position of the first bit set to "1" in the output vector W is determined to be 3 (position = 0 + 3). Thus, the resultant output vector W is "000111" (864, FIG. 23). Processing continues in block 732, during which the RPU 22 determines that there are no more "1" bits in V1 to evaluate. Thus, processing returns during block 738 to the JOIN ROW USE VECTOR routine (FIG. 22C) at block 696.
During block 696 (FIG. 22C) the RPU 22 determines that there is another column, P.CITY, over which the JOIN operation is performed. Processing continues at block 698, during which the variable j is incremented by 1 (j = 2). Then, in block 692, the row use set associated with the column P.CITY is obtained. Processing continues in block 694, during which the GENERATE BIT STRING routine (FIG. 22G) is called.
During block 722 (FIG. 22G) the variable OFFSET is set equal to the initial value of zero. During block 724, the bit vector V2 is obtained. This is the row use vector corresponding to London in the P.CITY row use set. In block 726, the first bit set to "1" of the bit vector V2 is selected. During block 728, the position of this selected bit of V2 is determined to be 1. During block 730, the output vector W corresponding to the bit position 1 of the entity select vector 2 (Destination entity select vector corresponding to the P.CITY column) is determined. The characteristics of the output vector W are determined by calculating NUMS(2), PRODS(3) and PRODS(2). NUMS(2) is the number of repetitions of the generated bit string, which is equal to 2. The bit string consists of PRODS(2) bits, which is equal to three bits. The number of "1" bits in the bit pattern is given by PRODS(3), which is equal to one "1" bit. This series commencing with the first bit set to "1" begins at position zero as indicated by START ROW + OFFSET, where START ROW and OFFSET are both zero. Thus, the bit pattern of output vector W is "100100" (866, FIG. 23). Processing continues at block 732, during which the RPU 22 determines that there are still "1" bits in the bit vector V2 to be evaluated. Specifically, bits at positions 4 and 6 of the bit vector V2 need to be evaluated. Thus, processing continues at block 736, during which the variable OFFSET is determined to be equal to 1. Processing continues at block 734, during which the next bit of the bit vector V2 is obtained and the position of the newly selected bit of bit vector V2 is determined to be 4. This value is set equal to variable K during block 728. During block 730, the output bit vector W corresponding to the bit position 4 of the entity select vector 2 (destination entity select vector corresponding to column P.CITY) is determined. Specifically, the output bit vector W is determined by calculating NUMS(2), PRODS(3), and PRODS(2). The calculations for these formulas have been determined above and the series of bits corresponding to PRODS(2) begins at the first bit position corresponding to an offset of 1 (POSITION = START ROW + OFFSET; where START ROW = zero and OFFSET = 1). Thus, the output bit vector W is "010010" (868, FIG. 23). Processing continues at block 732, during which the RPU 22 determines that there is still one "1" bit in the bit vector V2 to be evaluated. Thus, processing continues at block 736, during which OFFSET is incremented by the value PRODS(3). PRODS(3) is equal to 1 and thus the value of OFFSET is equal to 2. During block 734 the next bit of the bit vector V2 is obtained and processing continues at block 728. The position of the newly selected bit of the bit vector V2 is 6 and the variable K is set to 6. During block 730, the output vector W corresponding to bit position 6 in the entity select vector 2 is determined. The characteristics of the output vector W have been determined above and the series of bits PRODS(2) begins at the second position, which corresponds to an offset equal to 2. Thus, the output vector W is "001001" (870, FIG. 23). Processing continues at block 732, during which the RPU 22 determines that there are no more "1" bits to evaluate in the bit vector V2. Thus, processing continues at block 738, during which processing returns to the CONSTRUCT JOIN ROW USE VECTOR routine (FIG. 22C) at block 696.
During block 696, the RPU 22 determines that there are no more input columns for processing. Thus, processing continues at block 700, during which the START-ROW variable is calculated to be 6: START-ROW plus PRODS(1) = 6. Processing returns at block 702 to block 676 of the BUILD ROW USE SETS routine, FIG. 22B. During block 676, the RPU 22 determines that there is one more unique value indicated to be present in the resultant bit vector x. The bit set to "1" in the resultant bit vector corresponds to the unique value Paris. During block 678, the unique value i = 2 for Paris is obtained. Then, during block 674, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) is called. In a similar fashion, the row use vectors corresponding to the value PARIS are formed.
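The complete walk-through, including the Paris pass that is only summarized above, can be sketched end to end in Python. This is an illustrative sketch under stated assumptions, not the routines of FIGS. 22A-22G themselves: the function names are hypothetical, output vectors are padded with "0" bits to the full length of the JOIN relation, and the Paris row use vectors ("01100" for S.CITY and "010010" for P.CITY) are assumed sample data because the corresponding figure values are not reproduced in this text. The London vectors are those given in the example.

```python
def suffix_products(counts):
    """Return [PRODS(1), ..., PRODS(n), PRODS(n+1) = 1] as a 0-based list."""
    prods = [1]
    for c in reversed(counts):
        prods.insert(0, c * prods[0])
    return prods


def build_join_row_use_sets(values):
    """values: ordered list of (name, [V1, ..., Vn]), one entry per value flagged in x.

    Returns one JOIN row use set per input column, each a list of
    (value name, source row k, output vector W).
    """
    n = len(values[0][1])
    total = sum(suffix_products([v.count("1") for v in vs])[0] for _, vs in values)
    row_use_sets = [[] for _ in range(n)]
    start_row = 0                                               # block 673
    for name, vectors in values:                                # blocks 672, 676, 678
        prods = suffix_products([v.count("1") for v in vectors])    # blocks 684-686
        for j, vj in enumerate(vectors):                        # blocks 690-698
            pattern_len, ones = prods[j], prods[j + 1]
            repeats = prods[0] // pattern_len                   # NUMS, block 688
            offset = 0                                          # block 722
            for k, bit in enumerate(vj, start=1):
                if bit == "1":
                    w = ["0"] * total
                    for r in range(repeats):
                        first = start_row + offset + r * pattern_len
                        w[first:first + ones] = "1" * ones
                    row_use_sets[j].append((name, k, "".join(w)))
                    offset += ones                              # block 736
        start_row += prods[0]                                   # block 700
    return row_use_sets


def decode_rows(row_use_sets):
    """For each JOIN row, recover the source row number in every input column."""
    total = len(row_use_sets[0][0][2])
    return [tuple(next(k for _, k, w in rus if w[r] == "1") for rus in row_use_sets)
            for r in range(total)]


join_sets = build_join_row_use_sets([
    ("London", ["10010", "100101"]),   # vectors from the example
    ("Paris", ["01100", "010010"]),    # assumed sample vectors
])
for column in join_sets:
    for entry in column:
        print(entry)
pairs = decode_rows(join_sets)
print(pairs)
```

With these inputs the first six JOIN rows pair supplier rows 1 and 4 with part rows 1, 4 and 6 (London), and START ROW = 6 places the Paris rows after them.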
4. Constructing a Binary Representation of a GREATER THAN JOIN
Referring to FIGS. 2 and 24, the GREATER THAN JOIN operation is now discussed in detail. Specifically, in this JOIN operation, the "where" clause contains the predicate "greater than" ( > ). For example, referring to FIG. 2, suppose that a user wishes to combine the SUPPLIERS relation 63 and the PARTS relation 65 such that the SUPPLIER CITY follows the PARTS CITY in alphabetical order. The command for this query is:
SELECT S.*, P.*
FROM S, P
WHERE S.CITY > P.CITY
.
133o601 The result of this query on the Supplier5 relation 63 and thQ Parts relation 65 (FIG. 2) i8 the following relation:
TABLE A

S#  SNAME  STATUS  S.CITY  P#  PNAME  COLOR  WEIGHT  P.CITY
S2  Jones  10      Paris   P1  Nut    Red    12      London
S2  Jones  10      Paris   P4  Screw  Red    14      London
S2  Jones  10      Paris   P6  Cog    Red    19      London
S3  Blake  30      Paris   P1  Nut    Red    12      London
S3  Blake  30      Paris   P4  Screw  Red    14      London
S3  Blake  30      Paris   P6  Cog    Red    19      London

As in the case of the EQUIJOIN examples discussed above, the command for the GREATER THAN JOIN operation requires that the data come from two relations, namely, the Suppliers relation (63, FIG. 2), and the Parts relation (65, FIG. 2). As shown above, both relations are named in the FROM clause, and the express connection between the tables is the City column in the WHERE
clause. The result of the GREATER THAN JOIN displays all the columns of the Suppliers relation with all the columns of the Parts relation. The process for creating the binary representation of the GREATER
THAN JOIN relation is the same as for the EQUIJOIN
operation. Namely, the BUILD ROW USE SETS routine (FIG. 22B) is called to physically construct the columns of the JOIN
relation. The major difference in performing a GREATER
THAN JOIN rather than an EQUIJOIN is inserting and preparing the data prior to the construction of the row use sets for the JOIN relation. Specifically, FIG. 24 is a flow diagram which depicts the steps for evaluating and preparing the data prior to performing the BUILD ROW
USE SETS routine (FIG. 22B), which constructs the binary representation of the JOIN relation.
Note in Table A for the GREATER THAN JOIN that the City column of the Suppliers portion of the relation contains the value Paris, which is in all instances lexically greater than the value London in the Parts relation. In contrast, the value London does not appear in the City column for the Suppliers portion of the JOIN
relation because London is not greater than the lexical values for Rome, Paris or London. Also note that the GREATER THAN JOIN operation can be performed for any particular characteristic. The results table above relied on the lexical ordering of the cities. However, any other kind of ordering could have been used to perform a greater-than comparison, for example, numerical ordering of values.
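The effect of the predicate, independent of the bit-vector machinery, can be illustrated with a minimal Python sketch. It pairs each supplier row with each part row whose city compares lower under a chosen ordering. The function name and the `key` parameter are illustrative, and the row data is an abbreviated, assumed rendering of the FIG. 2 relations: only S2, S3, P1, P4 and P6 are confirmed by Table A, and the remaining identifiers and the Paris and Rome part rows are assumptions.

```python
def greater_than_join(left, right, key=lambda v: v):
    """Pair each left row with each right row where key(left city) > key(right city)."""
    return [
        (l_id, l_city, r_id, r_city)
        for l_id, l_city in left
        for r_id, r_city in right
        if key(l_city) > key(r_city)
    ]

# Abbreviated rows: (identifier, city). Partly assumed, see the note above.
suppliers = [("S1", "London"), ("S2", "Paris"), ("S3", "Paris"),
             ("S4", "London"), ("S5", "Athens")]
parts = [("P1", "London"), ("P2", "Paris"), ("P3", "Rome"),
         ("P4", "London"), ("P5", "Paris"), ("P6", "London")]

# Lexical ordering of city names, as in Table A.
rows = greater_than_join(suppliers, parts)
for row in rows:
    print(row)
```

Passing a different `key` (for example, one that maps each value to a number) would give the numerical ordering mentioned above without changing the join logic.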
Referring to FIG. 24, a generic routine for performing the GREATER THAN JOIN operation is shown.
Specifically, the JOIN operation is constructed for all the values of column A (i.e., all the values of the City column of the Suppliers relation) which are greater than all the values of column B (i.e., all the values of the City column for the Parts relation). It should be noted that this routine could be modified for any other of the predicate JOINs such as LESS THAN, LESS THAN OR EQUAL
TO, and GREATER THAN OR EQUAL TO.
During block 1004, a sort operation is performed on all the values of column A to order the values from the greatest value to the least value. Then, in block 1006, a binary bit vector is generated. This binary bit vector contains a number of binary bits equal to the number of values in the value set over which the GREATER THAN JOIN operation is performed. The binary bit vector has its binary bits set to "1" at all the ordinal positions corresponding to the values of the value set which are less than the value y of column A currently being evaluated.
During block 1008, the Boolean AND operation is performed on this binary bit vector with the entity select vector corresponding to column B. The resultant vector contains binary bits set to "1" at the ordinal positions corresponding to the values of column B which are less than the value y of column A. In block 1010, the resultant vector is checked to see if it is "0". If the resultant vector is "0", then processing returns to the calling routine during block 1012 because none of the values of B is less than the value of A. Therefore, the GREATER THAN JOIN operation cannot be satisfied, and thus, the user is notified. Assuming that the resultant vector is not "0", then processing continues at block 1014. During block 1014, the ordinal positions of the binary bits set to "1" in the result vector calculated in block 1008 are determined. These ordinal positions correspond to the ordinal positions of the entity select vector, which correspond to the values of column B which are less than the value y of column A. Then during block 1016, all of the row use vectors associated with the ordinal positions of column B, which are associated with the binary bits set to "1" in the result vector, are obtained. Additionally, the row use vector for value y of column A is also obtained. During block 1018, the row use vectors associated with column B are placed into a temporary area, along with the row use vector for column A. During block 1020, the Boolean OR
operation is performed between the row use vectors associated with column B to generate the entity select vector for the column B portion of the JOIN relation.
The entity select vector for the column A portion of the JOIN relation is the row use vector associated with the value y of column A. Stated differently, the entity select vector for the input column B depicts all of the values of B which are less than the value y of column A. Thus, the BUILD ROW USE SETS routine (FIG. 22B) will construct the first several rows of the JOIN relation, which will consist of the value y of column A and all of the values of column B which are less than value y. Then, during block 1024, a determination is made of whether any more values of column A need to be evaluated. If there are no more values to be evaluated by the GREATER
THAN JOIN operation (FIG. 24), then processing returns to the calling program at 1026. However, if there are more values to be evaluated by the routine, then processing continues at block 1028. During block 1028, the next value y of column A, which is less than the last value of column A evaluated, is obtained.
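The preparation steps of FIG. 24 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the routine itself: bit vectors are modeled as lists of 0 and 1, the value set is reduced to four cities, the Parts row use vectors are assumed, and the function name and its parameters are hypothetical. The hand-off to the BUILD ROW USE SETS routine is represented only by the returned tuples.

```python
def greater_than_join_plan(value_set, esv_a, esv_b, row_use_a, row_use_b, n_rows_b):
    """For each value y of column A (greatest first), return
    (y, row use vector of y in A, OR of the row use vectors of B values below y)."""
    plan = []
    # Block 1004: order the values of column A from greatest to least.
    a_values = sorted((v for v, bit in zip(value_set, esv_a) if bit), reverse=True)
    for y in a_values:
        # Block 1006: bits set at value-set positions whose value is less than y.
        less_than_y = [1 if v < y else 0 for v in value_set]
        # Block 1008: AND with the entity select vector of column B.
        result = [m & b for m, b in zip(less_than_y, esv_b)]
        # Blocks 1010-1012: nothing in B is below y, so no smaller y can match either.
        if not any(result):
            break
        # Blocks 1014-1020: OR together the row use vectors of the qualifying B values.
        b_rows = [0] * n_rows_b
        for value, bit in zip(value_set, result):
            if bit:
                b_rows = [x | r for x, r in zip(b_rows, row_use_b[value])]
        plan.append((y, row_use_a[y], b_rows))
    return plan

# Assumed, reduced value set shared by both City columns.
value_set = ["Athens", "London", "Paris", "Rome"]
esv_a = [1, 1, 1, 0]  # Suppliers use Athens, London, Paris
esv_b = [0, 1, 1, 1]  # Parts use London, Paris, Rome (assumed)
row_use_a = {"London": [1, 0, 0, 1, 0], "Paris": [0, 1, 1, 0, 0], "Athens": [0, 0, 0, 0, 1]}
row_use_b = {"London": [1, 0, 0, 1, 0, 1], "Paris": [0, 1, 0, 0, 1, 0], "Rome": [0, 0, 1, 0, 0, 0]}

plan = greater_than_join_plan(value_set, esv_a, esv_b, row_use_a, row_use_b, n_rows_b=6)
print(plan)
```

With this data only Paris produces an entry: supplier rows 2 and 3 are paired with part rows 1, 4 and 6, which is consistent with the six rows of Table A.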
F. DISPLAY/RECONSTRUCT For JOIN Operation
Typically, the JOIN operation is performed over two relations and a resultant JOIN relation is generated.
Recall that the resultant JOIN relation (FIGS. 19, 20 and 21) was a binary representation. The system can automatically perform the DISPLAY/RECONSTRUCT FOR A JOIN
operation routine (FIG. 25A) to generate the particular columns of the JOIN relation for the user to ascertain.
The DISPLAY/RECONSTRUCT FOR JOIN operation (FIG.
25A) will be referenced by the PROJECT operation at block 566 (FIG. 17A), instead of the DISPLAY/RECONSTRUCT
routine (FIG. 17B). Referring now to FIG. 25A, the reconstruction and display of the columns of a JOIN relation is now discussed in more detail. More particularly, during block 1102, a row use vector corresponding to the rows of the JOIN relation to be constructed is created. Then, during block 1104, an index vector associated with the JOIN relation is created. The index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the JOIN row use set. Each bit indicates whether the unique value associated with the row use vector exists in the JOIN column which is to be displayed. If a binary bit in the index vector is set to "0", the unique value associated with the row use vector does not exist in the JOIN column to be displayed. Whereas, if the binary bit is set to "1" in the index vector, then the unique value associated with the row use vector exists one or more times in the column. At this time, the index vector contains a quantity of binary bits equal to the number of row use vectors of the JOIN row use set.
During block 1106, the row use set for the JOIN column to be displayed is obtained. During block 1108, the first row use vector associated with the JOIN column row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18 (FIG.
1A), is transferred via bus 48 to the BBVP 14 (FIG. 1A).
Then, during block 1108, the RPU 22 via BBVP 14 performs a Boolean AND operation on the row use vector with the row select vector. As discussed earlier in Part V (D), the row select vector is a binary bit vector, and each binary bit corresponds to a row of the column or columns to be displayed by the system. A binary bit set to "1"
in the row select vector indicates that the corresponding row needs to be displayed by the system.
The result of the AND operation is a binary bit vector Z, which depicts the rows of the column which contain the particular value associated with the current row use vector. The results of the AND operation are sent via bus 48 back to memory 18 (FIG. 1A) for future processing. Then, in block 1110, the RPU 22 determines whether the resultant vector is "0", that is, whether the resultant vector Z contains only binary bits set to "0". If the binary bit vector is "0", then during block 1112, a "0" is placed in the first binary bit position of the index vector to indicate that the unique value associated with the row use vector does not exist in the JOIN column to be displayed. However, if the resultant bit vector Z is not equal to "0", then processing continues to block 1114. The RPU 22 via BBVP 14 sets the binary bit of the index vector, which is associated with the current row use vector, to "1". In block 1116, the resultant bit vector Z is stored in memory 18. The resultant bit vector Z is identified with the particular row use vector presently being processed, and it will later be used in the reconstruction process.
During block 1118, the RPU 22 via BBVP 14 (FIG. 1A) clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. The purpose of this step is to "shortcut" the processing of the row use vectors in the row use set.
Stated differently, when the JOIN row select vector is cleared, all of the values in the rows to be displayed for the column have been determined. During block 1120, the RPU 22 (FIG. 1A) determines whether the JOIN row select vector has had all its binary bits set to "0".
If the JOIN row select vector contains only binary bits set to "0", then processing continues at block 1126.
However, if not all of the binary bits in the row select vector are set to "0", then processing continues at block 1122. During block 1122, the RPU 22 (FIG. 1A) gets the next JOIN row use vector in the JOIN row use set currently being processed. During block 1124, the RPU
22 (FIG. 1A) determines whether the end of the JOIN row use set has been reached. If the end of the JOIN row use set has been reached, then processing continues at block 1126. However, assuming that the end of the row use set has not been reached, then processing continues at blocks 1108, 1110, 1114, 1116, 1118, 1120,
1122 and 1124 until all of the JOIN row use vectors of the JOIN row use set have been completely processed.
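The loop over blocks 1108 through 1124 can be sketched in Python as shown here. This is a minimal illustration, assuming bit vectors are modeled as lists of 0 and 1; the function name, the dictionary used to hold the stored Z vectors, and the sample data are hypothetical.

```python
def scan_row_use_set(row_use_set, row_select):
    """Return (index_vector, z_vectors) for one column's row use set."""
    remaining = list(row_select)            # working copy of the row select vector
    index_vector = [0] * len(row_use_set)   # one bit per row use vector
    z_vectors = {}
    for i, row_use in enumerate(row_use_set):
        # Block 1108: AND the row use vector with the row select vector.
        z = [u & s for u, s in zip(row_use, remaining)]
        # Block 1110: an all-zero Z leaves the index bit at 0 (block 1112).
        if any(z):
            index_vector[i] = 1             # block 1114
            z_vectors[i] = z                # block 1116: keep Z for reconstruction
            # Block 1118: clear the rows that have now been accounted for.
            remaining = [s & (1 - b) for s, b in zip(remaining, z)]
            # Block 1120: every requested row is resolved, so stop early.
            if not any(remaining):
                break
    return index_vector, z_vectors

# Row use vectors for three unique values over five rows (illustrative data).
row_use_set = [
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
index_vector, z_vectors = scan_row_use_set(row_use_set, row_select=[1, 1, 0, 1, 0])
print(index_vector, z_vectors)
```

When the row select vector requests only rows 1 and 4, the scan stops after the second row use vector, which is the shortcut that block 1118 is intended to provide.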
Assuming that all of the JOIN row use vectors in the JOIN row use set have been processed or that the JOIN row select vector contains only binary bits set to "0", then processing continues at block 1126. During block 1126, the RPU 22 (FIG. 1A) determines whether the present column for display references a value set or a relation. The DISPLAY/RECONSTRUCT routine for the JOIN
operation (FIG. 25A) starts with the row use set of the JOIN column and then references the relation corresponding to the row use set in the JOIN relation.
Thus, the column to be displayed is a JOIN column, and it does not reference a value set. Therefore, processing continues at block 1112, during which the REFERENCE RELATION routine (FIG. 25B) is called.
During block 1136 of the REFERENCE RELATION routine (FIG. 25B), the RPU 22 obtains the entity select vector associated with the JOIN column. The entity select vector for the JOIN column is used as a row select vector for obtaining values from the referenced relation. Processing returns to the DISPLAY/RECONSTRUCT
FOR JOIN operation routine (FIG. 25A) at block 1104.
During block 1104, an index vector for the referenced relation is created. More particularly, the index vector contains a number of binary bits equal to the number of row use vectors of the referenced relation. Then, during block 1106, the RPU 22 (FIG. 1A) obtains a row use set for a column of the relation to be displayed. The first row use vector associated with the row use set is obtained. During block 1108, the RPU 22 instructs the BBVP 14 to perform a Boolean AND operation on the row use vector obtained in block 1106 with the row select vector obtained during the REFERENCE RELATION
routine (FIG. 25B). The result of the AND operation is a vector Z which depicts the rows of the column which contain the particular value associated with the row use vector. Then, during block 1110, the RPU 22 (FIG. 1A) determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 1112, a "0" is set in the first binary bit position of the index vector. Processing continues at block 1122, during which the next row use vector associated with the column in the referenced relation is obtained. Returning to block 1110, if the result of the Boolean operation performed in block 1108 is not "0", then processing continues at block 1114. During block 1114, the RPU 22 via BBVP 14 (FIG. 1A) sets the binary bit in the index vector which is associated with the current row use vector to "1". During block 1116, the resultant bit vector Z is stored in memory 18 (FIG. 1A).
During block 1118, the RPU 22 instructs the BBVP 14 (FIG. 1A) to clear the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. Again, the purpose of this step is to shortcut the processing of the row use vectors of the row use set associated with the referenced relation.
During block 1120, the RPU 22 (FIG. 1A) instructs the BBVP 14 to determine whether the row select vector has been completely cleared, that is, whether all the binary bits have been set to "0". If the row select vector contains only binary bits set to "0", then processing continues at decision block 1126. Otherwise, assuming that not all of the binary bits in the row select vector are set to "0", then processing continues at block 1122. During block 1122, the RPU 22 (FIG. 1A) acquires the next row use vector in the row use set of the referenced relation currently being processed.
Then, during block 1124, the RPU 22 instructs the BBVP
14 to determine whether the end of the row use set has
been reached. If the end of the row use set has been reached, then processing continues at block 1126.
Otherwise, assuming that the end of the row use set has not been reached, then processing continues at blocks 1108, 1110, 1114, 1116, 1118, 1120, 1122, and 1124 until all the row use vectors of the row use set of the referenced relation have been processed.
Assuming that all of the row use vectors of the row use set have been processed, or in the event that the row select vector corresponding to the referenced relation contains only binary bits set to "0", then processing continues at block 1126. During block 1126, the RPU 22 (FIG. 1A) determines whether the column currently being evaluated references a relation or a value set. The column presently being evaluated references a value set, and thus, processing continues at block 1130. During block 1130, the REFERENCE VALUE SET routine (FIG. 25C) is called.
The REFERENCE VALUE SET routine (FIG. 25C) obtains the values from the value set and places the values in the proper rows of the JOIN relation column. More particularly, referring to block 1142, the RPU 22 instructs BBVP 14 (FIG. 1A) to determine the ordinal positions of the binary bits set to "1" in the index vector for the referenced column. Each binary bit set to "1" indicates which row use vectors reference unique values to be displayed in the referenced column. In addition, for each row use vector which has a corresponding binary bit set to "1" in the index vector, the ordinal position of the corresponding binary bit set to "1" in the entity select vector associated with the value set is determined. This process is performed for each of the row use vectors which has a corresponding binary bit set to "1" in the index vector for the referenced column. Then during block 1144, the values associated with the ordinal positions obtained in block 1142 are retrieved from the referenced value set.
During block 1146, the RPU 22 (FIG. 1A) finds the appropriate location in the index vector associated with each of the values retrieved in block 1144. Then, during block 1148, the appropriate resultant Z vector associated with each of the row use vectors is retrieved. The resultant Z vector indicates which rows of the referenced column contain the unique value associated with the row use vector. Then, during block 1150, the RPU 22 (FIG. 1A) determines whether any more values are left for processing. Assuming that there are more values, then processing continues at blocks 1146 and 1148 until all the values from the value set have been placed in the proper rows of the referenced column.
Assuming that all of the values have been placed in the proper rows of the column, then processing continues at block 1152. During block 1152, for each row of the referenced column, the corresponding JOIN row use set is determined. More particularly, the corresponding binary bit set to "1" in the index vector for the JOIN column is determined. Then, during block 1153, the appropriate vector Z temporarily stored at an earlier step is obtained. The resultant vector Z indicates the row positions of the JOIN column where the value from the referenced column should be placed. During block 1155, the RPU 22 determines whether any more values from the referenced column need to be processed. If more values need to be processed, then processing continues at blocks 1152, 1153 and 1155 until all the values of the referenced column have been processed. Assuming that all the values have been processed, then processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A). During block 1132, processing returns to the calling program.
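The placement step of blocks 1142 through 1150 can be illustrated with a short Python sketch. It assumes the index vector and the stored Z vectors have already been produced by the earlier AND loop, and that `values[i]` is the unique value referenced by the i-th row use vector; the function name and the sample data are hypothetical.

```python
def place_values(values, index_vector, z_vectors, n_rows):
    """Write each unique value into the rows flagged by its stored Z vector."""
    column = [None] * n_rows  # rows that were not selected stay empty
    for i, bit in enumerate(index_vector):
        if not bit:
            continue  # this unique value does not occur in the selected rows
        for row, z_bit in enumerate(z_vectors[i]):
            if z_bit:
                column[row] = values[i]
    return column

# Illustrative inputs: three unique values, five rows, rows 1, 2 and 4 selected.
values = ["Athens", "London", "Paris"]
index_vector = [0, 1, 1]
z_vectors = {1: [1, 0, 0, 1, 0], 2: [0, 1, 0, 0, 0]}

column = place_values(values, index_vector, z_vectors, n_rows=5)
print(column)
```

Each Z vector is consulted once per unique value, so the cost grows with the number of distinct values present in the selected rows rather than with the size of the whole value set.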
1. Example of the DISPLAY/RECONSTRUCT Operation For A JOIN Relation
Referring to FIGS. 20, 25A, 25B, 25C, 26A, 26B, 26C, 26D and 26E, a detailed example of performing the DISPLAY/RECONSTRUCT program on the JOIN column for Suppliers IDs in the JOIN relation (604, FIG. 29) is now discussed. More particularly, it is assumed that only a binary representation of the JOIN column for Suppliers IDs exists in the RDMS 10. The binary representation of the JOIN relation is a result of the JOIN operation, or it may have been previously stored after processing by the BBVP 14 (FIG. 1A). In either case, the binary representation of the JOIN column exists in memory 18 and now the user or applications program needs to display the actual values of the JOIN column. Although this example is for reconstructing and displaying only one column of the JOIN relation (604, FIG. 20), the remaining columns of the JOIN relation could be reconstructed by sequentially performing the PROJECT
operation. Recall that the PROJECT routine (FIG. 17A) calls the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS.
20A, 20B and 20C). Referring to FIGS. 21A, 21B, 21C, 21D and 21E, a results table depicting the results of the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 25A, 25B and 25C) for reconstructing and supplying the Suppliers ID column of the JOIN relation (604, 626 of FIG. 20) is shown. Each row of the results table depicts a result of the routines shown in FIGS. 25A, 25B
and 25C. The results table is separated into 12 columns from left to right. The first column of the results table shows the row select vector associated with the JOIN column to be displayed. The second column shows the row use set associated with the JOIN column to be displayed. The third column is the row use vector of
the row use set which is currently being processed, and the fourth column is the index vector associated with the row use set corresponding to the JOIN column. The fifth column is a resultant binary bit vector Z. The resultant binary bit vector Z is determined by ANDing a JOIN relation row use vector and the row select vector.
The sixth column depicts the entity select vector associated with the referenced relation. The seventh column is an index vector associated with the row use set corresponding to the referenced relational column.
The eighth column depicts the row use set corresponding to the referenced relational column. The ninth column is for the index vector associated with the row use set for the referenced relational column. The tenth column is for the resultant bit vector Z. The resultant bit vector Z is determined by ANDing a row use vector of the row use set with the entity select vector. The eleventh column is a depiction of the creation of the referenced relational column. The last column depicts the JOIN
relational column, which is to be displayed.
Referring to FIG. 25B, the REFERENCE RELATION routine obtains the entity select vector (600, FIG. 20) associated with the JOIN column. The entity select vector is used as a row select vector for specifying the rows of the referenced relation which have values in the JOIN column (1184, FIG. 26C). Then, during block 1138, processing returns to the DISPLAY/RECONSTRUCT FOR JOIN
routine (FIG. 25A) at block 1104.
Block 1104 creates an index vector for the row use sets corresponding to the Supplier ID column (1181, FIG.
20). During block 1106, the row use set associated with the Supplier ID column is retrieved (1186, FIG. 26C).
Then, during block 1108, the first binary bit vector of the row use set (1186, FIG. 26C) is ANDed with the entity select vector to form a resultant vector Z (1192, FIG. 26C). During block 1110, the resultant vector is evaluated to determine if all the binary bits are set to "0". One bit in the resultant vector Z is set to "1", and thus, processing continues at block 1114. During block 1114, the binary bit in the index vector associated with the current row use vector is set to "1"
(1194, FIG. 26C). Then, during block 1116, the resultant vector Z associated with the current row use vector is stored in memory 18 (FIG. 1A). During block 1118, the binary bits set to "1" in the row select vector that match the binary bits set to "1" in the resultant vector Z are cleared (1196, FIG. 26D). Processing continues at block 1120, during which the row select vector is evaluated to determine if all the binary bits are set to "0". Not all the binary bits of the row select vector are "0" (1196, FIG. 26D), and thus, processing continues at block 1122. During block 1122, the next row use vector of the row use set for the Supplier ID column is obtained (1198, FIG. 26D). During block 1124, the RPU 22 (FIG. 1A) determines if the end of the row use set has been reached. The end of the row use set has not been reached, and thus, processing continues at block 1108.
VI. ENTITY USE VECTORS
To this point, the use of binary bit vectors in the setting of a relational database has been extensively discussed. The purpose of the binary bit vectors as described in the Binary Bit Vector Technology (Part III) section of this specification is for characterizing a subset of an ordered set. In contrast, the purpose of
an entity use vector is for defining a relationship between the elements of two sets.
More particularly, referring to FIG. 27, sets S 1234 and T 1236 define the set of all ordered pairs (s, t) such that s is an element of S and t is an element of T. A "mapping" exists from S 1234 to T 1236. More than one element of set S 1234 may map onto a single element of T 1236; however, a given element of S never maps onto more than one element of T. In mathematical terminology, the set S 1234 is the "domain", and the set T 1236 is the "range". FIG. 27 shows the many-to-one mapping. Specifically, s1 and s2 map to the value t1. The purpose of the entity use vector is to facilitate this relationship between ordinal positions of elements in one set to another set.
Stated differently, the entity use vector is a vector whose elements are values expressed in ordinal positions of elements within another set.
In the context of the current invention, the entity use vector constitutes a function (or a mapping) from the rows of a column to either values in a value set or relational rows in another or the same relation. The mapping may apply specifically to a relational column, whether it be a single physical column or a physical column consisting of many conceptual columns. The implied ordinal position within the entity use vector corresponds to a row number in the "domain". The associated value of the element is the entity location (ordinal position of the
value in the value set or row position of the relational row being referenced) of the "range".
Referring to FIGS. 4, 20, 28, and 29, a more detailed discussion regarding the entity use vector and its use in the relational database is now presented.
Referring to FIG. 4, a binary representation of the Supplier relation of the Supplier and Parts relational database is shown. In this database, the mapping between the domains and the columns was achieved by entity select vectors and row use vector sets. The added feature, entity use vectors, has been added to the Supplier relation depicted in FIG. 4, as shown in FIG. 28. Note that the entity use vectors 1238, 1240, 1242 and 1244 correspond to the columns 168, 170, 172 and 174. Recall that the columns do not actually exist in the RDMS 10; instead, the binary representations of the columns 260, 262, 264 and 266, respectively, are stored in the memory 18 area of the RDMS. If the entity use vectors are employed in the relational database setting, determining the values associated with each row of a column in the relation becomes more efficient. For example, referring to the entity use vector 1244, which corresponds to the city column 174 of the Supplier relation, note that in the first ordinal position 1237 of the entity use vector, the value 5 resides. Essentially, the value 5 maps the value in the first row of the city column 174 to the fifth row of the domain 166. Referring to the dotted line 1249, the mapping of the first row of the city column 174 to the fifth row of the value set is shown. Likewise, the fourth row of the city column 174 is also mapped to the fifth row position of the value set 166. Thus, the first row and the fourth row of the city column 174 both map to the same row of the value set 166. In this way, the many-to-one mapping can be achieved via the entity use vector. Likewise, the second row and the third row of the city column 174 map to the same row of the value set 166 via the implied mapping at 1247.
Lastly, the fifth row of the city column 174 maps back to the first row of the value set 166. With all the elements in the entity use vector, the city column 174 can be easily reconstructed with the values of the value set 166. More particularly, the first row of the city column 174 corresponds to the value 5 in the entity use vector at 1237, which corresponds to the value London.
The second row of the city column 174 corresponds to the
entity use vector value 8, which corresponds to the value Paris. Likewise, the third row of the city column 174 corresponds to the value 8 of the entity use vector and to the value Paris. The fourth row corresponds to the value 5 in the entity use vector and to the value London. Lastly, the fifth row of the city column 174 corresponds to value 1 in the entity use vector, which corresponds to the value Athens. The entity use vector will be used specifically during the DISPLAY/RECONSTRUCT operations for the RDMS 10. The entity use vector facilitates the steps for determining the values in the particular columns of a relation and hence facilitates the DISPLAY/RECONSTRUCT process.
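The reconstruction just described amounts to one lookup per row, as the following minimal Python sketch shows. The entity use vector values (5, 8, 8, 5, 1) and the three value set entries are taken from the description above; the other positions of the value set 166 are not listed in this passage and are therefore omitted, and the function and variable names are illustrative.

```python
# Known entries of value set 166, keyed by 1-based ordinal position.
value_set_166 = {1: "Athens", 5: "London", 8: "Paris"}

# Entity use vector 1244 for the city column 174: one entry per row.
city_entity_use = [5, 8, 8, 5, 1]

def reconstruct_column(entity_use_vector, value_set):
    """Look up each row's value by the ordinal position stored for that row."""
    return [value_set[position] for position in entity_use_vector]

city_column = reconstruct_column(city_entity_use, value_set_166)
print(city_column)
```

Rows 1 and 4 share position 5 and rows 2 and 3 share position 8, which is the many-to-one mapping of FIG. 27 expressed as repeated entries in the vector.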
Referring to FIG. 29, a depiction of how entity use vectors may be used in the setting of a JOIN relation is shown. More particularly, the entity use vectors 1238, 1240, 1242, 1244, and 1246 have been added to the binary representation of the JOIN relation of FIG. 20. The entity use vectors 1238, 1240, 1242 and 1244 are identical to the entity use vectors of the Supplier relation in FIG. 28. The new entity use vector of FIG.
29 is at 1246. This entity use vector maps the rows of the JOIN columns to the rows of the input relations.
More particularly, the values of the entity use vector correspond to the row numbers of the referenced relation. In particular, the first three rows of the entity use vector, 1250, 1251, 1252, respectively, correspond to row 1 of the Supplier relation. Thus, the first three rows of the JOIN relation will contain values from the first row of the city column. The city column has associated with it an entity use vector 1244 which has a value 5 in the first row corresponding to the first row of the city column. The value 5 maps back to the fifth row of the value set 166. The fifth row of the value set 166 contains the value London. Therefore,
the first three rows of the entity use vector, 1250, 1251, and 1252, correspond to the value London as shown by the physical representation of the S.CITY column of the JOIN relation. The entity use vectors associated with the JOIN relation provide a mapping between the columns of the Supplier relation and the columns of the JOIN relation. As stated earlier, the columns are mapped via the entity use vectors to the value sets of the relational database. Once again, the entity use vector will be used during the DISPLAY/RECONSTRUCT
process of the JOIN relation. The entity use vector will substantially improve the processing time for generating the actual representation of the JOIN
relation by not having to perform the multitude of steps associated with the DISPLAY/RECONSTRUCT operations as discussed earlier. Again, a more detailed discussion on the DISPLAY/RECONSTRUCT will be presented shortly.
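The two-level mapping for a JOIN column can be sketched in Python as a pair of chained lookups. Only the first three entries of the JOIN entity use vector 1246 are given in the text (each refers to row 1 of the Supplier relation), so the sketch is limited to those rows; the function and variable names are illustrative.

```python
# Known entries of value set 166, keyed by 1-based ordinal position.
value_set_166 = {1: "Athens", 5: "London", 8: "Paris"}

# Entity use vector 1244: Supplier city column row -> value set position.
supplier_city_entity_use = [5, 8, 8, 5, 1]

# First three entries of entity use vector 1246: JOIN row -> Supplier row (1-based).
join_entity_use = [1, 1, 1]

def reconstruct_join_column(join_vector, base_vector, value_set):
    """Follow JOIN row -> referenced relation row -> value set position -> value."""
    return [value_set[base_vector[row - 1]] for row in join_vector]

s_city = reconstruct_join_column(join_entity_use, supplier_city_entity_use, value_set_166)
print(s_city)
```

No row use vectors are scanned in this path: each JOIN row is resolved with two index operations, which is the source of the processing-time improvement claimed above.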
Referring to FIG. 1A, typically, the entity use vectors will be stored in byte or multi-byte form in memory 18. When a binary representation of a JOIN
relation, or any other type of relation for that matter, needs to be displayed to the user, the RPU 22 causes the appropriate entity use vectors to move from the memory 18 via bus 46 to MVP 15. There, the RPU 22 evaluates the entity use vectors via the DISPLAY/RECONSTRUCT
routines (FIGS. 30 and 31). The reconstructed relation is then ready to be sent to the external device 12 for storage or to display 3 (FIG. 1) for the user to ascertain.
Referring now to FIG. 7, the entity usc vectors are constructed during the load operation performed by the ~INARY K~rK~;~rNlATION 'routine at block 356. More particularly, when the creation of the relational database has been completed, the system loads the file representation of the relations into Rxternal device 12, .
where they reslde until summoned by the RDMS system 10.
A particular column is retrieved by referring to its system identifier, as discussed earlier and in more detail in Part VII. The first value of the particular column is brought to the RPU 22 and, as discussed earlier, a row use vector associated with the value is built by the BBVP 14. Additionally, a bit is set to "1" in the first position of the row use vector to indicate that the value occupies the first row of the first column of the relation. At the same time the first value is brought to the RPU 22, it is also evaluated by the MVP 15. When the first value is evaluated by the MVP 15, an entity use vector associated with the column is built. Specifically, the first position of the entity use vector is assigned a numerical value to indicate the row of the value set corresponding to the input value. As each value is input from the column, the MVP 15 generates a corresponding value associated with the row position in the value set which contains the particular value. Once the column has been completed and the next column is entered from the external device to the MVP 15, a new entity use vector is created. This process continues until all of the columns input from the external device have associated entity use vectors. As the entity use vectors are constructed, they are sent via bus 46 to memory 18, where they reside until they are called for processing via the DISPLAY/RECONSTRUCT routine at FIGS. 25A, 25B and 25C.
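The load step described above can be sketched in software as follows. This is only an illustrative sketch: the function name `build_vectors`, the use of Python lists in place of hardware bit vectors, the 1-based ordinal positions, and the sample city values are all assumptions made for illustration, whereas the specification builds these structures in the BBVP 14 and the MVP 15.

```python
def build_vectors(column):
    """Build a value set, an entity use vector and row use vectors for one column."""
    # Value set: the unique values of the column, kept here in lexical order.
    value_set = sorted(set(column))
    # Ordinal position (1-based, an assumption) of each unique value in the value set.
    position = {value: i + 1 for i, value in enumerate(value_set)}
    # Entity use vector: one element per row, naming the value-set position of that row's value.
    entity_use = [position[value] for value in column]
    # Row use vectors: one bit vector per unique value, with a 1 in every row holding the value.
    row_use = {
        value: [1 if cell == value else 0 for cell in column]
        for value in value_set
    }
    return value_set, entity_use, row_use


# Hypothetical column contents, used only to exercise the sketch.
city_column = ["London", "Paris", "Paris", "London", "Athens"]
value_set, entity_use, row_use = build_vectors(city_column)
```

With this hypothetical input, the first row of the column holds "London", so the first bit of the row use vector for "London" is 1 and the first element of the entity use vector names the position of "London" in the value set.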
Referring to FIG. 30A, a flow diagram for the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine is shown. The purpose of this routine is to efficiently reconstruct and display the values of a specified column or columns. More particularly, referring to block 1270, the RPU 22 (FIG. 1A) causes the entity use vector associated with the particular column to be displayed to move from memory 18 via bus 46 to MVP 15. Then, during block 1272, the RPU 22 determines whether the entity use vector associated with the column references a value set or another relation. Assuming that the entity use vector references another relation, then processing continues at block 1273, during which the REFERENCE RELATION routine (FIG. 30B) is called. However, if the entity use vector associated with the column to be displayed references a value set, then the REFERENCE VALUE SET routine (FIG. 30C) is called. The REFERENCE RELATION routine (FIG. 30B) is called when the column to be displayed is, for example, a column of the JOIN relation. In contrast, the REFERENCE VALUE SET routine (FIG. 30C) is called when the column to be displayed is a column of a base relation (one which references one or more value sets directly). In either case, when the column has been reconstructed and is ready for display, processing returns to the calling program at block 1276.
Referring to FIG. 30B, a flow diagram for the REFERENCE RELATION routine is shown. Specifically, during block 1280, the entity select vector associated with the column is obtained. This entity select vector is used as a row select vector for obtaining the values from a REFERENCE RELATION. For example, the entity select vector 600 (FIG. 29), which is associated with the JOIN column depicting the row use set of the JOIN relation 604 (FIG. 29), could be such a REFERENCE RELATION. Processing continues at block 1282, during which processing returns to the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine (FIG. 30A).
Referring to FIG. 30C, a flow diagram for the REFERENCE VALUE SET routine is now discussed. More particularly, during block 1286, the values from the value set are obtained corresponding to the ordinal positions expressed by the entity use vector elements. Then, during block 1288, a value is placed in the appropriate row of the column. The ordinal position of the value in the column corresponds to the ordinal position of the corresponding element in the entity use vector. During block 1290, the next value of the value set is obtained and, in block 1292, the RPU 22 (FIG. 1A) determines whether all the values have been processed. Assuming that not all of the values have been processed, then processing continues at blocks 1288, 1290 and 1292 until all of the values are placed in the appropriate row positions of the column. Once all the values have been placed in the column, processing continues at block 1294. During block 1294, it is determined whether there is another column to go through; more particularly, whether there is a JOIN column which needs to be reconstructed, for example, the CITY column 638 (FIG. 29) of the JOIN relation. Assuming that a JOIN relation needs to be reconstructed, then processing continues at blocks 1288, 1290, 1292 and 1294 until all the values associated with the JOIN column have been placed into the proper rows of the column. Assuming that there are no more levels of columns to reconstruct, then processing continues at block 1296, during which processing returns to the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine (FIG. 30A) at block 1276. During block 1276, processing returns to the calling program and the display of the column has been completed.
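The two reconstruction paths described above can be illustrated with a short software sketch. The function names, the list-based vectors, the 1-based positions and the sample data are assumptions chosen for illustration only; the sketch mirrors the described behavior (value lookup by ordinal position, and row selection by a bit vector) rather than the hardware routines of FIGS. 30A through 30C.

```python
def reference_value_set(value_set, entity_use):
    """Rebuild a base-relation column: row i receives the value at position entity_use[i]."""
    # Positions are treated as 1-based ordinals (an assumption of this sketch).
    return [value_set[position - 1] for position in entity_use]


def reference_relation(referenced_column, select_vector):
    """Rebuild a JOIN column by keeping the referenced rows whose select bit is 1."""
    return [value for value, bit in zip(referenced_column, select_vector) if bit == 1]


# Hypothetical data, used only to exercise the sketch.
city_values = ["Athens", "London", "Paris"]
supplier_city = reference_value_set(city_values, [2, 3, 3, 2, 1])
join_city = reference_relation(supplier_city, [1, 1, 0, 1, 0])
```

Here the JOIN column is obtained in two levels: the base column is first rebuilt from the value set, and the select vector then picks out the rows of that column which participate in the JOIN relation.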
VII. Database Identification
Referring to FIGS. 31, 32, 33, 34, 35, 36, 37A, 37B, 37C and 37D, a detailed description of the structure for maintaining identification of the elements of the relational database is now discussed. Referring to FIG. 31, a representation of a relational database is shown. The top portion of FIG. 31 depicts seven domains of unique values referenced by two relations (Supplier and Parts). The third relation, labeled JOIN relation, references both the Supplier and Parts relations as shown. All three relations depicted in FIG. 31 were thoroughly discussed in the previous sections of the specification. Specifically, the Supplier relation is detailed in FIG. 4, the Parts relation is detailed in FIG. 5, and the JOIN relation is detailed in FIG. 19.
The three relations are assumed to be in their binary representations and stored in memory 18 (FIG. 1A). The RDMS 10 keeps track of all of the columns of the relational database (FIG. 31) via an identification scheme called the System relation. The System relation logically connects all of the necessary information for describing a column of any of the relations.
Referring to FIG. 32, the System relation for the Supplier/Parts relational database (FIG. 31) is shown. The System relation is broken up into four columns. From left to right, these are the column identifier (CID), the relation identifier (RID), the attribute identifier (AID), and the domain identifier (DID). The CID is a number which distinguishes a column from all of the other columns of the relational database. The RID identifies the relation in which a column logically resides. For example, column 80 (FIG. 2) resides in relation 63 (FIG. 2). The AID defines the order or position in which the column resides in a particular relation. For example, column 80 (FIG. 2) is the first column of the relation 63 (FIG. 2), and thus has an AID number of 1. The DID identifies the particular domain associated with the column. All four identifiers together characterize each column of the relational database. Stated differently, all four identifiers characterize the relationship of a column in the relational database.
Referring to FIG. 31, each element of the Supplier/Parts relational database is identified by a CID number. Specifically, the Supplier Identifier domain is characterized by a CID number "1" and an RID number "1". Domains do not have AID or DID numbers because the relation only has a single column, and it is itself a domain. Normally, a single column in the relational database would have associated with it an AID number of 1; however, in the preferred embodiment, the domain is always considered to have an AID of 0. Likewise, the Parts Identifier domain is characterized by a CID number "2" and an RID number "2". The domains are more fully depicted in FIG. 33 with their associated CID and RID numbers. The CID and RID numbers for the domains are shown in the first seven rows of the System relation (FIG. 32). With the CID and the RID number, any of the domains for the Supplier/Parts relational database (FIG. 31) can be referenced and obtained.
Now, referring to the Supplier relation of FIG. 31, the CID, RID, AID and DID numbers associated with each of the columns and the relation are now discussed. The Supplier relation is identified by a special row of the System relation, which generates a virtual column (as seen by the user) for identifying the Supplier relation. Each relation of the relational database has a virtual column. The user never sees this column. This column has a CID number of 8 and the Supplier relation has an RID number of 8. The virtual column contains row numbers associated with the rows of the relation. This column by convention does not have associated with it an AID or a DID number. Thus, in the eighth row of the System relation (FIG. 32), the AID and DID numbers are set to 0. The first column (which the user sees) of the Supplier relation is the Supplier ID column and it has associated with it a CID number 9. The Supplier ID column is part of the Supplier relation, and thus its RID number is 8. Likewise, the Person Name column of the Supplier relation has a CID number 10 and it, too, is part of the Supplier relation, and thus has an RID number of 8. The Status column of the Supplier relation has a CID number 11 and it is part of the Supplier relation, and thus has an RID number 8. The last column of the Supplier relation is the City column and it has a CID number of 12 and an RID number 8.
Note in FIG. 32 that the Supplier ID column has an AID number of 1, which identifies the Supplier ID column as the first column of the Supplier relation. The Supplier ID column is associated with the Supplier Identifier domain, and thus the DID number associated with the Supplier ID column is 1. Recall that the CID number for the Supplier Identifier domain is equal to 1, and this corresponds to the DID number for the Supplier ID column. Likewise, the Person Name column of the Supplier relation has an AID number of 2 and a DID number of 3. More particularly, the AID number 2 means that the Person Name column is the second column of the Supplier relation and the DID number 3 means that the Person Name column is associated with the Person Names domain. The Status column of the Supplier relation has an AID number 3 and a DID number 7. The AID number 3 means that the Status column is the third column of the Supplier relation and the DID number 7 means that the Numbers domain is associated with the Status column. Lastly, the City column has AID number 4 and DID number 5 associated with it. The AID number 4 means that the City column is the fourth column of the Supplier relation and the DID number 5 means that the City domain is associated with the City column of the Supplier relation.
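As an illustration, the rows of the System relation described so far for the Supplier relation can be written down as a small data structure. The identifier values below are the ones given in the text; the class name `SystemRow`, the helper functions and the choice of a Python list are assumptions made only for this sketch, and only the domains referenced by the Supplier relation are included.

```python
from typing import NamedTuple


class SystemRow(NamedTuple):
    cid: int  # column identifier, unique across the database
    rid: int  # relation identifier: the relation in which the column resides
    aid: int  # attribute identifier: position of the column in its relation (0 if none)
    did: int  # domain identifier: CID of the associated domain (0 if none)


SYSTEM_RELATION = [
    SystemRow(1, 1, 0, 0),   # Supplier Identifier domain
    SystemRow(3, 3, 0, 0),   # Person Names domain
    SystemRow(5, 5, 0, 0),   # City domain
    SystemRow(7, 7, 0, 0),   # Numbers domain
    SystemRow(8, 8, 0, 0),   # virtual column identifying the Supplier relation
    SystemRow(9, 8, 1, 1),   # Supplier ID column
    SystemRow(10, 8, 2, 3),  # Person Name column
    SystemRow(11, 8, 3, 7),  # Status column
    SystemRow(12, 8, 4, 5),  # City column
]


def columns_of(rid):
    """Return the user-visible columns of a relation in attribute order."""
    rows = [row for row in SYSTEM_RELATION if row.rid == rid and row.aid > 0]
    return sorted(rows, key=lambda row: row.aid)


def domain_of(cid):
    """Return the DID (the CID of the associated domain) for a column."""
    return next(row.did for row in SYSTEM_RELATION if row.cid == cid)
```

For example, `columns_of(8)` lists the four Supplier columns in order, and `domain_of(11)` yields 7, the Numbers domain associated with the Status column.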
The Parts relation also has a virtual column, which is identified by the CID number 13. Additionally, the JOIN relation is identified by a virtual column having a CID number 19. The columns of the Parts relation are identified by the CID, RID, AID and DID numbers in the same way the columns are identified for the Supplier relation. The CID number 20 corresponds to a JOIN column which references the Supplier relation having RID 8 (shown by the dotted lines). More particularly, the DID number associated with the JOIN column (CID 20) is 8, which identifies the Supplier relation (RID 8) as the domain to the JOIN column. The AID number associated with the JOIN column having CID 20 is 1, which means that this column is the first column of the JOIN
relation. The second column of the JOIN relation, having CID number 21, references the Parts relation. More particularly, the DID number associated with the JOIN column having CID 21 is 13, which is the RID number for the Parts relation.
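The distinction between a column whose DID names a domain and a JOIN column whose DID names another relation can be sketched as follows. The rule used here to tell the two apart (a DID names a relation when other rows carry it as their RID with a non-zero AID) is an assumption of this sketch, not something the text states, and the RID of 19 for the JOIN column is likewise assumed by analogy with the Supplier relation.

```python
# Rows are (CID, RID, AID, DID) tuples; only a few rows are shown.
ROWS = [
    (1, 1, 0, 0),    # Supplier Identifier domain
    (8, 8, 0, 0),    # virtual column identifying the Supplier relation
    (9, 8, 1, 1),    # Supplier ID column, drawing on domain 1
    (19, 19, 0, 0),  # virtual column identifying the JOIN relation
    (20, 19, 1, 8),  # first JOIN column, referencing the Supplier relation
]


def references_relation(did, rows):
    """Assumed rule: a DID names a relation if some column lists it as its RID."""
    return any(rid == did and aid > 0 for _cid, rid, aid, _did in rows)


def target_kind(cid, rows):
    """Report whether a column draws on a value set or on another relation."""
    did = next(d for c, _rid, _aid, d in rows if c == cid)
    return "relation" if references_relation(did, rows) else "value set"
```

Under these assumptions, `target_kind(20, ROWS)` reports a relation and `target_kind(9, ROWS)` reports a value set, which corresponds to the choice between the REFERENCE RELATION and REFERENCE VALUE SET routines.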
Referring to FIGS. 34, 35, 36, 37A, 37B, 37C and 37D, four constructs for developing each column referenced by the CID numbers in the System relation (FIG. 32) are shown. Specifically, a set containing domains and entity select vectors (FIG. 34), an entity use set (FIG. 35), a row select set (FIG. 36), and a row use vector set (FIGS. 37A, 37B, 37C and 37D) are all used to build a particular column of the relational database. Essentially, each construct is a "set of sets".
Referring to FIG. 34, the set contains 21 elements, and each number below each element corresponds to a different CID number or row of the System relation (FIG. 32). The first element, 1, of the set (domain for Supplier ID) corresponds to the first row of the System relation (FIG. 32), in which the CID number is 1, the RID number is 1, the AID number is 0, and the DID number is 0. The first element of this set characterizes all of the elements of the value set for Supplier IDs in a sorted fashion.
The domains of the relational database depicted in FIG. 34 are sorted by a lexical ordering scheme. However, any ordering scheme may be used, such as order by entry or by size (number of letters, numbers, or both). FIG. 35 shows the set of entity use vectors for mapping each value in each column of each relation (FIG. 31) back to the ordinal position of the associated unique value in the domain. Elements 1 through 7 (FIG. 34) of the entity select set are lexically ordered and do not require entity use vectors for mapping because the domains are ordered to begin with. If the values in the domains (elements 1 through 7, FIG. 25) are in order, entity use vectors can be used for mapping the values in the columns in the relations (elements 9-12, 14-18, 20 and 21).
The column having CID 9, RID 8, AID 1 and DID 1 is the Supplier ID column of the Supplier relation. This column is characterized by the entity select set (FIG. 34), entity use set (FIG. 35), row select set (FIG. 36) and row use set (FIGS. 37A, 37B, 37C and 37D). More particularly, referring to FIG. 34, the entity select vector associated with the Supplier ID column is referenced by the vector element 9. The vector element 9 corresponds to the entity select vector "1010111000", which characterizes a subset of the Supplier Identifier domain comprising those values which appear one or more times in the Supplier ID column.
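A minimal sketch of how such an entity select vector could be derived is shown below. The function name and the domain and column contents are hypothetical; the values were chosen only so that the result reproduces the ten-bit pattern quoted above and are not taken from the figures.

```python
def entity_select_vector(domain, column):
    """Return one bit per domain value: 1 if the value appears in the column, else 0."""
    present = set(column)
    return "".join("1" if value in present else "0" for value in domain)


# Hypothetical ten-value domain and five-row column (illustrative data only).
supplier_domain = ["S01", "S02", "S03", "S04", "S05", "S06", "S07", "S08", "S09", "S10"]
supplier_id_column = ["S01", "S03", "S05", "S06", "S07"]

vector = entity_select_vector(supplier_domain, supplier_id_column)
```

Each bit position corresponds to one value of the domain in its sorted order, so the number of bits set to 1 equals the number of distinct domain values used by the column.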
Referring to FIG. 35, the entity use vector associated with the Supplier ID column is referenced by the ninth element of the set. The entity use vector for the ninth element characterizes a mapping between the values of the Supplier Identifier domain and the Supplier ID column. The values of the column are shown in the order they occur in the domain, and thus the ordering shown in the entity use vector represents a direct ordinal numbering scheme. The entity use vector represents a mapping between the row positions of the column and the values of the Supplier Identifier domain.
Referring to FIG. 36, the row select vector associated with the Supplier ID column is referenced by the ninth element of the vector. The ninth element of the vector corresponds to the row select vector "11111", which means that all of the rows of the column are valid.
Referring to FIGS. 37A and 37C, the row use set associated with the Supplier ID column is referenced by the ninth element of the set. The row use set corresponding to the ninth element also represents a mapping from a unique value from the Supplier's domain to a set of rows of the Supplier ID column. The set of rows for the Supplier ID column is characterized by the row use vector.
The row use set contains a series of row use vectors for providing a mapping between the sorted value set and the unsorted value set. The function performed by the row use vector associated with the first element of the vector (FIG. 37A) is the same as the entity use vector associated with the first element of the vector in the entity use set (FIG. 35). However, the entity use vector is more efficient than the row use vector for mapping an unsorted value set to a sorted value set in the row use set. Additionally, the row use set is more efficient than the entity use vector for mapping the sorted value set to the unsorted value set. The relational database system could perform with either one, the row use set or the entity use set; however, for efficiency, both are maintained by the preferred embodiment of the invention. The ninth elements of the row use set and the entity use set are shown in FIG. 37C and FIG. 35, respectively.
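The two mapping directions can be illustrated with a small sketch. The data, the function names and the list-based vectors are assumptions for illustration; the sketch shows which structure answers which question directly and that either structure alone could answer both, not the relative performance of the hardware.

```python
# Hypothetical three-value set and five-row column (illustrative data only).
value_set = ["Athens", "London", "Paris"]
entity_use = [2, 3, 3, 2, 1]  # per row: 1-based position of the row's value in value_set
row_use = {                   # per value position: one bit per row of the column
    1: [0, 0, 0, 0, 1],
    2: [1, 0, 0, 1, 0],
    3: [0, 1, 1, 0, 0],
}


def value_at_row(row):
    """Row to value: a single lookup in the entity use vector."""
    return value_set[entity_use[row - 1] - 1]


def rows_with_value(value):
    """Value to rows: read the bits of the row use vector for that value."""
    position = value_set.index(value) + 1
    return [i + 1 for i, bit in enumerate(row_use[position]) if bit]


def rows_with_value_scan(value):
    """Same answer from the entity use vector alone, comparing every row element."""
    position = value_set.index(value) + 1
    return [i + 1 for i, p in enumerate(entity_use) if p == position]
```

Both `rows_with_value` and `rows_with_value_scan` return the same rows, which reflects the statement that the system could operate with either construct while keeping both for efficiency.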
The columns at rows 8, 13 and 19 of the System relation (FIG. 32) are also characterized by the constructs depicted in FIGS. 34, 35, 36, 37A, 37B, 37C and 37D. More particularly, referring to FIG. 34, the eighth element of the set corresponds to the entity select vector for the column having CID number 8. The entity select vector referenced by the eighth element of the vector has five binary bits set to "1", which indicates that the relation contains five valid rows. Referring now to FIG. 35, the entity use vector associated with the column having CID 8 is referenced by the eighth element of the set. The entity use vector associated with the eighth element is empty because a mapping would not add any useful information for characterizing the column. Referring to FIG. 36, the eighth element of the row select set is depicted for representing the valid rows of the column. Note that there is a binary bit vector having five binary bits set to "1", corresponding to the five binary bits set to "1" in the entity select vector (FIG. 34). The row use vector associated with the column is referenced by the eighth element of the set as shown in FIG. 37B. The row use vector is empty because a mapping would not add any further useful information for characterizing the column.
A. Performing the Database Identification Scheme
The row use set is more efficient than the entity use vector for mapping a unique value from a value set to one or more rows of a column, and the entity use vector is more efficient for the reverse mapping. Thus, the system maintains both constructs to maximize efficiency for mapping rows of a column to and from a value set.
In Part IV, entitled "Binary Representation of a Relational Database," a detailed description of the BINARY REPRESENTATION routine (FIG. 7) was discussed for generating the binary representation of the relational database and also for creating the necessary identifiers in each of the relations and their columns in the relational database. The process for creating the identification scheme is now more thoroughly discussed.
Specifically, referring to FIG. 38A, the DATABASE IDENTIFICATION routine for generating a System relation for identifying and characterizing each column in a relation of the relational database is now discussed. During block 1286, each domain necessary for specifying unique values in the relations of the relational database is identified. Commands are provided to be interpreted by the command interpreter 28 (FIG. 1A) for creating domain identifiers in memory 18 (FIG. 1A) of the RDMS 10 (FIG. 1A). For example, the command interpreter might read the following instructions into the system:
A) Create domain Supplier Identifiers;
B) Create domain Parts Identifier;
C) Create domain Person Names;
D) Create domain Part Names;
E) Create domain City;
F) Create domain Colors;
G) Create domain Number;
As the command interpreter 28 (FIG. 1A) reads the first instruction above, the RPU 22 creates an empty row in the System relation (FIG. 32) for the domain identifiers during block 1288. Then, during block 1290, the RPU 22 (FIG. 1A) sets the CID number equal to the row number of the System relation. For example, the first domain identified by the RPU 22 (FIG. 1A) would have a CID number of 1 because it occupies the first row of the System relation (FIG. 32). During block 1292, the RPU 22 sets the RID number equal to the CID number, and during block 1294, the AID number and DID number are set equal to 0. In block 1296, the RPU 22 generates an element associated with the entity select set (FIG. 34), the entity use set (FIG. 35), the row select set (FIG. 36), and the row use set (FIGS. 37A, 37B, 37C and 37D). During a later step in this routine, the necessary information for characterizing each of the columns is loaded during block 1332 (FIG. 38D).
During block 1298, the RPU 22 determines whether any more domains need to be identified by the system. In the example above, seven different "create domain" commands have been specified for identifying each of the seven domains in the relational database. Thus, processing will return to blocks 1286, 1288, 1290, 1292, 1294, 1296 and 1298, during which the RPU 22 will identify each of the domains of the relational database. Assuming that all of the domains have been identified with their respective CID, RID, AID, DID and element locations in the vectors, processing will continue at block 1300.
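The domain identification loop just described can be sketched in software as follows. This is a hedged illustration only: the function name `identify_domains`, the dictionary-based rows, the `name` field and the `None` placeholders are assumptions introduced for readability and are not part of the described system.

```python
DOMAIN_COMMANDS = [
    "Supplier Identifiers", "Parts Identifier", "Person Names",
    "Part Names", "City", "Colors", "Number",
]


def identify_domains(names):
    """Create one System relation row and one element in each construct per domain."""
    system_relation = []
    constructs = {"entity_select": [], "entity_use": [], "row_select": [], "row_use": []}
    for name in names:
        cid = len(system_relation) + 1  # CID equals the row number of the new row
        # RID is set equal to the CID; AID and DID are set to 0 for a domain.
        system_relation.append({"name": name, "CID": cid, "RID": cid, "AID": 0, "DID": 0})
        for elements in constructs.values():
            elements.append(None)  # placeholder element, filled in when files are loaded
    return system_relation, constructs


system_relation, constructs = identify_domains(DOMAIN_COMMANDS)
```

After the seven commands are processed, the System relation holds seven rows with CID numbers 1 through 7, and each of the four constructs holds seven elements awaiting the load step.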
Now that all of the domains to be referenced by the relational database have been specified, the application, during block 1300, identifies each table of the relational database. Specifically, the application provides the system with the following commands:
CREATE TABLE - SUPPLIER
(Supplier ID; Person Name; Status; City);
CREATE TABLE - PARTS
(Part ID; Part Name; Color; Weight; City);
CREATE TABLE - JOIN
(S. City; S. ID#; Status; P. ID#);
The RDMS enters another row into the System relation (FIG. 32) for identifying the relation with a column. During block 1304, the CID number is set equal to the row number of the System relation, and during block 1306, the AID number and the DID number are set equal to 0. Then, during block 1308, a new element is entered into each of the sets corresponding to the entity select set (FIG. 34), the entity use set (FIG. 35), the row select set (FIG. 36), and the row use set (FIG. 37).
Referring to block 1310 (FIG. 38C), the columns of the database are identified. During block 1312, the RPU 22 sets the variable CURRENT AID NUMBER equal to 1. Then, during block 1314, the RPU 22 (FIG. 1A) generates a new row in the System relation (FIG. 32). During block 1316, the CID number is set equal to the row number of the System relation, and during block 1318, the RID number is set equal to the RID number of the column for identifying the relation corresponding to the current column. The AID number is set equal to the "current AID number" and, during block 1322, the "current AID number" is incremented by 1. In block 1324, the DID number for the current column is set equal to the CID number of the domain referenced by the current column. Processing continues at block 1326 (FIG. 38D), where new elements are added to the entity select set (FIG. 34), the entity use set (FIG. 35), the row select set (FIG. 36), and the row use set (FIGS. 37A, 37B, 37C and 37D). Then, during block 1328, the RPU 22 (FIG. 1A) determines whether any more columns need to be created for the particular relation specified during block 1300.
If more columns need to be identified by the RPU 22 (FIG. 1A), then processing continues at blocks 1310, 1312, 1314, 1316, 1318, 1320, 1322, 1324, 1326 and 1328 until all the columns of the specified relation have been identified. Assuming that all the columns of the specified relation have been identified, then processing continues at block 1330.
During block 1330, the RPU 22 (FIG. 1A) determines whether any more relations need to be identified. Assuming that more relations need to be identified, then processing returns to block 1300, where the next relation to be created by the next CREATE TABLE instruction, listed above, is specified. Blocks 1300 through 1330 are performed until all of the relations in the relational database have been identified and the columns associated with each of the tables also have been identified. Assuming that all the tables and their related columns have been identified, then processing continues at block 1332.
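The identification of tables and their columns can be sketched in the same spirit. In this illustration, rows are (CID, RID, AID, DID) tuples; the function name, the mapping of each table to a list of domain names (one per column), and the assumption that the RID of a relation equals the CID of its virtual column are all choices made for the sketch, the last one by analogy with the Supplier relation (CID 8, RID 8).

```python
def identify_database(domains, tables):
    """Build System relation rows for the domains, then for each table and its columns."""
    rows = []
    domain_cid = {}
    for name in domains:
        cid = len(rows) + 1            # CID equals the row number
        rows.append((cid, cid, 0, 0))  # domain: RID = CID, AID = DID = 0
        domain_cid[name] = cid
    for column_domains in tables.values():
        rid = len(rows) + 1            # virtual column row identifying the relation
        rows.append((rid, rid, 0, 0))
        current_aid = 1                # CURRENT AID NUMBER starts at 1
        for domain_name in column_domains:
            cid = len(rows) + 1
            rows.append((cid, rid, current_aid, domain_cid[domain_name]))
            current_aid += 1           # incremented once per column
    return rows


domains = ["Supplier Identifiers", "Parts Identifier", "Person Names",
           "Part Names", "City", "Colors", "Number"]
# Supplier ID, Person Name, Status and City columns, named here by their domains.
tables = {"SUPPLIER": ["Supplier Identifiers", "Person Names", "Number", "City"]}
rows = identify_database(domains, tables)
```

With the seven domains created first, the Supplier relation occupies rows 8 through 12, and the generated identifiers agree with the CID, RID, AID and DID numbers given for the Supplier relation in the discussion of FIG. 32.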
During block 1332, the RPU 22 instructs the file associated with each table of the relational database to be transferred from external device 12 to the RDMS 10 via 30 to the RPU 22. The RPU 22 instructs the BBVP 14 to generate the entity select vector (FIG. 34), the row select vector (FIG. 36) and the row use set (FIG. 37A) associated with each column. Simultaneously, the RPU 22 instructs the MVP 15 to generate the entity use vector (FIG. 35) associated with each of the columns. The entity select vector, entity use vector, row select vector, and row use set are retrieved by the RPU 22 via the system identifiers (i.e., CID, DID, AID, RID identifiers). Then, during block 1334, the RPU 22 determines whether any more files associated with the relations need to be loaded into the RDMS 10. If more files need to be loaded for another table, then processing returns to block 1332; otherwise, processing continues at block 1336. Assuming that all the files have been loaded into the RDMS 10, then processing returns to the calling program at block 1336. At this time, all of the columns and the relations of the relational database have been identified and all of the characterizing vectors associated with each of the columns are also organized.
The invention has been described in an exemplary and preferred embodiment, but it is not limited thereto. Those skilled in the art will recognize a number of additional modifications and improvements which can be made to the invention without departure from its essential sphere and scope. For example, a number of different software and hardware embodiments and any number of different software languages and hardware configurations would be suitable for implementing the disclosed invention.
i
relation. The second column of the JOIN relation having CID number 21 references the Parts relation. ~ore particularly, the DID number associated with the JOIN
column having CID 21 is 14, which is the RID number for the Parts relation.
Referring to FIGS. 34, 35, 36, 37A, 37B, 37C and 37D, four constructs for developing each column re~erenced by the CID numbers in the system relation rFIG. 32) are shown. Speoifically, a set containing domains and entity select vectors (FIG. 34~, an entity use set (FIG. 35), a row select set (FIG. 36), and a row use vector set (FIGS. 37A, 37B, 37C and 37D) are all used to build a particular column of the relational database. Essentially, each construct is a "set of sets " .
Referring to FIG. 34, the set contains 21 elements, and each number oelow each element corresponds to a different CID number or row of the System relation ~FIG.
35 32). The first element, 1, of the set (domain for supplier ID) corresponds to the first row of the Systcm relation (FIG. 32) in which the CID number is 1, RID
number is 1, AID number i8 0, DID number is 0. The first clcment of this set characterizes all of the elemcnts of thc valuc sct for Supplicr IDs in a sorted fashion .
Thc domains of the relational database depicted in FIG. 34 ~re sorted by a lexical ordering scheme.
However, any ordering scheme may bc uscd, such as, order by entry, or by sizc (number of letter or number or both) etc. FIG. 35 shows the set of entity use vectors for mapping each value in each column of each relation ~FIG. 31) back to the ordinal posLtion of the associated unigue value in the domain . E lements 1 through 7 (FIG.
34) of the entity s~lect set are lexically ordered which do not require entity use vectors for mapping because the domains are ordered to begin with. If the valucs in the domains ~elements 1 through 7, FIG. 25) are in order, entity use vectors can be used for mapping the values in the column in the relations ~elements 9-12, 14-18, 20 and 21).
The column having CID 9, RID 8, AID 1 and DID 1 is the Supplier ID column of the Supplier relation. This column is characterized by the entity select sct (FIG.
34), cntity use set (FIG. 35), row select sct (FIG. 36) and row use set (FIGS. 37A, 37s, 37C and 37D). ~ore particularly, referring to FIG. 34, the entity select vector associated with the Supplier ID column is referenced by the vector elQment 9. The vector element 3c 9 corresponds to the entity select vector ~1010111000~, which characterizes a s~ubset of the Supplier identifier domain comprising those values which appear one or more times in the Supplier ID column.
Referring to FIG. 35, the entity use vector associated with the Supplier ID column is referenced by 13386~1 the ninth element of the set. The entity use vector for the ninth element characterizes a mapping between the value3 of the Supplier IdentLfier domain and the Supplier ID column. The values of the column are shown in the order they occur in the domain and thus, the ordering shown in the entity use vector represents a direct ordinal numbering scheme. The entity use vector represents a mapping between the row positions of the column and the values of the Supplier Identifier domain.
~e~erring to FIG. 3 6, the row select vector as50ciated with the Supplier ID column is referenced by the ninth element of the vector. The ninth element of the vector corresponds to the row select vector "lllll" which mean6 that all of the rows of the column are valld.
Referring to FIGS. 37A and 37C, the row use set associated with the Supplier ID column is referenced by the ninth element of the set. The row use set corresponding to the ninth element also represents a mapping from a unique value from the Supplier's domain to a set of rows of the Supplier ID column. The set of rows for the Supplier ID column is characterized by the row use vector.
The row use set contains a series of row use vectors for providing a mapping between the sorted value set and the unsorted value set. The function per~ormed by the row use vector associated with the first element o~ the vector (FIG. 37A) is the same as the entity use vector associated with the first element of the vector in the entity use set (FIG. 35). However, the entity use vector is more efficient than the row use vector for mapping an unsorted value set to a sorted value set in the row use set. Addit~onally, the row use set is more efflcient than the entity use vector for mapping the sorted value set to the unsorted value set. The relational database system could perform w1th either ~3386~1 ., one, the row use set or the entity use set; however, for efficiency, both are maintained by the preferred ~rhQ~ ~ r ^nt o~ the invention . The ninth elements of the row use set and the entity use set are shown in (FIG.
37C) and (FIG. 35), respectively.
The column at rows 8, 13 and l9 of the System relation (FIG. 32) is also charactcrized by the constructs depicted in FIGS. 34, 35, 36, 37A, 37B, 37C
and 37D. More particularly, referring to FIG. 34, the eiqhth element of the set corresponds to the entity select vector for the column having CID number 8. The entity select vector referenced by thc eighth element of the vector has five binary bits 6et to "1", which indicates that the relation contains five valid rows.
Referring now to FIG. 35, the entity use vector a6sociated with the column having CID 8 is referenced by the eighth element of the 6et. The entity use vector associated with the eighth element is empty because a mapping would not add any useful information for characterizing the column. Referring to FIG. 36, the eighth element of the row select set is depicted for representing the valid rows of the column. Notc that there is the binary bit vector havLng five binary bits 6et to "1" corresponding to the five binary bits set to "l" ~n the entity select vector (FIG. 34). The row use vector associated with the column is referenced by the cighth element of the set as shown in FIG. 37B. The row use vector is empty because a mapping would not add any further useful information for characteri~ing the 3 0 column .
A. Perfor7nln~ ~hè ~At~h~e Itlt~nt~ t~on 5cheme --The row use set is more efficient for mapping a unique value from a value 6et to one or more rows of a column than the entity use vector and vice versa. Thus, ~338601 the system maintains both constructs to create a maximum efficiency for mapping rows of a column to and from a value set.
In Part IV entitled, "Binary Representatlon of a Relational Database, " a detailed descrlptlon of the BINARY ~ ATION routine (FIG. 7) was 11 c~--ccO~ for generating the binary representation of the relation~l database and also for creating the necessary identifiers ln each of the relations and their columns ln the =relational database. ~he process for creating the identlflcation scheme is now more thoroughly dlscussed.
Speclflcally, referring to FIG. 38A, the DATABASE
IDENTIFICATION routine for generatlng a System relatlon for ldentlfying and characterizing each column in a relation of the relational database i5 now ~iccllccer~.
Specifically, during block 1286, each domain necessary for specifying unique values in the relations of the relational database is identifled. Commands are provided to be interpreted by the command interpreter 28 (FIG. lA), for creating domain identifiers in memory 18 (FIG. lA) of the RDMS 10 (FIG. lA). For example, the command interpreter might read the following lnstructlons in~o the system:
A) Create domain Supplier Identifiers;
B) Create domain Parts Identifier;
C) Create domain Person Names;
D) Create domain Part Names;
E) Create domain City;
F) Create domain Colors;
G) Create domain Number;
As the command interpreter 28 (FIG. 1A) reads the first instruction above, the RPU 22 creates an empty row in the system relation (FIG. 32) for the domain identifiers during block 1288. Then, during block 1290, the RPU 22 (FIG. 1A) sets the CID number equal to the row number of the system relation. For example, the first domain identified by the RPU 22 (FIG. 1A) would have a CID number of 1 because it occupies the first row of the system relation (FIG. 32). During block 1292, the RPU 22 sets the RID number equal to the CID number, and during block 1294, the AID number and DID number are set equal to 0. In block 1296, the RPU 22 generates an element associated with the entity select set (FIG. 34), the entity use set (FIG. 35), the row select set (FIG. 36), and the row use set (FIG. 37A, 37B, 37C and 37D). During a later step in this routine, at block 1332 (FIG. 38D), the necessary information for characterizing each of the columns is loaded.
During block 1298, the RPU 22 determines whether any more domains need to be identified by the system. In the example above, seven different "create domain" commands have been specified for identifying each of the seven domains in the relational database. Thus, processing will return to blocks 1286, 1288, 1290, 1292, 1294, 1296 and 1298, during which the RPU 22 will identify each of the domains of the relational database. Assuming that all of the domains have been identified with their respective CID, RID, AID, DID and element locations in the vectors, processing will continue at block 1300.
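The domain loop of blocks 1286 through 1298 can be sketched in Python as shown here. This is only an illustration under stated assumptions: the `SystemRow` class, the `create_domains` function, and the representation of the four vector sets as plain lists with empty placeholder elements are hypothetical names and choices, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class SystemRow:
    name: str
    cid: int  # column identifier, equal to the row number in the system relation
    rid: int  # relation identifier
    aid: int  # attribute identifier
    did: int  # domain identifier


# The seven domains named by the "create domain" commands in the text.
DOMAINS = [
    "Supplier Identifiers", "Parts Identifier", "Person Names",
    "Part Names", "City", "Colors", "Number",
]


def create_domains(system_relation, vector_sets, domain_names):
    for name in domain_names:
        # Blocks 1288-1290: add a row; its CID is the (1-based) row number.
        cid = len(system_relation) + 1
        # Blocks 1292-1294: RID equals CID; AID and DID are set to 0.
        system_relation.append(SystemRow(name, cid=cid, rid=cid, aid=0, did=0))
        # Block 1296: reserve one (still empty) element in each of the four sets.
        for elements in vector_sets.values():
            elements.append(None)
    return system_relation


vector_sets = {"entity_select": [], "entity_use": [], "row_select": [], "row_use": []}
system_relation = create_domains([], vector_sets, DOMAINS)
```

With the seven commands above, the City domain lands in the fifth row and so receives CID 5 and RID 5, and each of the four sets holds seven placeholder elements awaiting the load step.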
Now that all of the domains to be referenced by the relational database have been specified, the application, during block 1300, identifies each table of the relational database. Specifically, the application provides the system with the following commands:
CREATE TABLE - SUPPLIER
(Supplier ID; Person Name; Status; City);
CREATE TABLE - PARTS
(Part ID; Part Name; Color; Weight; City);
CREATE TABLE - JOIN
(S. City; S. ID#; Status; P. ID#);
The RDMS enters another row into the system relation (FIG. 32) for identifying the relation with a column. During block 1304, the CID number is set equal to the row number of the system relation, and during block 1306, the AID number and DID number are set equal to 0. Then, during block 1308, a new element is entered to each of the sets corresponding to the entity select set (FIG. 34), the entity use set (FIG. 35), the row select set (FIG. 36), and the row use set vector (FIG. 37).
Referring to block 1310 (FIG. 38C), the columns of the database have been identified. During block 1312, the RPU 22 sets the variable CURRENT AID NUMBER equal to 1. Then during block 1314, the RPU 22 (FIG. 1A) generates a new row in the system relation (FIG. 32). During block 1316, the CID number is set equal to the row number of the system relation, and during block 1318, the RID number is set equal to the RID number of the column for identifying the relation corresponding to the current column. The AID number is set equal to the "current AID number" and during block 1322, the "current AID number" is incremented by 1. In block 1324, the DID number for the current column is set equal to the CID number of the domain referenced by the current column. Processing continues at block 1326 (FIG. 39D) where new elements are added to the entity select set (FIG. 34), the entity use set (FIG. 35), the row select set (FIG. 36), and the row use set vector (FIG. 37A, 37B, 37C and 37D). Then, during block 1328, the RPU 22 (FIG. 1A) determines whether any more columns need to be created for the particular relation specified during block 1300.
If more columns need to be identified by the RPU 22 (FIG. 1A), then processing continues at blocks 1310, 1312, 1314, 1316, 1318, 1320, 1322, 1324, 1326 and 1328 until all the columns of the specified relation have been identified. Assuming that all the columns of the specified relation have been identified, then processing continues at block 1330.
During block 1330, the RPU 22 (FIG. 1A) determines whether any more relations need to be identified. Assuming that more relations need to be identified, then processing returns to block 1300, where the next relation to be created by the next CREATE TABLE instruction, listed above, is specified. Blocks 1300 through 1330 are performed until all of the relations in the relational database have been identified and the columns associated with each of the tables also have been identified. Assuming that all the tables and their related columns have been identified, then processing continues at block 1332.
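The numbering that results from blocks 1300 through 1330 can be illustrated with a short Python sketch. Several details here are assumptions rather than statements from the text: the helper `add_row`, the dictionary layout of a system-relation row, the pairing of each column with a domain, and the choice to give the relation-identifying row an RID equal to its own CID. Only the SUPPLIER and PARTS tables are shown, to keep the example brief.

```python
DOMAINS = [
    "Supplier Identifiers", "Parts Identifier", "Person Names",
    "Part Names", "City", "Colors", "Number",
]

# Assumed pairing of each column with a domain; the text does not list this mapping.
TABLES = {
    "SUPPLIER": [("Supplier ID", "Supplier Identifiers"), ("Person Name", "Person Names"),
                 ("Status", "Number"), ("City", "City")],
    "PARTS": [("Part ID", "Parts Identifier"), ("Part Name", "Part Names"),
              ("Color", "Colors"), ("Weight", "Number"), ("City", "City")],
}


def add_row(system, name, rid=None, aid=0, did=0):
    """Append a row whose CID is its 1-based row number; RID defaults to the CID."""
    cid = len(system) + 1
    system.append({"name": name, "cid": cid,
                   "rid": cid if rid is None else rid, "aid": aid, "did": did})
    return cid


system = []
# Domains first (blocks 1286-1298): CID = RID = row number, AID = DID = 0.
domain_cid = {name: add_row(system, name) for name in DOMAINS}

for table, columns in TABLES.items():
    # Blocks 1300-1308: one row identifying the relation itself (RID assumed equal to CID).
    relation_rid = add_row(system, table)
    current_aid = 1                                  # block 1312
    for column_name, domain in columns:              # blocks 1314-1328
        add_row(system, f"{table}.{column_name}",
                rid=relation_rid,                    # block 1318
                aid=current_aid,                     # AID = current AID number
                did=domain_cid[domain])              # block 1324
        current_aid += 1                             # block 1322
```

Under these assumptions the SUPPLIER relation occupies row 8, which matches the earlier discussion of a relation-identifying column with CID 8, and its City column receives CID 12, RID 8, AID 4 and DID 5.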
During block 1332, the RPU 22 instructs the file associated with each table of the relational database to be transferred from external device 12 to the RDMS 10 via 30 to the RPU 22. The RPU 22 instructs the BBVP 14 to generate the entity select vector (FIG. 34), the row select vector (FIG. 36) and the row use set (FIG. 37A) associated with each column. Simultaneously, the RPU 22 instructs the MVP 15 to generate the entity use vector (FIG. 35) associated with each of the columns. The entity select vector, entity use vector, row select vector, and row use set are retrieved by the RPU 22 via the system identifiers (i.e. CID, DID, AID, RID identifiers). Then, during block 1334, the RPU 22 determines whether any more files associated with the relations need to be loaded into the RDMS 10. If more files need to be loaded for another table, then processing returns to block 1332; otherwise, processing continues at block 1336. Assuming that all the files have been loaded into the RDMS 10, then processing returns to the calling program at block 1336. At this time, all of the columns and the relations of the relational database have been identified and all of the characterizing vectors associated with each of the columns are also organized.
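What the load step produces for a single column can be sketched in Python, following the definitions of the entity select vector and row use vectors given in the claims. The function names `encode_column` and `decode_column`, the list-of-bits representation, and the sample domain and column data are assumptions made for illustration; the entity use vector and row select vector are left out of this sketch.

```python
def encode_column(domain, column):
    """Encode one column against an ordered domain set of unique values."""
    # Entity select vector: bit i is 1 when domain[i] occurs in the column.
    entity_select = [1 if value in column else 0 for value in domain]
    # Row use vectors: for each selected value, bit r is 1 when row r holds it.
    row_use = {
        value: [1 if cell == value else 0 for cell in column]
        for value, bit in zip(domain, entity_select) if bit
    }
    return entity_select, row_use


def decode_column(domain, entity_select, row_use, n_rows):
    """Rebuild the column from the domain set and the two kinds of vector."""
    column = [None] * n_rows
    for value, bit in zip(domain, entity_select):
        if bit:
            for row, used in enumerate(row_use[value]):
                if used:
                    column[row] = value
    return column


# Hypothetical ordered domain set and column contents.
city_domain = ["Athens", "London", "Paris", "Rome"]
supplier_city = ["London", "Paris", "Paris", "London", "Athens"]

entity_select, row_use = encode_column(city_domain, supplier_city)
rebuilt = decode_column(city_domain, entity_select, row_use, len(supplier_city))
```

In this sketch "Rome" belongs to the domain but not to the column, so its entity select bit is 0 and no row use vector is stored for it; decoding from the vectors and the domain set returns the original column.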
The invention has been described in an exemplary and preferred embodiment, but it is not limited thereto.
Those skilled in the art will recognize a number of additional modifications and improvements which can be made to the invention without departure from its essential sphere and scope. For example, a number of different software and hardware embodiments and any number of different software languages and hardware configurations would be suitable for implementing the disclosed invention.
Claims (16)
1. A data processing method, utilizing a computer having a processor controlled by a stored program implementing the method and coupled to a memory, for generating and storing binary data in the memory of the computer in which the binary data is encoded by the computer to represent one or more relations from a relational database, the relations being initially in the form of data values arrayed in columns and rows, with each column consisting of data values having a common characteristic, each one of the data values of any one column corresponding to a different one of the rows, each of the rows comprising data values from one or more columns, each data value of each of the rows being from a different column, said method comprising the steps of:
for each said common characteristic of the data values in a relation being encoded, generating and storing in the memory a single domain set of unique data values, taken from all the columns of all relations being encoded and sharing said common characteristic, whereby all data values from all the relations being encoded are grouped into separate domain sets in memory, each domain set consisting of unique data values sharing a common characteristic;
for each column of each relation being encoded, selecting the domain set of data values having the common characteristic associated with that particular column;
for each selected domain set, generating and storing in the memory first binary data identifying a subset of data values within the selected domain set in which all data values in the subset correspond to the data values of one column of a relation being encoded; and for each data value identified in each said subset of values, generating and storing in the memory second binary data identifying one or more rows to which each identified data value is assigned in each relation being encoded.
2. The method of claim 1 wherein said step of generating and storing in the memory a single domain set of unique data values further comprises forming said unique values in a predetermined order of occurrence.
3. The method of claim 2 wherein said step of generating and storing first binary data identifying a subset of the selected domain set further comprises the steps of forming, for each said subset, an entity select vector consisting of a string of binary bits, each binary bit of said entity select vector having an order of occurrence within the string corresponding to the order of occurrence of a unique data value within the selected domain set with each unique data value in the domain set having a corresponding bit position in the string of bits, and setting each binary bit in the entity select vector to indicate the presence or absence in said subset of the associated data value in the domain set by order of occurrence, the entity select vectors comprising said first binary data.
4. The method of claim 1 wherein said step of generating and storing second binary data identifying one or more rows includes the steps of forming for each data value in a subset, a separate row use vector consisting of a string of binary bits, each binary bit of each said row use vector corresponding to one of the rows of the particular relation being encoded, and setting each binary bit in a row use vector to indicate the presence or absence of the associated unique value from the subset in the associated one of said rows of the relation being encoded, the row use vectors comprising said second binary data.
5. The method of claim 4 wherein said step of forming a row use vector further comprises the step of forming said bits of the row use vector in an order of occurrence corresponding to the order of occurrence of the rows in the relation being encoded.
6. An apparatus, in combination with a computer having a processor and a memory, for generating and storing binary data in the memory of the computer, in which the data is encoded to represent relations from a relational database, the relations being initially in the form of data values arranged in columns and rows, with each column consisting of data values having a common characteristic, each one of the data values of any one column corresponding to a different one of the rows, each row comprising values from one or more columns, each data value of each of the rows being from a different column, said apparatus comprising:
for each said common characteristic of data values in a relation being encoded, means for generating and storing in the memory a single domain set of unique data values, taken from all the columns of all relations being encoded and sharing said common characteristic, whereby all data values from all the relations being encoded are grouped into separate domain sets in memory, each domain set consisting of unique data values sharing a common characteristic;
for each column of each relation being encoded, means for selecting the domain set of values having the common characteristic associated with that particular column;
for each selected domain set, means for generating and storing in the memory first binary data identifying a subset of data values within the selected domain set in which all data values in the subset correspond to the data values of one column of a relation being encoded; and for each data value identified in each said subset of data values, means for generating and storing in the memory second binary data identifying one or more rows to which each identified data value is assigned in each relation being encoded.
7. A method, utilizing a computer, for creating a relational database, said relational database comprising at least one relation, said at least one relation comprising one or more columns and rows, each said column having one or more values, each said value of each said column having a common characteristic, each said value of each said column corresponding to one of said rows, each said row comprising one or more values, each said value of each said row being from a different column, and said one or more values of each said row having one or more characteristics, said method comprising the steps of:
for each said characteristic of said relational database, forming a set of a plurality of unique values;
for each said set, separately forming one or more subset representations of each said set, each said subset comprising one or more said unique values of said set; and forming said at least one relation of said relational database, said step comprising the step of forming said column of said at least one relation, said column comprising one or more of each said unique value of one of said subsets and each said unique value of said column corresponding to one or more of said rows of said at least one relation.
8. The method of claim 7 wherein said step of forming said at least one relation further comprises forming a binary representation of said at least one relation, said step further comprising, for each said subset, the step of forming a vector of binary bits corresponding to each said unique value of said subset, each said binary bit of said binary bit vector corresponding to one of said rows of one of said at least one relation, and each said binary bit representing the presence or absence of one of said unique values in one of said rows of one of said columns of said at least one relation.
9. A method, utilizing a computer having a processor controlled by a stored program implementing the method and coupled to a memory, for performing relational operations on one or more relations of a relational database to determine a binary bit vector representation of a resultant relation, wherein each said relation of said relational database initially comprises data values arranged in columns and rows, with each column consisting of data values having a common characteristic, each one of the data values of any one column corresponding to a different one of the rows, each of the rows comprising data values from one or more columns, data values of any one of the rows each being in a different one of the columns of the relation, said method comprising the steps of:
storing in memory in ordered domain sets all the data values from each column of said one or more relations, with each domain set comprising unique data values sharing a common characteristic;
for each column of said one or more relations, selecting from one of said domain sets those data values in the domain set that correspond to the data values in the column;
for each selected data value, generating and storing a row use vector comprising a string of binary bits, each binary bit position within a row use vector corresponding to a separate one of the rows in the associated relation; setting the binary bit in each bit position of each row use vector to represent the presence or absence of the associated unique value in the corresponding row;
and performing a binary operation on one or more sets of binary row use vectors of said relations to produce a resultant set of row use vectors for defining said resultant relation.
10. The method of claim 9 wherein said step for performing said relational operation comprises determining which one or more rows of each said relation correspond to a selected unique value, said selected unique value being in one of said columns of the associated relation, selecting the row use vector corresponding to said selected unique value from a set of row use vectors, said row use vector being included in the set of row use vectors for the corresponding column of said resultant relation and indicating said one or more rows of a column in the resultant relation which contain said selected unique value.
11. The method of claim 9 wherein the step for performing said binary operation comprises the steps of:
selecting binary row use vectors corresponding to selected unique values, each said selected binary row use vector indicating said one or more rows of said relation which contain said selected unique values; and performing a Boolean OR operation on said selected binary row use vectors to form resultant vectors, said resultant vectors indicating said one or more rows of said resultant relation which contain the selected unique values.
12. A data processing method, utilizing a computer having a processor controlled by a stored program implementing the method and coupled to a memory, for generating and storing binary data in the memory of the computer in which the binary data is encoded by the computer to represent a plurality of relations in a relational database, each relation comprising columns and rows, each column having one or more data values, each data value of the same column having a common characteristic, each data value of each column corresponding to one of said rows, each row comprising one or more data values, each data value of each row being in a different column, said method comprising the steps of:
for each said common characteristic of the data values, generating and storing in the memory, in a predetermined order of occurrence, an ordered domain set of a plurality of unique data values taken from all the columns of the relations being encoded;
for each column of each relation being encoded, selecting the domain set of data values having the common characteristic associated with the particular column;
for each selected domain set, generating and storing in memory an entity select vector identifying a subset of the selected domain set, each said subset comprising one or more of said unique values of said set, each subset comprising all the unique values in an associated column of a relation, each entity select vector comprising a string of binary bits, there being one binary bit position within the entity select vector for each unique value in the ordered set; setting each binary bit in an entity select vector to represent the presence or absence of each corresponding unique value in said ordered domain set within the subset; and generating and storing in memory a row use vector of binary bits for each unique value in the subset, each binary bit position of each row use vector corresponding to one of said rows of the relation, and each said binary bit representing the presence or absence of each said unique value in the associated one of the rows of said relation, whereby one entity select vector and one set of row use vectors in combination with one domain set define one column of a relation being encoded.
13. The method of claim 12 further comprising the step of also adding the unique value to the subset representation of the ordered domain set, said step comprising the steps of:
adding a binary bit to the entity select vector to indicate the occurrence of the additional unique value in the domain set; and setting the added binary bit in said entity select vector to indicate the presence of said additional unique value in said subset.
14. The method of claim 12 further comprising the step of constructing a relation from the stored domain set, entity select vectors and row use vectors of an encoded relation, said last-named step comprising the steps of:
for each column of the relation being constructed, recovering from memory the corresponding entity select vector for each column and associated set of row use vectors, identifying the binary bits of the entity select vector that indicate the presence of a unique value in the associated column, identifying the unique values of the domain set corresponding to the setting of each binary bit in the entity select vector;
providing an output display from the computer of said display; and displaying in a column each occurrence of each identified unique value from a domain set in selected rows referenced by the associated set of row use vectors in binary represented relation.
15. The method of claim 12 wherein said step for performing said relational operation comprises determining which one or more rows of a selected encoded relation correspond to a selected unique value, said selected unique value being in one of said columns of the associated encoded relation, selecting the row use vector corresponding to said selected unique value, said row use vector being included in the set of row use vectors for the corresponding columns of the resultant encoded relation and indicating said one or more rows of a column in the resultant encoded relation which contain said selected unique value.
16. The method of claim 12 further comprising the step of forming a resultant vector for each column of a relation, said resultant vector comprising an identifier corresponding to each row of said column, said identifier mapping to one of said unique values of a domain set.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US107,447 | 1979-12-26 | ||
US10744787A | 1987-10-09 | 1987-10-09 | |
US23875488A | 1988-08-29 | 1988-08-29 | |
US238,754 | 1988-08-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
CA1338601C true CA1338601C (en) | 1996-09-17 |
Family
ID=26804791
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA000579597A Expired - Lifetime CA1338601C (en) | 1987-10-09 | 1988-10-07 | Relational database representation with relational database operation capability |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP0398884A4 (en) |
AU (1) | AU632267B2 (en) |
CA (1) | CA1338601C (en) |
WO (1) | WO1989004013A1 (en) |
1988
Filing date | Country | Application | Publication | Status |
---|---|---|---|---|
1988-10-07 | AU | AU27100/88A | AU632267B2 | Not active (Ceased) |
1988-10-07 | WO | PCT/US1988/003528 | WO1989004013A1 | Not active (Application Discontinuation) |
1988-10-07 | CA | CA000579597A | CA1338601C | Not active (Expired - Lifetime) |
1988-10-07 | EP | EP19880910209 | EP0398884A4 | Not active (Withdrawn) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1902357A2 (en) * | 2005-07-12 | 2008-03-26 | Sand Technology Systems International, Inc. | Method and apparatus for representation of unstructured data |
US7467155B2 (en) | 2005-07-12 | 2008-12-16 | Sand Technology Systems International, Inc. | Method and apparatus for representation of unstructured data |
EP1902357A4 (en) * | 2005-07-12 | 2011-05-11 | Sand Technology Systems International Inc | Method and apparatus for representation of unstructured data |
Also Published As
Publication number | Publication date |
---|---|
AU2710088A (en) | 1989-05-23 |
AU632267B2 (en) | 1992-12-24 |
EP0398884A4 (en) | 1992-08-12 |
WO1989004013A1 (en) | 1989-05-05 |
EP0398884A1 (en) | 1990-11-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CA1338601C (en) | Relational database representation with relational database operation capability | |
US7058621B1 (en) | Method for extracting information from a database | |
Landsteiner et al. | New curves from branes | |
CN111768850B (en) | Hospital data analysis method, hospital data analysis platform, device and medium | |
US6510435B2 (en) | Database system and method of organizing an n-dimensional data set | |
JPH10505440A (en) | Programming language-computer-based information access method and apparatus enabling SQL-based manipulation of concrete data files | |
JP3452531B2 (en) | Method and system for data mining | |
JPS6051732B2 (en) | Data processing system with data base | |
US7624326B2 (en) | Encoding device and method, decoding device and method, program, and recording medium | |
CN108874395A (en) | Hard Compilation Method and device during a kind of modularization stream process | |
CN110389953B (en) | Data storage method, storage medium, storage device and server based on compression map | |
CN114138735A (en) | Method for quickly loading Janus graph data in batches | |
CN108090034A (en) | Document code Unified coding generation method and system based on cluster | |
CN115114293A (en) | Database index creating method, related device, equipment and storage medium | |
CN110110024B (en) | Method for importing high-capacity VCT file into spatial database | |
DE3587612T2 (en) | Search method for association matrix. | |
JPS6172333A (en) | Merge processing system | |
CN116644103B (en) | Data sorting method and device based on database, equipment and storage medium | |
JPS6143338A (en) | Searching of thin data base using association technology | |
JPH05158911A (en) | Method for generating grain simulation program | |
CN117633087A (en) | Stored procedure execution method, device, apparatus, medium and product | |
JPH047759A (en) | Data file form converter | |
JPH0461382B2 (en) | | |
CN113887189A (en) | Information identification method and device applied to automatic data query | |
CN116862165A (en) | Dimension reduction matching method for selective assembly |