WO1989004013A1 - A relational database representation with relational database operation capability - Google Patents

A relational database representation with relational database operation capability Download PDF

Info

Publication number
WO1989004013A1
WO1989004013A1 PCT/US1988/003528 US8803528W WO8904013A1 WO 1989004013 A1 WO1989004013 A1 WO 1989004013A1 US 8803528 W US8803528 W US 8803528W WO 8904013 A1 WO8904013 A1 WO 8904013A1
Authority
WO
WIPO (PCT)
Prior art keywords
relation
column
vector
value
binary
Prior art date
Application number
PCT/US1988/003528
Other languages
French (fr)
Inventor
Edward Lewis Glaser
Douglas Wyche Caldwell
Original Assignee
Nucleus International Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nucleus International Corporation filed Critical Nucleus International Corporation
Publication of WO1989004013A1 publication Critical patent/WO1989004013A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Definitions

  • This invention relates to a computer apparatus and/or method for creating a relational database. This invention also relates to a computer apparatus and/or method for efficiently processing relational operations on one or more relations.
  • a database system is basically a computerized record-keeping machine — that is, a system whose overall purpose is to maintain data and to make that data available.
  • the data maintained by the system can be any data deemed to be of significance.
  • the basic purpose of a database system is to assist in the process of accessing data.
  • relational database is a database that is perceived by users as a collection of tables and nothing but tables.” Id. at 96.
  • the relational database system enables the user to generate new tables from old tables and to extract and combine subsets of information from one or more tables.
  • the present invention introduces a computer apparatus and/or method for efficiently accessing, representing and manipulating data in a relational database. More particularly, the present invention introduces a computer apparatus and/or method for creating and representing a relational database and an apparatus and/or method for efficiently processing relational operations on one or more relations.
  • one aspect of the preferred embodiment of the present invention includes an apparatus and/or method, utilizing a computer, for creating a relational database.
  • the relational database contains a plurality of relations and each of the relations contains one or more columns and rows.
  • a column has one or more values, which all have a common characteristic.
  • Each value in the column corresponds to one of the rows of the relation.
  • Each row contains one or more values, and each value is from a different column.
  • the values in each row thus have one or more characteristics.
  • the creation of a relational database occurs in three steps. First, for each characteristic of the relational database, a set, called a "value set" or "domain” is formed containing all the values entered in the data base having the same characteristic.
  • each relation (table) of the relational database one or more subsets of each value set containing values unique to the particular relation is formed. Each of the subsets contains one or more of the unique values of one of the value sets.
  • the relations (tables) of the relational database are formed. More particularly, for each subset associated with a particular relation, one of the columns of the relation is formed. Each column contains one or more of each unique value in the subset, and each unique value of the column occurs in one or more of the rows of the relation.
  • the first step involves an additional step of forming the set of unique values in some desired order of occurrence (e.g., numerical, lexical, or order of entry into the system) .
  • the second step may involve a step of representing a subset in the form of an entity select vector of binary bits, where each of the binary bits of the binary bit vector has an order of occurrence which corresponds to the order of occurrence of the unique values in the set.
  • the binary bits of the bit vector represent the presence or absence of each of the unique values in the subset, i.e., the vector selects which entities or values in the value set are in the associated subset forming the column of the relation.
  • the third step may also involve the additional step of forming a binary representation of each relation in the relational database.
  • a vector of binary bits is created for each unique value in the subset.
  • Each binary bit of this binary bit vector corresponds to one of the rows of the column, and the binary bits can be formed in an order of occurrence corresponding to the order of occurrence of the rows in the column or in some other appropriate order.
  • each binary bit represents the presence or absence of each unique value in one of the rows of the column.
  • the present invention also has the capability of efficiently performing database operations on one or more relations.
  • Binary bit vector represented relations need not be decompressed in order to perform database operations.
  • Computer time is saved by not having to decompress the data, and processing is made more efficient, because only a relatively small amount of compressed data needs to be considered while performing the relational operations.
  • the relational operations are reduced to operations on bit strings.
  • This aspect of the present invention involves a method and/or apparatus using a computer, for efficiently performing a relational operation on one or more relations of a relational database in order to produce a binary representation of a resulting relation.
  • the relational operations performed are in the form of "SELECT,” "JOIN” (“EQUIJOIN” or "NATURAL JOIN”) and "PROJECT.”
  • the SELECT operation determines which one or more rows of a relation corresponds to one' or more selected unique values.
  • the selection process occurs in two steps. First, the binary bit vector(s) associated with the one or more selected unique values are retrieved. (Each binary bit of the binary bit vector indicates the one or more rows of the binary represented relation which contain the selected unique value) . Second, when there is more than one selected unique value, a Boolean OR operation is performed on the retrieved binary bit vectors to determine the binary representation of a resultant relation. More particularly, the resultant relation is represented by a binary bit vector which indicates the one or more rows of the relation which contain the selected unique values from one column of the relation.
  • the quantity of the rows of the relation which contain the one or more selected unique values can be quickly determined by counting the binary bits of the resultant relation which indicate the presence of the selected values.
  • the preferred embodiment also has the capability of performing SELECT on more than one selected unique value which occur in more than one column of the relation.
  • the selection process occurs in three steps. First, for each column, the binary bit vectors associated with the selected unique values are retrieved. Each binary bit of the binary bit vector indicates a row of the binary represented column which contains the selected unique values. Second, for each column, when there is more than one selected unique value from the column, a Boolean OR operation is performed on the retrieved binary bit vectors to determine a SELECT binary bit vector indicating the one or more rows of the relation which contain the selected unique values. This bit vector characterizes a subset of the rows of the original relation. Third, a Boolean operation (i.e., AND, OR, etc.) is performed on the SELECT binary bit vectors to determine a resultant relation.
  • the step for performing a relational operation also may include the step for performing a JOIN on one or more rows of a first binary represented relation with one or more rows of a second binary represented relation. More particularly, the JOIN operation may be specified to occur only when a column of the first relation contains values having a particular characteristic (i.e., belonging to particular domain) and a column of the second relation contains values having the same characteristic, which bear a desired relationship with values from the column of the first relation. That is, they are greater than, less than, equal to, equal to or greater than, equal to or less than the corresponding values of the first relation. (The second relation need not be distinct from the first.)
  • performing a PROJECT operation is also included.
  • the PROJECT operation comprises generating the binary bit vectors associated with any subset of columns of a particular relation. The columns selected can be reconstructed for a user to interpret.
  • Any relational operation which creates a second resultant relation can be implemented with the present invention.
  • SELECT, PROJECT and JOIN are thoroughly discussed, one skilled in the art of database systems understands that there are comparatively few practical problems that cannot be solved by SELECT, PROJECT and JOIN alone.
  • the other relational operations such as PRODUCT, UNION and DIFFERENCE can be implemented by the present invention as well.
  • functions such as INSERT and DELETE for updating rows of a binary represented relation and for updating unique values to sets of unique values, are also implemented by the present invention.
  • the present invention significantly enhances a computer's capability of processing relational operations by performing the relational operations directly at the binary bit level. Additionally, the Boolean operations can also be performed more efficiently by a special Boolean Logic Unit discussed in a co-filed application to Glaser et al. , entitled “A BIT VECTOR PROCESSOR WITH BOOLEAN OPERATION PROCESSING CAPABILITY,” filing date September 24, 1987.
  • FIG. 1 depicts a computer system equipped with a Relational Database Management System (RDMS) for creating a binary representation of a relational database and/or for efficiently processing relational operations on the one or more binary represented relations in accordance with the present invention
  • RDMS Relational Database Management System
  • FIG. 1A depicts the RDMS on which software programs and/or hardware are performed for representing the relations and for performing relational operations;
  • FIG. 2 represents a typical relational database provided by a user of the system depicted in FIG. 1;
  • FIG. 3A represents an ordered set S where the ordering is a lexical ordering
  • FIG. 3B represents an (ordered) subset of the ordered set represented in FIG. 3A;
  • FIG. 3C represents an empty or null set A
  • FIG. 3D depicts a binary bit vector expression a for characterizing the subset depicted in FIG. 3B
  • FIG. 3E is a binary bit vector a 1 for characterizing the ordered set shown in FIG. 3A;
  • FIG. 3F depicts a binary bit vector representation of a null subset a" shown in FIG. 3C;
  • FIG. 4 depicts a binary representation of the Suppliers relation as shown in FIG. 2;
  • FIG. 5 depicts a binary representation of the Parts relation as shown in FIG. 2;
  • FIG. 6 depicts a binary representation of the Shipments relation as shown in FIG. 2;
  • FIG. 7 is a flow diagram of the BINARY
  • FIGS. 8A and 8B are a table depicting the results of executing a series of commands for setting up the columns of the relational database shown in FIG. 2;
  • FIGS. 9A, 9B. and 9C are a table depicting the results of constructing the binary representation of the Suppliers relation (FIG. 2) by the BINARY REPRESENTATION routine;
  • FIG. 10A is a flow diagram of the INSERT routine;
  • FIG. 10B is a flow diagram of the INSERT VALUE INTO VALUE SET routine
  • FIG. IOC is a flow diagram of the UPDATE SUBSET routin ;
  • FIG. 10D is a flow diagram of the ADD VALUE TO
  • FIG. 11A is a flow diagram of the DELETE routine
  • FIG. 11B is a flow diagram of the DELETE VALUE FROM COLUMN routine
  • FIG. 11C is a flow diagram of the DELETE VALUE FROM COLUMN routine
  • FIG. 11D is a flow diagram of the DELETE VALUE FROM VALUE SET routine
  • FIG. 12 is a results table depicting the operations performed by the INSERT routine (FIG. 10A) ;
  • FIG. 13 is a table depicting the results of the operations performed by the DELETE routine (FIG. 11A) ;
  • FIG. 14 is a flow diagram of the SELECT routine;
  • FIGS. 15A and 15B depict a table of the results of the operations performed by- the SELECT routine for a two-column SELECT for two values;
  • FIGS. 16A and 16B depict a table of results of the operations performed by the SELECT routine (FIG. 14) on two columns for multiple values;
  • FIG. 17A is a flow diagram of the PROJECT routine;
  • FIG. 17B is a flow diagram of the DISPLAY/RECONSTRUCT routine
  • FIGS. 18A, 18B, 18C and 18D depict a table of the results of the operations performed by the PROJECT routine (FIG. 17A) ;
  • FIG. 19 depicts a binary representation of a JOIN relation;
  • FIG. 20 depicts , a more detailed version of the binary representation of the Suppliers portion of the JOIN relation (FIG. 19) ;
  • FIG. 21 depicts a more detailed version of the Parts relation portion of the JOIN relation (FIG. 19) ;
  • FIG. 22A is a flow diagram of the EQUIJOIN routine
  • FIG. 22B is a flow diagram of the BUILD ROW USE SETS routine
  • FIG. 22C is a flow diagram of the EVALUATE ROW USE SETS routine
  • FIG. 22D is a flow diagram of the CONSTRUCT JOIN ROW USE VECTORS routine
  • FIG. 22E is a flow diagram of the PRODUCTS routine
  • FIG. 22F is a flow diagram of the NUMS routine
  • FIG. 22G is a flow diagram of the GENERATE BIT STRING routine
  • FIGS. 23A, 23B, and 23C represent a table of the results of the operations performed by the EQUIJOIN routine (FIG. 22A) ;
  • FIG. 24 is a flow diagram of the GREATER THAN JOIN routine
  • FIG. 25A is a flow diagram of the DISPLAY/RECONSTRUCT FOR JOIN routine
  • FIG. 25B is a flow diagram of the REFERENCE RELATION routine
  • FIG. 25C is a flow diagram of the REFERENCE VALUE SET routine
  • FIGS. 26A, 26B, 26C, 26D, 26E, 26F and 26G depict a table of results for the operations performed by the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) ;
  • FIG. 27 represents a mapping of the elements of Set S into Set T; Set S is the "domain” and Set T is the “range.”
  • FIG. 28 represents the binary representation of
  • FIG. 29 depicts a binary representation of the Suppliers portion of the JOIN relation (FIG. 20) with entity use vectors added;
  • FIG. 30A is a flow block diagram of the DISPLAY/RECONSTRUCT WITH ENTITY USE VECTORS routine
  • FIG. 3OB is a flow block diagram of the REFERENCE RELATION routine
  • FIG. 30C is a flow block diagram of the REFERENCE
  • FIG. 31 depicts a relational database
  • FIG. 32 depicts a SYSTEM RELATION
  • FIG. 33 depicts the domains of the relational database shown in FIG. 31;
  • FIG. 34 depicts the ENTITY SELECT SET associated With the SYSTEM RELATION (FIG. 32) ;
  • FIG. 35 depicts the ENTITY USE SET, which is associated with the SYSTEM RELATION (FIG. 32) ;
  • FIG. 36 depicts the ROW SELECT SET associated with the SYSTEM RELATION (FIG. 32) ;
  • FIGS. 37A, 37B, 37C, and 37D depict the ROW USE SETS associated with the SYSTEM RELATION (FIG. 31) ;
  • FIGS. 38A, 38B, 38C, and 38D are flow block diagrams of the DATABASE IDENTIFICATION routine.
  • FIGURE 1 depicts a computer system having a programmable computer and computer programs for creating a relational database and for processing operations on one or more relations (also called tables) of a relational database.
  • the system includes programmable computer 2, display 3, entry device 11 for the computer and external device 12 such as a disk for storage of data.
  • Hardware/software for representing the relations and hardware/software for performing relational operations are housed in a Relational Database Management System (RDMS) 10 (shown in phantom lines) , which is connected within the computer 2.
  • RDMS 10 coordinates the various activities related to representing relations in the relational database and to performing relational operations on one or more relations.
  • RDMS Relational Database Management System
  • RDMS 10 is a programmable computer on a printed circuit board which can be easily employed within most standard computers, including per ⁇ sonal, mini-, and mainframe computers. It is envisioned that RDMS 10 may be a special purpose computer formed by one or more integrated chips.
  • RDMS 10 includes an optional Binary Bit Vector Processor (BBVP) 14, an optional Bit Vector Encoder (BVE) 16, an optional Map Vector Processor (MVP) 15, an optional memory 18, a Relational Processing Unit (RPU) 22, including a Boolean Logic Unit (BLU) 24, and a Command Interpreter 28.
  • BBVP Binary Bit Vector Processor
  • BVE Bit Vector Encoder
  • MVP Map Vector Processor
  • RPU Relational Processing Unit
  • BLU Boolean Logic Unit
  • External device 12 is a permanent or buffered storage, typically in the form of a hard disk, for storing information used in relations which are represented in expanded form where each value is typically no smaller than a byte.
  • the contents of the external device are typically maintained in records which are divided into fields where the n ⁇ 1 field of each record corresponds to a specific type of data»
  • the contents of the external device 12 is loaded via bus 30 to the RDMS 10 and specifically to RPU 22.
  • the RPU instructs BBVP 14 to convert each relation stored on the external device into a binary representation (to be more thoroughly discussed in PART IV) .
  • Bus 32 then transfers the binary representation of each relation to optional BVE 16.
  • the BVE 16 compresses the binary representation, the techniques of which are discussed in a co-filed patent application to Glaser et al entitled "A BIT-STRING COMPRESSOR WITH BOOLEAN OPERATION PROCESSING CAPABILITY", having Serial No. 07/100,761, filed September 24, 1987, which is herein incorporated by reference.
  • the BVE 16 employed within the RDMS 10, evaluates uncompressed bit-string representations of each relation and separates the bit- strings into one or more "impulses."
  • An impulse is a run, which is a string of one or more bits of a same binary value or a polarity (e.g., "O's" or "l's"), and an ending bit which has a polarity opposite the polarity of the run.
  • Software programs executed by the bit- vector encoder encode the bit vectors into one of several different compressed impulse formats. The compressed bit vectors are then sent via bus 32 back to the BBVP and then in turn to the optional memory 18.
  • Memory 18 may be a memory component included in the host computer such as an external device or a memory component included within the RDMS 10 (as shown in FIG. 1A) .
  • Memory 18 holds the compressed binary representations of relations before processing the relations at the RPU 22 or stores the relations after processing at RPU 22.
  • RPU 22 via the BBVP 14 performs relational type operations (e.g., SELECT, PROJECT, JOIN, INSERT, DELETE, etc.) . on one or more relations by processing unique software programs at the RPU 22 via a microcontroller (e.g., Intel 80386).
  • relational type operations e.g., SELECT, PROJECT, JOIN, INSERT, DELETE, etc.
  • the Boolean operations can be performed by the hardware or software embodiments of the BLU 24 discussed in Glaser et al (Serial No. 07/100,761). Even though the relational operations in the preferred embodiment may be implemented primarily by software, the RPU 22 can perform relational operations more efficiently in terms of storage, speed, etc. , than presently known techniques for performing relational operations on one or more relations.
  • BLU 24 can take full advantage of the unique properties of the latest components, such as 32-bit microprocessors, CPU'S, etc.
  • the MVP 15 generates map vectors, called entity use vectors, and the map vectors are used by the RPU .22 to map each row of a column of a relation to a unique value of the relational database.
  • the purpose of the entity use vectors is for facilitating the reconstruction and display of the information represented by the binary representations of the relations.
  • the MVP 15 is optional because the RDMS 10 can reconstruct and display the relations without having entity use vectors.
  • the entity use vectors are used to perform the DISPLAY/RECONSTRUCT process more efficiently.
  • the preferred embodiment contains a command interpreter 28 which interprets instructions for processing data in the relational database.
  • the instructions used could be those found in Structured Query Language (SQL) which has become an industry standard language for enabling users to communicate with relational databases.
  • SQL Structured Query Language
  • the operation of the components in the RDMS 10 are controlled by software programs implemented by RPU 22. Data lines 30, 31, 41, 44, 46, 48, 50, and 52 of FIG. 1A depict the data flow between the various components of the RDMS 10.
  • the command interpreter 28 need not be a part of the RDMS 10; instead, it could be loaded into the host computer 2. In the preferred embodiment, however, the command interpreter 28 is located within the RDMS 10.
  • relations stored in the external device 12 need to be updated by inserting or deleting values in the different domains, etc.
  • the information can be updated by transferring data, via bus 30, to RPU 22, which then inserts or deletes necessary values, in the form of bytes, in the domain, etc., and restores the domain back to external device 12 via bus 30.
  • This capability enables the RDMS 10 to add new and unique values to the relational database.
  • FIG. 2 is an example of a relational database, namely, the "SUPPLIERS and PARTS" relational database, and it has been adopted from Date, An Introduction to Database Systems, 4th Ed. , Chapters 4 & 11 (1986) , which is incorporated herein by reference.
  • FIG. 2 depicts three relations, a table for suppliers 63, a table for parts 65, and a table for ' shipments 67.
  • “Relation” is just a mathematical term for the word "table”, and these two words are used interchangeably in the specification.
  • Table 63 represents the information on the suppliers for a particular company. Each supplier has a Supplier ID column 80, a Supplier Name column 82, a Status column 84, and a City column 86, indicating where the supplier is located. As shown in table 63, each row of the table depicts information on a different supplier, and each column 80, 82, 84, 86 represents a different characteristic of each supplier. For example, row 69 of table 63 depicts supplier "SI" having the name "Smith” with a Status "20" and located in "London.”
  • Table 65 represents the parts the suppliers may sell. Each part has a Part ID 88, a Part Name column 90, a Color column 92, a Weight column 94, and a City column 96. As shown by each row of the table, it is also assumed that the part only comes in one color and it is stored in a warehouse in exactly one city. For example, row 77 of table 65 corresponds to a single part having part having part ID "PI”, a part name "NUT”, a color "RED”, a weight "12", and a location or city, "LONDON.”
  • Table 67 represents the shipments of parts (table 65) made by suppliers (table 63) .
  • Table 67 is really a connection between tables 63 and 65 (to be discussed in Part V) .
  • the first row 87 of table 67 connects a specific supplier (SI) with a specific part (PI) ; stated differently, it represents a shipment of parts of kind PI, by supplier SI, and the shipment quantity is 300.
  • SI specific supplier
  • PI specific part
  • each shipment is uniquely described by supplier ID, corresponding to column 98, a part ID, corresponding to column 100, and a quantity, corresponding to column 102. It is assumed that at most, one shipment at any given time for a given supplier can be made for a given part.
  • the combination of SI and PI having a quantity shipment of 300 at row 87 is unique with respect to the set of the shipments appearing in table 67.
  • the Shipments table is a "relationship" between a particular supplier and a particular part.
  • the "SUPPLIERS and PARTS" relational database (FIG. 2) describes, in reality, a very elementary database. Databases are likely to be much more involved, containing many more entities and relationships. This database, however, is sufficient to illustrate what a relational database is and to illustrate the novel features of the present invention.
  • each of the data values depicted in the tables 63, 65, 67 are "atomic". That is, at every row and column position in every table, 63, 65, 67, there is always exactly one data value, never a set of values. For example, in table 63, at row 69 and column 89, the status for the supplier Smith is a single value "20" and not a set of statuses. Also, note there are no links connecting one table to another table. In the example of FIG. 2, there is a relationship between the supplier row 69 of table 63 and the part row 77 of table 65, because supplier SI supplies PI as shown by the existence of row 87 of table 67, in which SI has sold 300 Pi's.
  • relational models as disclosed by the prior art represent database systems "at a level of abstraction that is somewhat removed from the details in the underlying machine.”
  • relational database at the external level remains the same, the internal level of the relational database is actually depicted in the form of columns, as discussed in PART IV.
  • a domain is a set of unique values, and each domain has only one characteristic.
  • the domain for supplier identifiers 66 is the set of all possible unique supplier identifiers which are referenced in the system.
  • the domain for person names 70 is the set of all unique names
  • the set of numbers 78 is the set of all integers greater than 0 and less than 10,000 (for example). Domains are pools of unique values, from which actual values appearing in the columns of the tables 63, 65 and 67 are drawn. Typically, there will be values in a given domain that do not concurrently appear in one of the columns of one of the relations.
  • the value S8 or S10 may appear in the domain of supplier identifiers 66, but note that no supplier S8 or S10 actually appears in relation 63. They may appear in some other column of some other relation. Furthermore, each column of a relation corresponds to one of the domains. For example, supplier identifier domain 66 corresponds to column 80 of table 63.
  • RPU 22 acts as a file manager to retrieve each of the records associated with the particular columns. The columns will then be sent via bus 30 to the RDMS 10 or for further processing.
  • the RDMS 10's view of the database on the external device 12 is a collection of stored columns (in byte form) , and that view is supported by a file manager and disk manager (not shown) which may or may not be a part of the RDMS system.
  • a file manager and disk manager not shown
  • the relation will be converted into a binary representation via the BBVP.
  • This unique tabular representation of the relation is one aspect of the invention and it will be thoroughly discussed in part IV.
  • Part III is devoted entirely to creating a fundamental framework in which a binary representation of a relation can be developed.
  • One aspect of the present invention is the representation of a relational database by bit strings which do not contain the raw data values of the relational database. Instead, each value of the database is represented by a single binary bit within an ordered set of binary bits.
  • the basic building block for representing a relation of a relational database relies on the concept of ordered sets.
  • Ordered sets consist of ordered pairs, ⁇ M, X>, where each ordered pair represents an element of a set, where M is a symbol that defines the location or position of the value X within the set.
  • M is a non-negative integer.
  • every element of the ordered set constitutes a function between the integer describing the ordinal position M and the corresponding value X in the set.
  • the value M increases by one for each forward progression of one element in the set.
  • An ordered set with unique values can be defined as a set having elements X and Y which have ordinal positions M and N, respectively, and for all ⁇ M, X>, ⁇ N, Y> in the set, M is equal to N if, and only if, X is equal to Y.
  • Two elements of an ordered set can only be equal when they are at the same ordinal position in the set, as all ordinal positions in the ordered set are unique.
  • the ordering of values within ordered sets (ordering rule) namely, the way in which each value matches an ordinal position, is arbitrary. It depends on the system's or user's choices and requirements.
  • a "vector" consists of an ordered set of elements.
  • a vector consisting of a set "S" of elements is a one- dimensional array, where each value corresponds to a vector element whose ordinal position over the vector is implied. More particularly, a vector may define an ordering to a set of elements in an ordered or non- ordered set. Given the ordered set "S" 148 of FIG. 3A, a "binary-bit vector" a can uniquely define a subset "A" (FIG. 3D) of "S" 148 (FIG. 3A) , by representing the elements of "S" in "A".
  • each data value must represent whether an element of set "S" is in subset "A” or is not in subset "A” and, thus, a binary bit string, where each bit corresponds to an ordinal position of set "S", can represent subset "A”.
  • binary bit-vectors contain l's and O's, a "1” indicating a set element is present, and a "0” indicating that it is not present.
  • Each position of each binary bit within a binary bit-vector corresponds to an ordinal position of an element in subset "A" (FIG. 3A) .
  • a binary bit-value combined with its implied ordinal position within the vector, is an "element" of the vector.
  • the presence, of a "1" bit in the vector is equivalent to representing a value of the ordered set by only its ordinal position. In this way, representations of data can be operated on without operating on the data values themselves.
  • FIG. 2 depicts a relational database for suppliers, parts and shipments.
  • the relational database consists of tables 63, 65, and 67, which depict the Suppliers, Parts and Shipments relations, respectively.
  • domain 60 contains the unique values which are referenced by each of the tables 63, 65 and 67.
  • the domain 60 contains unique values for supplier identifiers 66, part identifiers 68, person names 70, part names 72, cities 74, colors 76, and numbers 78.
  • One aspect of the present invention is to create a binary representation of a relational data base, such as the one shown in " FIG. 2. Specifically, the binary- representation for the relational database shown in FIG.
  • FIG. 4 shows the binary representation of the suppliers table 63 (FIG. 2)
  • FIG. 5 is a binary representation of the parts table 65
  • FIG. 6 is a binary representation of the Shipments table 67 (FIG. 2) .
  • FIG. 4 can be broken up into two portions as shown.
  • the top portion of the figure represents the value sets or domains associated with the suppliers relation 63 of the relational database in FIG. 2. More particularly, domain 160 (FIG. 4) is associated with suppliers identifier domain 66 of FIG. 2.
  • Domain 162 corresponds to the Person Names domain 70 of FIG. 2.
  • Domain 164 corresponds to the Numbers domain 78 (FIG. 2)
  • domain 166 corresponds to the Cities domain 74 of FIG. 2.
  • the values of domains 160, 162, 164 and 166 have been arbitrarily chosen and formed in particular orders of occurrence, namely, lexical ordering for domains 162 and 166 and numerical ordering for domains 160 and 164.
  • each of the domain 160, 162, 164 and 166 is represented by corresponding binary bit-vectors 176, 178, 180, and 182.
  • Each binary bit-vector which is used for the purpose of characterizing a domain, is called herein an entity select vector.
  • the binary bit-vector is used to represent a selection (subset) of entities from a set of entities, where an entity is defined to be a unique value or grouping of values.
  • an entity may be a value in a value set or a row in a relation.
  • the entity select vector has a quantity of binary bits equivalent to the quantity of unique values in a given domain.
  • domain 160 for supplier identifiers contains ten supplier identifiers for ten corresponding binary bits.
  • Each binary bit of the entity select vector has a position which directly corresponds to the position of each of the unique values in the supplier identifier domain 160.
  • each of the binary bits represents the presence or absence of each of the unique values in the subsets represented by the entity select vector 176.
  • the binary bit 236 of entity select vector 176 corresponds to the unique value SI at 216 of domain 160.
  • the binary bit 236 has a value of "1" which indicates that the unique value SI is present in the subset represented by entity select vector 176.
  • binary bits 238, 240, 242 and 244 also indicate that the unique values S2, S3, S4, and S5 are present in the subset.
  • Binary bits 246, 248, 250, 252 and 254 are set to "0", indicating that unique values S6 226, S7 228, S8 230, S9 232, and S10 234, respectively, are not in the subset.
  • the binary bit vector represen ⁇ tation of the unique values of the domain 160 is a short-hand way of representing a subset of the domain 160 by indicating those unique values which are present in the subset. This aspect of the present invention is an important one. By only representing a subset of a domain, all values of a domain need hot be associated with a column. More particularly, only the entity select vector which corresponds to a particular column of a relation, and not the domain, needs to be associated with the column.
  • entity select vector 176 is associated with column 168
  • entity select vector 178 is associated with column 170
  • entity select vector 180 is associated with column 172
  • entity select vector 182 is associated with column 174.
  • Columns 168, 170, 172 and 174 correspond to the columns of the suppliers relation 63 Of FIG. 2.
  • each binary bit-vector 184, 186, 188, 190 and 192 corresponds to one of the values indicated to be present in the subset by the entity select vector 176.
  • the binary bit-vector is called a row use vector because it indicates the presence or absence of a unique value in one or more rows of a column in the relation.
  • the row use vectors are combined into a row use set 260 to form the binary representation of the column 168.
  • the order of row use vectors occurs in the same order as the order of values characterized by the entity select vector.
  • the lower order binary bits of the entity select vector have been arbitrarily assigned to correspond to the left most row use vectors of the row use set, and the higher order binary bits of the entity select vector correspond to the right most row use vectors of the row use set.
  • the implied mapping between the bits set to "1" of the entity select vector could have just as. easily been reversed.
  • the lower order binary bits may correspond to the right-most row use vectors of the row use set.
  • Binary bit 236 of the entity select vector 176 corresponds to row use vector 184 of the row use set 260.
  • binary bit 238 of the entity select vector 176 corresponds to the next row use vector 186 of the row-use set 260.
  • the system interpreting the entity select vectors and the row use set is pre-programmed so that the binary bits indicate unique values present in the subset.
  • the implied ordering scheme is illustrated by the dotted lines 268, 270, 272, 274 and 275, which show that the binary bits in entity select vector 176 correspond to the row use vectors 184, 186, 188, 190 and 192, respectively.
  • the binary bits in the entity select vector 180 which represent the unique values "10'.', "20”, and "30", are mapped in an implied manner to the row use vectors 204, 206 and 208, respectively.
  • each row use vector can indicate the presence or absence of a unique value in one or more rows of a column of the relation. For example, in row use vector 206, corresponding to the unique value "20", "1" bits indicate the presence of the value "20" in the first and fourth rows of column 172. Because the binary representation of each of the columns is represented by an entity select vector to unique values, the columns 168, 170, 172 and 174 of the relation need not be represented as a set of actual values in the database memory. Only the row use sets and the corresponding entity select vectors need be stored in the memory of RDMS 10 (e.g., memory 18, FIG. 2) .
  • FIG. 5 and FIG. 6 represent the entity select vectors and their corresponding row use sets for the Parts Relation 65 (FIG. 2) and the Shipments relation ' 67 (FIG. 2) .
  • domains 282, 286, 290, 294 and 298 correspond to the domains for part identifiers 68 (FIG. 2) , part names 72 (FIG. 2) , colors 76 (FIG. 2) , numbers 78 (FIG. 2) , and cities 74 (FIG. 2) respectively.
  • the numbers domain 294 and the city domain 298 of the Parts relation (FIG. 5) are identical to the numbers domain 164 and the city domain 166 of the suppliers relation (FIG. 4) .
  • An efficiency of the present invention is that the unique values of the domains need only be represented once. In other words, the unique values in the numbers domain 164 (FIG.
  • the city domain 166 (FIG. 4) and the city domain 298 ,(FIG. 5) are the same domain of unique values of cities.
  • the representation of the subsets for these domains is different.
  • the entity select vector 180 (FIG. 5) represents that the unique values "10", “20” and “30” of the numbers domain are in the relation of suppliers (FIG. 4), whereas the unique values "12", “14", “17” and “19” are in the relation for parts (entity select vector 296, FIG. 5) .
  • the system instead of having to store two complete sets of unique values 164 and 294, which are identical, the system only stores one version of the numbers domain and two entity select vectors, namely, 180 and 296, to represent subsets of the same domain. Likewise, the system need not retain two unique sets of values for cities, 166 and 298, instead, the system maintains only one version of the city domain and two entity select vectors, 182 and 300, to identify the subsets of cities in the associated relations. In fact, any number of entity select vectors may be associated with a particular domain. For example, in FIG. 6, the domain for numbers 328 is also referenced; however, the entity select vector 330 depicts a different subset of unique values from the entity select vectors 180 (FIG. 4) and 296 (FIG. 5) . Entity select vectors are efficient subset representations of domains because the same domain need not be represented more than once and the actual values of each domain can be uniquely represented by the entity select vector's binary bits.
  • the columns of the Parts relation 302, 304, 308, 312, 316 are represented in the relational database by the row use sets 304, 306, 310, 314, 318, respectively.
  • the row use sets contain row use vectors which are each associated with a unique value indicated to be present in a subset of a domain by an entity select vector.
  • FIG. 6 depicts a binary representation for shipments.
  • the domains associated with each of the relations with their unique sets of values are stored on the external . device 12 (FIG. 1A) , and the binary representation of each of the relations (FIGS. 4, 5 and 6) which make up the relational database of FIG. 2 are stored in memory 18 of the RDMS 10 (FIG. 1A) .
  • the binary representation of the relations need be referred to. A detailed discussion of this process will follow shortly.
  • FIG. 7 is a flow block diagram of the steps which are performed by the RDMS 10 for creating a binary representation of a relational database.
  • each domain necessary for specifying unique values in the relations of the relational database is identified.
  • Input commands are interpreted by the command interpreter 28 (FIG. 1A) for creating domain identifiers in the memory 18 (FIG. 1A) of the RDMS 10 (FIG. 1A) .
  • the system reads each of the instructions above, and an empty value set is created for each of the domains listed in the instruction. The actual unique values associated with each domain are loaded into memory in a later step of this routine.
  • the system determines whether any more domains for the database need to be specified.
  • the system identifies a table of the relational database, and the RDMS 10 builds an entry in the system relation (see Part VI) .
  • the system identifies a column, as with the table identified in block 348.
  • the RDMS 10 builds on entries in the system relation identifying each column, and the RPU 22 via the BBVP 14 creates empty entity select vectors associated with each of the domains. The entity select vectors will have their binary bits set when the columns of the particular relations have their values loaded into the system.
  • Processing continues at blocks 350 and 352 until all of the relational columns for the particular table identified during block 348 have been identified.
  • the system asks for the next table or relation of the database. Assuming that a number of different tables exist in the relational database, blocks 348, 350, 352 and 354 are processed until all of the tables and their related columns have been specified.
  • system identifiers identifying each column of each relation, are constructed. These identifiers are discussed more thoroughly in part VII.
  • the system loads a file associated with each table of the relational database.
  • the file of the table is assumed to exist in the external device 12 of the system depicted in FIG. 1.
  • the system loads the file representations of the relations into external device 12, where they reside until summoned by the RDMS system 10.
  • RPU 22 instructs the external device 12 to transfer the first column associated with a particular table of the relational database via bus 30 to the RPU 22.
  • the particular column is retrieved by referring to certain system identifiers (i.e., AID, RID identifiers, Part VII) .
  • AID i.e., RID identifiers, Part VII
  • a bit is set to "1" in the first position of the row -use vector to indicate that the value occupies the first row of the first column of the relation.
  • a new row use vector is created and a binary bit associated with a particular row of the column, where the unique value resides, is set to "1".
  • the binary bits of the entity select vector, associated with the column are set to "1" to indicate the unique values of the domain referenced in the column.
  • the corresponding bits of the row use vector associated with the unique value are set to "1" indicating the presence of the unique value in these rows of the column. Because the entity select vector has previously had the binary bit associated with the unique value set to "1", it need not be set again.
  • This process of converting the column into a row use set continues until all of the values of the column have been represented by binary bits.
  • the RPU 22 commands the external device 12, to bring the next column, via bus 30, into the RDMS 10. This process continues for all of the columns associated with a particular table until the row use sets for the table are completely constructed.
  • the system determines if any other tables need to be binary represented by the RDMS 10, assuming a single input file corresponds to a single table. ' If other tables are to be constructed, then blocks 356, 358 and 360 are performed until all the columns of the next relation are represented in their binary form. After all of the binary bit vectors (i.e., row use vectors and entity select vectors) are constructed, the RPU 22 may summon the bit vectors to bit vector encoder BVE 16 of the RDMS 10 to encode the bit vectors into compressed impulse format. The steps for compressing the bit vectors into compressed impulse formats are thoroughly discussed in Glaser et al. RPU 22 then instructs the bit vectors to be sent via bus 48 to the memory 18 where they are stored.
  • the relations can be updated by two utility functions called INSERT and DELETE, which will be discussed in PART V of the specification.
  • INSERT function
  • DELETE DELETE
  • the INSERT function perhaps with appropriate modification to enhance efficiency, can also be used to load additional rows into the relation.
  • the relations in their binary represented form can be manipulated via the relational operations SELECT, JOIN and PROJECT, also to be discussed in PART V.
  • FIG. 2 a hypothetical relational database is shown.
  • RPU 22 FIG. ' 1A
  • FIGS. 4, 5 and 6 This example has been designed to emphasize the steps of the BINARY REPRESENTATION routine for constructing the binary representation of a relational database in FIGS. 4, 5 and 6.
  • the assumption is made that the reader understands instruction formats.
  • the system of FIG. 1 creates a relational database as shown in the tables 63, 65 and 67 (FIG. 2) . These tables are constructed as inputs to the system. Once the input process is complete, the system calls the BINARY REPRESENTATION routine (FIG. 7) . The system, during block 344, creates the domains referenced by the relational database. Input instructions, specify the following domains; suppliers identifiers 66, the parts identifier 68, person names 70, part names 72, city names 74, colors 76, and numbers 78 (FIG. 2) .
  • the input' instructions in pseudo code look like:
  • FIG. 1A interprets the instruction (A) in block 344, RPU 22 creates an empty value set (364, FIG. 8A) .
  • block 346 the system determines whether there are any other domains to be identified by the system. In fact, there are other domains to be identified as specified by the instructions listed above; thus, block 344 is called.
  • the command interpreter 28 evaluates statement B and RPU 22 creates an empty value set to reference the domain of part identifiers (366, FIG. 8A) .
  • the system determines that there are still more domains to be identified, and thus, block 344 is called.
  • the system interprets the next instruction (C) , which is for the "person names" domain.
  • the RPU 22 (FIG. 1A) generates an empty value set (368, FIG. 8A) .
  • BBVP sets all the vectors to "0" to indicate the empty entity select vector (368, FIG. 8A) .
  • the system determines that there are still more instructions for identifying domains. Processing continues at block 344, and an empty value set for part names is generated (370, FIG. 8A) .
  • Block 346 determines that there are still more domain instructions to be processed and processing continues at block 344.
  • an empty value set corresponding to the city domain is created (372, FIG. 8A) .
  • the system determines that there are still more commands for identifying domains, so processing returns to block 344.
  • Block 344 the system generates an empty value set corresponding to the domain "colors" (374, FIG. 8B) .
  • Block 346 determines that there is one more instruction left (G) for creating a domain, and thus, block 344 is called.
  • the RPU 22 creates an empty value set for the "numbers" domain (376, FIG. 8B) .
  • CREATE TABLE SUPPLIER (SUPPLIERS ID; PERSON NAME; STATUS; CITY) ; which indicates that a table for suppliers is to be identified; specifically, an identifier for the suppliers table is stored in memory 18 of the RDMS 10.
  • the system generates an identifier for the first column of the relation, which is also stored in memory 18. Specifically, it interprets the command above, and a suppliers ID identifier is stored in memory. (Identifiers to be discussed in PART VI).
  • the system determines that the command requires that other columns be identified for the relation, so processing continues at block 350.
  • the system identifies the column for "person names" and stores an identifier indicating such in the memory 18.
  • an empty entity select vector is created, which is associated with the "person names” domain.
  • the system determines that there are still other columns to be identified, and thus, processing continues in block 350.
  • the system stores an identifier for the next column associated with "status” into the memory 18.
  • an empty entity select vector associated with the "status” domain is created.
  • the system determines that there is still a column remaining to be identified; specifically, during block 350, the system stores an identifier for the column associated with "city”.
  • an empty entity select vector associated with the "city” domain is created.
  • the system determines that no more columns need be identified, and thus, processing continues at block 354, which determines whether there are any more tables to be identified for the relational database.
  • the RPU 22 instructs the external device 12 to download the files associated with each table of the relational database via bus 30 to the RDMS 10 (FIG. 1A) .
  • the byte values of each column of the relation are sent to the RPU 22 and the appropriate vectors of the row use sets are built via BBVP 14, and the binary bits of the entity select vectors are set there as well.
  • BBVP 14 the binary bits of the entity select vectors are set there as well.
  • FIGS. 9A, B-and C are results tables depicting the formation of the binary representation of the suppliers relation of FIG. 4. Each row of the FIGS. 9A, B and C depicts an additional row use vector associated with one column in the relation.
  • the first value SI (69, 80 FIG. 2) is inserted into the first ordinal position of the value set for suppliers identifiers. Additionally, the first binary bit of the entity select vector 417 associated with the "suppliers identifier" domain is inserted and set to "1" to indicate that the unique value SI is referenced in the suppliers ID column. The column for the suppliers identifier requires a new row use vector 381 to be generated nd the first bit of the row use vector 381 is set to "1" to indicate that SI occupies the first row of the column.
  • FIG. 9A A new row use vector is added for each value present in the subset, as indicated at 383-389, and a new bit is added to the entity select vector 417. Additionally, five more bits, set to "0", are added to the ordinal positions of the entity select vector corresponding to the new values of the value set.
  • the row use set 260 and the entity select vector 176 have now been generated by the BBVP 14 via the BINARY REPRESENTATION routine (FIG. 7) .
  • the next column of the suppliers table is evaluated by the BBVP 14.
  • the next column in the relation is the "person names” column (82, FIG. 2) , and the first value of the "person names” column is Smith (69, 82, FIG. 2 ) .
  • the value Smith is inserted to the first position of the "person names” value set. Additionally, the entity select vector 419 associated with "person names” has a first binary bit inserted and set to "1" to indicate that the unique value for Smith is referenced in the column for person names. A row use vector for Smith has not been created, and thus, a new row use vector 401 is generated (388, FIG. 9A) . This process is repeated, as indicated in rows 390-396, for the next four names added to the "names" value set, with a new row use vector added for each name, as indicated at 403-409, and a "S" bit added to the entity select vector 419.
  • the order of the row use vectors 409, 405, 407, 403 and 401 corresponding to Adams, Blake, Clark, Jones and Smith is in the order of occurrence of the binary bits set "1" in the entity select vector 419.
  • the values Baker, Fabel, Rahn, Ross and Young are added to the "person names” value set. Specifically, the values are added in an order corresponding to the ordering (396, FIG. 9B) .
  • Binary bits set to "0" are also inserted into entity select vector 414 in the ordinal positions corresponding to the newly added unique values. The new binary bits are set to "0" to indicate the remaining values do not occupy the column.
  • the row use vectors 409, 407, 405, 403 and 401 correspond to the row use set 262 of FIG. 4 and row use vector 262 corresponds to entity select vector 178 of FIG. 4, which is the same as the entity select vector 419.
  • the next column of the input suppliers relation is for the "status” (84, FIG. 2), and the first value of the column is "20" (64, 84 FIG. 2) .
  • the value "20” is inserted into the first position of the value set for status numbers 421. Additionally, a first bit is inserted to entity select vector 421 and set to "1", indicating that unique value "20” is referenced in the relational column.
  • a new row use vector 411 is created, and the first binary bit of the row use vector is set to "1” indicating the presence of value "20" in the first row of the column (398, FIG. 9B) .
  • the process is repeated, as indicated in rows 400-406 of FIG. 9B, for each value in "status” set, resulting in row use vectors 411-415.
  • the next value in the column is 20.
  • the value 20 already exists in the value set for numbers, and thus, it need not be added again.
  • the entity select vector 421, for numbers need not be set because a binary bit corresponding to unique value 20 has already been set to "1" in a previous step.
  • This value also corresponds to an already existing row use vector 411.
  • Binary bits set to "0" are added to row use vectors 413 and 415 to indicate that the values 10 and 30 do not occupy the fourth row of the relational column for status.
  • a binary bit set to "1" is added to the row use vector 411 to indicate that the unique value 20 is also in the fourth position of the relational column for status.
  • the last column of the relation for suppliers is the "cities” column.
  • the first value of the “cities” column is London (69, 89 FIG. 2) .
  • the value London is placed into the first position of the value set for "cities”.
  • the entity select vector 423 as with the names of cities has a first binary bit inserted and set to "1" to indicate that the city London is referenced in the relational column for cities.
  • a row use vector 417 is created and a binary bit set to "1" is added to the row use vector to indicate that London occupies the first row of the column for cities (408, FIG. 9C) , "Paris” is added (row 410) in the same way. Since "Paris” and “London” occur twice, the row use vectors have bits added, as indicated, at rows 412 and 114.
  • the last value in the column for cities is Athens
  • the value Athens does not exist in the cities value set, and so it is added. Specifically, Athens is placed into the first position of the value set corresponding to the lexical ordering of the city names.
  • the entity select vector 423 for cities has a third binary bit inserted and set to "1" in the first ordinal position of the entity select vector to indicate that Athens is referenced in the column for cities.
  • a new row use vector 421 is added to the row use set.
  • Binary bits set to "0" are added to the row use vectors 417 and 419 to indicate that the unique values London and Paris do not occupy the fifth row of the column.
  • the new row use vector contains five bits and the fifth bit is set to "1" to indicate that the fifth row of the column contains the value Athens.
  • the values Cleveland, Fresno, Harrisburg, Los Angeles, New York, Rome, and San Francisco are added to the value set from the input column.
  • the values are arranged in the value set according to a lexical ordering. For each additional value, a corresponding binary bit set to "0" is inserted at a corresponding ordinal position of the entity select vector.
  • the binary bits set to "0" indicate that the values are not referenced by the column (418, FIG. 9C) .
  • the row use vectors 421, 417 and 419 corresponding to Athens, London and Paris, are in the order of occurrence corresponding to the binary bits set in the entity select vector 423. Additionally, the row use vectors 421, 417 and 419 correspond to the row use set 266 of FIG. 4, and entity select vector 423 is associated with the entity select vector 182 (FIG. 4) .
  • processing continues at decision block 360 in which the RPU 22 determines whether there are more files associated with the PARTS and the SHIPMENTS tables of the relational database. Processing continues at blocks 356 and 358 until the binary • representation for the PARTS and SHIPMENTS tables are constructed (418, FIG. 9C) .
  • the binary representations for the PARTS and SHIPMENTS tables are generated in the same fashion as the SUPPLIERS table (FIGS. 9A, 9B and 9C) discussed above. Processing returns during 362 to the calling routine of the.BINARY REPRESENTATION routine (FIG. 7) .
  • FIGS. 10A, 10B, IOC, 10D, 11A, 11B, 11C, 11D, 12, 13, 14, 15A and 15B, 16, 17, 18, 19, 20, 21, 22A, 22B, 22C, 22D, 22E, 22F, and 22G depict flowcharts of operations performed on relations in their binary represented form.
  • FIGS. 5A, B, C and D are flow diagrams of the utility function called INSERT.
  • FIGS. 11A, B, C and D are flow diagrams of the function DELETE.
  • FIG. 14 is a flow diagram of the relation operation called SELECT.
  • FIG. 16 is a flow diagram of the relational operation called PROJECT, and FIGS.
  • 22A, 22B, 22C, 22D, 22E, 22F, and 22G are flow block diagrams of the operation JOIN.
  • the functions INSERT and DELETE are basically for maintaining and manipulating data within the relations.
  • the relational operations, SELECT, PROJECT and JOIN are for generating resultant relations, and in the preferred embodiment, in a binary represented form. Only three relational operations SELECT, PROJECT and JOIN are discussed in order to simplify this disclosure and to provide a basic understanding on how relational operations are preformed on binary representations of relations. For example, the operations PRODUCT, UNION, INTERSECTION, DIFFERENCE and DIVIDE, which are described in Date, "An Introduction ' To Database Systems," Vol. 1. (4th ed.
  • the resulting compressed bit strings are then stored to memory 18. They stay in memory 18 until a request to perform a relational operation (i.e., INSERT, DELETE, SELECT, PROJECT, JOIN) is initiated and is interpreted at the command interpreter 28.
  • a relational operation i.e., INSERT, DELETE, SELECT, PROJECT, JOIN
  • FIGS. 4, 5 and 6 the binary representations of the relations of the rela- tional data base stored in memory 18 are shown in FIGS. 4, 5 and 6. It is also assumed that the bit vectors of the binary represented relation could also be in compressed impulse formats. However, for ease of understanding, the bit vectors are processed in the uncompressed form.
  • the binary represented relation(s) are brought to the RPU 22 via buses 48 and 31.
  • the specified relational operation is performed on the relation(s) . If any Boolean operations need be performed by the relational operation, then the steps for processing compressed bit strings as set forth in Glaser et al. are preformed by the BLU 24. Once processing is completed at the RPU 22, the RPU 22 outputs a new binary represented relation. The new output relation is sent to memory buses 31 and 48, where the resultant relation resides until it is sent back to RPU 22 for further processing.
  • A. INSERT INSERT is a function which adds one value at a time to a relation.
  • the necessity for performing an INSERT operation occurs in three categories of cases. First, a unique value needs to be added to a domain or value set, and it needs to be added to a column of a relation. Second, a unique value already exists in the value set, and it needs to be added into a column of a relation. Third, a value already exists in a column and it needs to be added again to the column. Multiple values may be added to a value set or to a column; however, the INSERT subroutines must be processed more than once corresponding to each time a value is added.
  • the function INSERT can be used to add values to more than one column of a relation.
  • the INSERT function is separately performed each time a value is added to the relation.
  • FIGS. 10A, B, C and D depict routines for processing any one of the three situations
  • FIG. 10A is a routine for adding a unique value to a value set, and adding the unique value to a column of a relation; the routine is called INSERT.
  • FIG. 10B is a flow diagram of a subroutine for adding a unique value to the value set, and this routine is called INSERT VALUE INTO VALUE SET.
  • FIG. IOC is a routine for updating an entity select vector to reflect the addition of a unique value into a particular subset; this routine is called UPDATE SUBSET.
  • FIG. 10D is a routine for updating a column of a relation with a new occurrence of a value; this routine is called ADD VALUE TO COLUMN. The routine in FIG.
  • FIG. 10D would by itself be used to insert a value into a column of a relation when the value already existed in a column.
  • the flow diagrams of FIG. IOC and 10D are combined for inserting a value already existing in the value set into a subset of values of the value set and adding the value to a column of a relation.
  • FIG. 10A a more detailed description of the INSERT routine is now discussed.
  • the system calls the INSERT VALUE INTO VALUE SET routine (FIG. 10B) to add a unique value to the value set.
  • block 424 is called to call the subroutine UPDATE SUBSET (FIG. IOC) .
  • This routine updates an entity select vector corresponding to the value set so that the unique value is represented in an associated subset.
  • the system performs the ADD VALUE TO COLUMN subroutine (FIG. 10D) to add the value into a specified column of the relation. Processing is completed; a value is added to the value set and to a column and the system returns at block 428 to the calling program.
  • the RPU 22 via the BBVP 14 determines the ordinal position in the value -set at which the new, unique value is to be inserted.
  • the value set is stored in a structure which is traversed to find the new value.
  • the structure contains all of the unique values presently stored in the value set, and is built as such to minimize access time for finding values.
  • the system locates a node of the structure, which corresponds to the ordinal position at which the new, unique value should be placed.
  • the system determines whether the value already exists. If the value already exists, then during block 434, the system returns to the calling program. Assuming that the value is not in the struc- ture, then the new value is added to the value set by adding a node assigning an already existing node in the structure to incorporate the unique value.
  • processing returns to the calling program.
  • the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the ordinal position of the unique value in the value set.
  • decision block 441 the RPU 22 determines whether the unique value has been added to the value set. If the unique value has been added, block 442 is called.
  • block 442 the system inserts a bit to the entity select vector at the ordinal position corresponding to the new unique value. Processing continues at block 444.
  • the new bit added to the entity select vector is set to "1" to indicate the unique value in the subset.
  • block 443 is called.
  • the RPU 22 via the BBVP 14 determines whether the bit in the entity select vector has been set to "1". The binary bit set to "1" indicates that the column contains this unique value. If the bit has been set to "1", the processing returns to the calling program during block 445. However, if the bit is not set, then processing continues in block 444. In block 444, the new bit added to the select vector is set to "l" to indicate that the unique value is in the subset. During block 446, processing returns to the calling program. Referring to FIG. 10D, a more detailed discussion of the ADD VALUE TO COLUMN routine is now discussed.
  • the RPU 22 via BBVP 14 counts the number of binary bits set to "1" in the entity select vector up to and including the bit at the ordinal position corresponding to the unique value inserted. This number is called “count”.
  • the "count” of binary bits set to "1” corresponds to the location of the row use vector in the row use set.
  • the system inserts the new row use vector at the position in the row use set corresponding to "count.”
  • the system appends a binary bit set to "0" to all of the row use vectors of the row use set.
  • the system sets the last bit of the new or selected row use vector to "1" to indicate that the new value is added to the last row of the column.
  • processing returns to the calling program.
  • processing continues at blocks 452.
  • bits set to "0" are appended to the existing row use vectors and during block 454, the last bit of the new row use vector is set to "1".
  • FIG. 12 is a detailed results table depicting the various steps performed by the INSERT routine (FIG. 10A) .
  • the results table of FIG. 12 is broken up into three columns.
  • the first column depicts the existing value set
  • the second column is the entity select vector characterizing a subset of the value set
  • the third column is the row use set representing the names column of the relation.
  • Each column of the results table depicts a change in either the value set, entity select vector, or row use set as the INSERT routine (FIG. 10A) is performed.
  • the subroutines depicted in FIGS. 10A, B, C and D are transparent to the application. Only the following type of instruction is required:
  • This instruction is interpreted by- the command interpreter 28 (FIG. 1A) to add a new and unique value Zeus to the value set for names and to add the word Zeus to the binary representation of the relation for suppliers, currently stored in memory 18.
  • the RPU 22 brings the binary representation of the suppliers relation from memory 18 to the RPU 22. Additionally, the value set is brought from the external device 12 via bus 30 to the RPU 22. Because the unique value Zeus is not part of the value set names, the RPU 22 calls the routine (FIG. 10A) to insert the value Zeus into the column of names in the relation and also to add the unique value Zeus to the value set of names, referring to the INSERT routine (FIG. 10A) .
  • the RPU 22 calls the subroutine INSERT VALUE INTO VALUE SET (FIG. 10B) .
  • the system determines the ordinal position in which the value Zeus is to be inserted.
  • the RPU 22 determines that Zeus does not exist in the value set, and in block 436, the RPU 22 adds Zeus to the proper node in the structure for the value 494 (FIG. 12) . If a node did not exist in the structure, then a new node would be added and set to Zeus.
  • the RPU 22 returns to block 424 of the INSERT routine (FIG. 10A) .
  • the RPU 22 calls the subroutine UPDATE SUBSET (FIG. 10C) .
  • the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the unique value Zeus in the value set. The ordinal position is the last position of the entity select vector.
  • RPU 22 determines that the value Zeus had been previously added to the value set of names, and processing continues at block 442.
  • the system inserts a bit to the entity select vector (494, FIG. 12) to indicate that a value exists in the last ordinal position of the value set.
  • the RPU 22 sets the new bit to "1" in order to indicate that the value Zeus is added to the subset (496, FIG. 12). Then, in block 446, the RPU 22 returns to block 426 of the INSERT routine (FIG. 10A) .
  • the RPU 22 calls ADD VALUE TO COLUMN routine (FIG. 10D) .
  • the RPU 22 determines the "count” of binary bits which are set to "1", up to and including the bit which corresponds to ordinal position corresponding to the new value Zeus. "Count” is equal to six because there are six binary bits set to "1" in the entity select vector; the bit corresponding to the value ' Zeus is the sixth binary bit set to "1". With the "count", the RPU 22 determines if there presently resides a row use vector which corresponds to the value Zeus.
  • a row use vector does not exist at position six as determined previously at block 440 (FIG. IOC), and during block 450, the RPU 22 inserts a new row use vector at the sixth position of the row use set.
  • the RPU 22 counts, from left to right, six row use vector positions in the row use set.
  • the RPU 22 adds a new row use vector having binary bits set to "0".
  • the RPU 22 appends binary bits set to "0" to the end of all of the row use vectors in the row use set (500, FIG. 12).
  • the last bit of the newest and sixth position row use vector is set to "1" to indicate that Zeus now occupies the last row of the column for names in the relation for suppliers (502, FIG. 12).
  • DELETE is an operation which removes one value at a time from a binary representation of a relation and possibly from a value set.
  • the DELETE operation occurs in three categories of cases. First, a unique value need not be removed from a subset; however, the unique value needs to be removed from the column. Second, a unique value exists in a column, and it needs to be deleted from the relation and from a subset; however, it does not need to be removed from the value set. Third, a unique value already exists in a relation and it needs to be deleted from the column from a corresponding subset and from a value set.
  • FIGS. 11A, B, C and D depict routines for processing any one of the three situations discussed above.
  • the flow diagram in FIG. 11A is a routine for removing a unique value from a value set and for removing the unique value from a column of a relation; this routine is called DELETE.
  • FIG. 11B is a flow diagram of a routine for removing a value only from a column. This routine is called DELETE VALUE FROM COLUMN. This routine by itself could be used to DELETE a value, one or more times, from a particular column.
  • the flow diagram of FIG. 11C is a routine for updating an entity select vector to reflect the removal of a unique value from a particular subset; this routine is called DELETE VALUE FROM SUBSET.
  • FIG. 11D is a routine for removing a unique value from a
  • the RPU 22 calls the DELETE VALUE FROM COLUMN
  • 10 22 calls block 462(a) to determine whether there is "1" bit set in the ordinal position corresponding to the value being deleted in any of the entity select vectors for the relational database. If there are, then the value is presently being used in other relations and,
  • the DELETE routine (FIG. 11A) can be called successively to DELETE one or more value of a row of one relation.
  • the RPU 22 determines which row use vector of the row use set is associated with the particular value to be removed from one row of the column. This operation can be conducted by performing successive Boolean AND operations on the row
  • 35 use vectors and a binary bit only having a single bit set to one at the ordinal position corresponding to the row position of the column. The bit is changed from “1" to "0", in the appropriate row, indicating the absence of the value in the particular row of the column.
  • the RPU 22 via the BBVP 14 determines whether all of the bits of the row use vector have been set to "0". If not all of the bits of the row use vector are set to "0”, then a "DONE" signal is returned to the calling routine at block 474. If all of the binary bits of the row use vector are set to "0", then processing returns at block 478 to the calling routine.
  • the RPU via BBVP 14 determines the ordinal position in the corresponding entity select vector associated with the unique value whose row use vector has been deleted from the row use set.
  • the RPU 22 via the BBVP 14 sets the binary bit in the entity select vector, associated with the unique value to "0" to indicate the absence of the unique value in the subset.
  • processing returns to the calling program.
  • the system removes the value from the value set by removing the value in the value set structure. Additionally, the binary bit corresponding to the deleted value in all entity select vectors is also removed to account for the reduced size of the value set.
  • FIG. 13 is a detailed results table depicting the various tasks performed by the DELETE routine (FIG. 11A) . This example has been chosen to illustrate all of the routines for deleting a value from a value set and from a binary representation of a relation.
  • the results table of FIG. 13 is broken up into three parts. From left to right, the first column depicts the existing value set, including the unique value Zeus; the second column is the entity select vector representing the subset of the value set for names corresponding to column for names, including the name Zeus; and the third column is a row use set which represents the "names" column of the suppliers relation.
  • Each row of the results table depicts a change in either the value set, entity select vector, or row use set as the DELETE routine (FIG. 11A) is performed.
  • the routines depicted in FIGS. 11A, B, C and D are all transparent to the application.
  • the application provides only the following instruction to the system:
  • RPU 22 calls the subroutine DELETE VALUE FROM COLUMN
  • FIG. 11B to remove the value Zeus from the names column of the relation.
  • the RPU 22 via the BBVP 14 changes the binary bit set to "1" to "0" in the row use vector to indicate that the value Zeus is no longer in the column for names
  • block 472 the system determines whether all of the bits of the row use vector have been set to "0". Zeus only appeared in the column once, and thus, by changing the one binary bit to "0", Zeus is no longer represented in the column; all of the binary bits of the row use vector are set to "0". Thus, in block 478, processing returns to block -462 of the DELETE routine (FIG. 11A) .
  • the DELETE VALUE FROM SUBSET routine (FIG. 11C) is called to remove the unique value Zeus from the subset depicted by the entity select vector. Specifically, during block 463, the row use vector associated with the unique value Zeus is removed from the row use set. Processing continues at block 480, during which the ordinal position of the binary bit associated with the value Zeus in the entity select vector is determined. Specifically, the RPU 22 via the BBVP14, during block 480 determines that the ordinal position of the value Zeus in the entity select vector is the tenth position.
  • the system sets the tenth binary bit from "1" to "0" to indicate the absence of the value Zeus ' from the subset (510, FIG. 13).
  • processing returns to the DELETE routine (FIG. 11A) at block 462(a).
  • the RPU 22 determines whether there are any "1" bits set at the ordinal positions corresponding to Zeus in any entity select vector.
  • the entity select vectors corresponding, to the value set of names for the entire relational database are evaluated to determine if the unique value is referenced in any other subset of the relational database. Because the other tables (i.e., parts (FIG. 5) and shipments (FIG.
  • the RPU 22 removes Zeus from the value set and the corresponding bit in the entity select vector is also removed (512, FIG. 13) . Processing continues at block 490, during which the RPU 22 returns to the DELETE routine (FIG. 11A) at block 466, where processing returns to the calling program.
  • INSERT and DELETE which are basically functions for updating binary representations of relations.
  • the next operations are query functions for finding relevant information about a relation or groups of relations.
  • the operations include SELECT, JOIN and PROJECT, and are principally used for determining a resultant binary relation.
  • the resultant binary relations can then be converted into their byte value form for users to under ⁇ stand. This section concentrates on the operation SELECT which generates a resultant binary relation for depicting which row or rows of a relation contain
  • SELECT determines the rows of the relation which correspond to a particular value or values, in one or more columns of a relation and the result is depicted in a binary representation: a binary bit vector called a "select vector".
  • SELECT operation A typical example of a SELECT operation .might be for determining which suppliers (i.e., Smith, Jones, Blake, Clark and Adams) are located in Athens (63, FIG. 2). A more detailed discussion on this query will be presented shortly.
  • a flow diagram of the SELECT operation is depicted.
  • the RPU 22 via the BBVP 14 determines the ordinal positions of one or more selected unique values, which are in one column of the relation, in a particular value set.
  • the RPU 22 determines whether the selected unique values are found in the value set. If the selected unique values are not found in the value set, then processing returns to the calling program during block 520. Assuming that the selected unique values are found in the value set, then in block 522 a binary bit vector displaying the ordinal positions of the selected values within the value set is generated.
  • the binary bit vector contains bits set to "1" at the ordinal positions corresponding to those of the selected values and the remaining bits of the bit vector are set to "0".
  • the bit vector generated in block 522 is "ANDed" with the entity select vector, corresponding to the column in which the values reside, to determine whether the selected unique values are referenced in the column.
  • the RPU 22 determines whether the resultant bit vector has all bits of the resultant vector set to "0"; i.e., if the corresponding set is empty. If resultant bit vector is all zeros, then the selected unique values are not in the column, and thus, no select vector can be generated and processing returns at block 528 to the calling program.
  • processing continues at block 530 in which the RPU 22 determines count; the number of binary bits, in the entity select vector, which are set to "1" up to and including the ordinal position of each selected value. For each unique value, the count is then used to determine which row use vectors of the row use set correspond to the selected unique values. During block 532, the row use vectors, corresponding to "count", are retrieved. Processing continues at block 536, in which the RPU 22 determines whether one or more unique values were selected from the particular column over which this part of SELECT is processed.
  • the RPU 22 returns the one row use vector, corresponding to the unique value, retrieved at block 532. Processing returns to the calling program at block 542. However, if more than one unique value from a particular column was selected by the application for this operation, .then during block 537 the Boolean OR operation is performed on the selected row use vectors to determine a resultant relation.
  • the Boolean OR operation is performed by the BLU 24 (FIG. 1A) .
  • the steps for performing Boolean operations for compressed bit string is fully discussed in Glaser et al, which was referenced earlier.
  • RPU 22 determines whether the values selected are from only one column of the relation.
  • RPU 22 determines whether any more row use vector for values need to be selected from other columns. If more values need to be processed, then processing continues at blocks 516, 518, 522, 524, 526, 530, 532, 536, 537, 538 and 539 until all of the row use vectors are processed and the entity select vector for each resultant column is generated.
  • a Boolean operation specified by the SELECT instruction is performed on the entity select vectors.
  • the SELECT instruction might require the determination of whether one value, in one column of the relation, is associated with another value in a different column of the relation.
  • the select vectors for the two values would be ANDed together to determine whether both values reside in the same row of the relation.
  • any of the Boolean operations i.e. OR, XOR, etc.
  • the operation is assumed to be AND. (For a more detailed discussion, refer to detailed ex ⁇ amples.)
  • the flow diagram of FIG. 14 depicts the SELECT operation for returning a resultant entity select vector (e.g. binary bit vector) for depicting which rows of a relation contain one or more selected values.
  • the rows of the relation corresponding to the selected values can be displayed to the user.
  • the RPU 22 determines which row use vector of the corresponding row use set contains a binary bit set to "1" in the ordinal position corresponding to the selected row position.
  • An indexing function is performed, which determines the position of the selected row use vector in the corresponding row use set. Specifically, RPU 22 counts the number of row use vectors in the row use set up to and including the selected row use vector. This number corresponds to the ordinal position of the binary bit set to "1" in the entity select vector, which references the unique value of the relation. The unique value is retrieved from the value set. For each column of the relation, the value in the selected row is determined and displayed for the user.
  • a step is added for selecting the unique value according to whether the selected value is greater than, less than, equal to, not equal to, equal to or greater than, or equal to or less than a prespecified value selected by the application program or user.
  • the actual values in the selected rows of the relation are determined and displayed to the user via a mapping function through a vector called the "entity use vector".
  • entity use vector For each column of the relation, an entity use vector is maintained for identifying a value in the value set which corresponds to the value at a particular row of the column.
  • FIG. 15 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14) .
  • the results table of FIGS. 15A and B are broken up into six columns.
  • the first column depicts the value set associated with the selected unique value
  • the second column is a bit vector corresponding to the ordinal position of the selected unique value within the value set
  • the third column is the entity select vector associated with the column in which the value resides
  • the fourth column is a resultant bit .vector determined by ANDing the ordinal position bit vector with the entity select vector
  • the fifth column is the row use set associated with the column in which selected unique value resides
  • the last column is the select vector determined by performing a Boolean OR operation on all of the row use vectors corresponding to the selected unique values from one column.
  • the query in this example is a simplified SELECT query to minimize the explanation and steps required to perform the operation. However, generally, the query will be over several value columns of the relation (see the next example for select) .
  • This example could easily be expanded to determine which of the suppliers IDs (i.e., SI through S5) is located in Athens.
  • the row use vector corresponding to Athens indicates the rows of the relation which contain the supplier IDs associated with Athens. For simplicity, in this example we are concerned with only one of the supplier IDs, namely S5, and whether it is associated with Athens. Referring now to FIGS. 14, 15A and B, a detailed example of the query for determining whether supplier ID S5 is located in Athens is now discussed.
  • the system determines the ordinal position of S5 in the value set for suppliers IDs. Essentially, the system traverses a structure associated with the suppliers ID value set and determines the specific node in the structure where the S5 value resides. During block 518, the system determines whether or not the value S5 has been found in the value set. The value S5 is located in the structure, and thus, it is within the value set for the suppliers IDs. In block 522, the system creates a binary bit vector for representing the ordinal position within the value set associated with value S5 (544, FIG. 15A) . As shown at row 544 of FIG.
  • the fifth ordinal position in the new binary bit vector is set to "1" corresponding to the ordinal position of the value S5 in the value set.
  • the new binary bit vector is ANDed with the entity select vector associated with the suppliers identifiers in the suppliers relation (546, FIG. 15A) .
  • the resultant bit vector from- the AND operation is evaluated and it is determined that bit vector is not an empty set.
  • the resultant bit string contains a binary bit set to "1" (548, FIG. 15A) .
  • the unique value S5 is located in the subset referenced by the entity select vector of the suppliers column, and thus, it is in the relation of suppliers.
  • a count is performed on the entity select vector to determine the number of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique value S5.
  • the RPU 22 determines that there are five binary bits set to "1", and thus, the unique value S5 is associated with fifth row use vector of the row use set associated with suppliers IDs (260, FIG. 4) .
  • the row use vector associated with unique value S5 is retrieved from the row use set (260, FIG. 4) .
  • the row use vector for S5 is a binary bit vector containing four binary bits set to "0" and a fifth binary bit set to "1", indicating that the value S5 resides in the fifth row for the column for the suppliers IDs (550, FIG.
  • RPU 22 determines that there is only one value selected from the column for suppliers IDs. Processing continues in block 538, in which the RPU 22 determines that more than one column is involved in this select operation, i.e., the supplier ID and city columns. In block 539, the RPU 22 determines that the value Athens has also been selected by the user in this query, and thus, processing returns to block 516. In block 5-16, the RPU 22 determines the ordinal position of Athens in the value set for cities. Essentially, the system traverses the value set for cities, and during block 518, the RPU 22 determines that Athens is in the value set for cities.
  • a binary bit vector is constructed to indicate which ordinal position of the value set for cities contains the city Athens (552, FIG. 15A) .
  • the system creates the binary bit vector, which shows a binary bit set to "1" in the first ordinal position (552, FIG. 15A) .
  • the Boolean AND operation is performed between the new binary bit vector and the entity select vector associated with cities for the suppliers relation (556, FIG. 15A) .
  • the RPU 22 determines that the resultant vector does not contain all "O's" (558, FIG. 15B) , and thus, the value Athens is determined to be in the suppliers relation.
  • the RPU 22 would return at block 528 to alert the user that the selected value for Athens, although found in the value set, is not within the suppliers relation. Processing continues at block 530, in which the RPU 22 does a count of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique number value Athens. RPU 22 determines that Athens is associated with the first binary bit set to "1" in the entity select vector, and thus, the unique value Athens corresponds to the first row use vector in the row use set (266, FIG. 4). In block 532, the RPU 22 retrieves the row use vector (560, FIG. 15B) associated with
  • the row use vector for Athens contains four binary bits set to "0", followed by a binary bit set to "1", indicating that the value Athens occupies the fifth row of the column associated with cities.
  • the RPU 22 determines that there are no more unique values selected in the column of cities, and thus, processing continues at block 538, in which the RPU 22 determines that more than one column, i.e., suppliers and city, was selected by the user.
  • the RPU 22 determines that no more value need to be selected.
  • the row use vectors, associated with S5 and Athens are ANDed together to generate a resultant select vector, which represents the rows of the relation which satisfy the query (562, FIG. 15B) .
  • the resultant binary bit vector contains four binary bits set to "0” followed by a binary bit set to "1", indicating that the fifth row of the relation contains the supplier ID S5 and the city
  • the actual row of the relation can be reconstructed and displayed to the user in one or two ways.
  • the system can use the entity use vectors and associated row use vectors to map the row number determined by the select vector to the ordinal position in each value set. A more detailed discussion on the entity use vector approach will be discussed in Part VI.
  • the RPU 22 could trace back from the row use vectors to the entity select vectors and then back to the value set to determine the unique values in the fifth row of relation.
  • FIG. 16 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14) .
  • the results table of FIGS. 16A and B are broken up into seven columns.
  • the first column depicts the value set associated with the unique values
  • the second column is a bit vector corresponding to the ordinal positions of the selected unique values within the value set
  • the third column is the entity select vector associated with the column in which the selected values reside
  • the fourth column is a resultant bit vector determined by ANDing the ordinal position bit vector with the entity select vector
  • the fifth column is the row use set associated with the column in which the selected values reside
  • the sixth column is the select vector determined by performing a Boolean OR operation on all the row use vectors corresponding to selected unique values from one column
  • the last column is the resultant vector determined by performing a Boolean AND operation on the select vectors determined for the selected values from more than one column.
  • the RPU 22 determines the ordinal positions of the suppliers names Smith and Blake in the value set for suppliers names. Essentially, the system traverses a structure associated with the supplier name value set and determines the specific nodes in the structure where the Smith and Blake values reside. .
  • the RPU 22 determines whether or not the values Smith and . Blake have been found in the value set. The values Smith and Blake are located in the structure; thus, they are x ⁇ ithin the value set for the suppliers names.
  • the RPU 22 creates a binary bit vector for representing the ordinal positions within the value set associated with the values. Blake and Smith are as shown in row 565 of FIG. 16A, the third and ninth ordinal positions. The new binary bits are set to "1" corresponding to the ordinal positions of the values in the value set for suppliers names.
  • the new binary bit vector is ANDed with the entity select vector associated with the suppliers names in the suppliers relation (567, FIG. 16A) .
  • the resultant bit vector from the AND operation is evaluated and it is determined that the resultant bit vector is not all zeros.
  • the resultant bit string contains binary bits set to "1" (569, FIG. 16A) .
  • a single "count”, with two ordinal positions as input, is performed on the entity select vector, up to and including the binary bit associated with the unique value Smith, the last value characterized in the new entity select vector.
  • the RPU 22 determines that there are two binary bits set to "1", one for the count corresponding to Blake and that there are five binary bits set to "1" corresponding to the count for Smith. Therefore, the unique values, Blake and Smith, are associated with the second and fifth row use vectors of the row use that is associated with suppliers names (262, FIG. 4) .
  • row use vectors associated with the unique values Smith and Blake are retrieved from the row use set (262, FIG. 4) .
  • the row use vector for Blake is a binary bit vector containing five binary bits in which the third binary bit is set to "1", indicating that the value Blake resides in the third row of the column for the suppliers names (570, FIG. 15) .
  • the row use vector for Smith is a binary bit vector containing five binary bits and the first bit is set to "1", indicating that the value Smith resides in the first row of the column for suppliers names (570, FIG. 16A) .
  • RPU 22 determines that there is more than one value selected from a column for suppliers names, thus processing continues at block 537.
  • the row use vectors associated with Smith and Blake (570, FIG. 16A) are ORed together to form the select vector as shown at 571 of FIG. 16A.
  • the RPU 22 determines that more than one column is involved in this select operation, i.e., the suppliers names column and the city column. It continues at block 539, during which the RPU 22 determines that the values London and Paris are selected in the separate column cities. In block 516, the RPU 22 determines the ordinal positions for London and Paris in the value set for cities.
  • the system transverses the value set for cities, and during block 518, the RPU determines that London and Paris are both located in the value set for cities.
  • a binary bit vector is constructed to indicate which ordinal positions the value set for cities are associated with the cities London and Paris (573, FIG. 16A) .
  • the system creates a binary bit vector which shows binary bits set to "1" in the fifth and eighth ordinal positions (573, FIG. 16A) .
  • the Boolean AND operation is performed on the new binary bit vector with the entity select vector associated with cities for the suppliers relation (575, FIG. 16B) .
  • the RPU 22 determines that the resultant vector does not contain all zeros (579, FIG.
  • the RPU determines that Paris is associated with the third binary bit set to "1" in the entity select vector and, thus, Paris corresponds to the third row use vector in the row use set (266, FIG. 4).
  • the RPU 22 retrieves the row use vectors associated with London and Paris (561, FIG. 16B) .
  • the row use vector for London contains five bits, the first and fourth bits of the row use vector containing binary bits set to "1".
  • the row use vector for Paris contains five binary bits, the second and third binary bits set to "1".
  • the row use vectors for London and Paris indicate that the first through fourth rows of the column are associated with cities London, Paris, Paris, London.
  • the RPU determines that more than one unique value was selected in the column of cities and, thus, processing continues.
  • the RPU performs the Boolean OR operation on the row use sets for London and Paris.
  • a select vector is generated (563, FIG. 16B) , which has five binary bits and bits one through four are set to "1".
  • the RPU determines that more than one column of the relation was involved in the SELECT, i.e., suppliers names and cities.
  • the RPU determines that no more values in other columns need to be selected from the relation.
  • the select vectors associated with the suppliers names (571, FIG. 16A) and cities (563, FIG.
  • the resultant entity select vector indicates that the first and third binary bits are set to "1", indicating that the first and third rows of the suppliers relation contain information on whether Smith is associated with Paris and/or London and whether Blake is associated with Paris and/or London.
  • the actual rows in the relation can be reconstructed to display to the user in one of two ways. First, the system can use the entity use vectors associated with each column of the relation to map the row numbers determined by the resultant select vector, to the ordinal positions in the appropriate value sets.
  • the RPU 22 could trace back from the various row use vectors to the entity select vectors and back to the appropriate value sets to determine the actual unique values in the first and third rows of the relation.
  • the purpose of the RECONSTRUCT operation is to generate the values associated with a particular column of a relation for the user to ascertain.
  • the binary representation of a relation is constructed and stored in memory 18. If the user of the system wishes to see the actual relation and the values depicted in the relation, then the RECONSTRUCT operation can be per- formed for reconstructing and displaying the relation to the user.
  • the flow diagrams in FIGS. 17A and 17B are for reconstructing and displaying various columns specified by the user or an applications program.
  • the user will specify one or more columns of a relation to be displayed by the system.
  • the user might request the Suppliers ID column of the relation for suppliers at 63 of FIG. 2, which is currently stored in its binary representation in memory 18 (FIG.
  • the first step in performing the RECONSTRUCT operation is in block 565, in which the user specifies various columns of a relation to be reconstructed. As stated above, an applications program may also specify particular columns of the relation to be projected. -For example, when the SELECT operation is performed, the resultant binary representation can then be reconstructed and displayed to the user via the RECONSTRUCT operation (FIG. 17A) .
  • block 566 calls the routine DISPLAY/RECONSTRUCT (FIG. 17B) to reconstruct one of the specified columns.
  • the DISPLAY/RECONSTRUCT routine (FIG.
  • 17B essentially performs the necessary steps for obtaining the values and for placing the values in the proper rows in the column.
  • the RPU determines whether there are any more columns that need to be displayed. If there are more' columns to be displayed, then processing continues at block 566. Blocks 566 and 567 are performed until all of the columns specified by the user or applications program have been reconstructed. If all of the columns have been reconstructed, then processing continues at block 568, in which the RPU returns to the calling program.
  • a flow ⁇ diagram of the DISPLAY/RECONSTRUCT routine is depicted.
  • the RPU 22 obtains ' the entity select vector associated with the particular column to be displayed.
  • the first row use vector associated with the row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18, is transferred to the RPU 22.
  • the RPU 22 performs a Boolean AND operation on the row use vector obtained in block 577 with a row select vector.
  • the row select vector is created by performing a query operation on the relation thereby selecting which rows of the relation the user or application program wishes to display.
  • the row select vector is a new binary bit vector and each binary bit corresponds to a row of the column or columns to be displayed. A binary bit set to "1" indicates that the corresponding row needs to be displayed.
  • the result of the AND operation is a new vector Z which depicts the rows of the column which contain a particular value associated with the row use vector.
  • the results of the AND operation are sent to memory 18 for future processing.
  • the RPU 22 determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 581 a "0" is placed in the first binary position of a new vector called the index vector.
  • the index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the row use set. Each bit indicates whether the unique value associated with the row use vector exists in the relational column to be displayed. If a binary bit in the index vector is set to "0", the unique value associated with the row use vector does not exist in the column to be displayed; Whereas, if the binary bit is set to "1" in the index vector, then the unique value associated with the row use vector exists one or more times in the column. Processing continues at block 593, during which the next row use vector of the row use set is obtained.
  • processing continues at block 585.
  • the RPU 22 sets the binary bit in the index vector, which is associated with the current row use vector, to "1".
  • the resultant bit vector Z is stored in memory 18. The resultant vector Z is later used in the reconstruction process.
  • the RPU 22 clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z.
  • the purpose of this step is to shortcut the processing of the row use vectors in the row use set. Stated differently, when the row select vector is cleared, all of the values in the rows of the column have been determined. The row use vectors which have been processed with the row select vector contain all of the values to be displayed in the column. Then during block 591, the RPU 22 (28, FIG. 1A) determines whether the row select vector has been completely cleared, or stated differently, all of the binary bits have been set to "0".
  • processing continues at blocks 597, 601, 603, 605, 607, 609 and 611 to reconstruct the column with the values associated with the row use vectors in the row use set. However, if not all of the binary bits in the row use vector are set to "0", then processing continues at block 593. During block 593, the RPU 22 gets the next row use vector in the row use set currently being pro ⁇ Waitd. During block 595, the RPU 22 determines whether the end of the row use set has been reached. If the end of the row use set has been reached, then processing continues at blocks 597, 601, 603, 605, 6077 609 and 611 to reconstruct the column. However, assuming that the end of the row use set has not been reached, then processing continues at blocks 577, 579, 585, 587, 589, 591, 593 and 595 until all of the row use vectors of the row use set have been processed.
  • processing continues at block 597.
  • the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Each binary bit set to "1" indicates which row use vectors reference unique values which are to be displayed in the column.
  • the ordinal position of the binary bit set to "1" in the entity select vector is determined. Then, during block 603, the value associated with the ordinal position obtained in block 601 is obtained from the value set.
  • the RPU 22 finds the appropriate location in the index vector associated with the value. Then during 607, the appropriate resultant Z vector stored during step 587 is retrieved. The resultant vector Z indicates which rows of the column contain the unique value associated with the row use vector, and the unique value is placed in the column at the appropriate row locations.
  • the RPU 22 determines whether any more values are left for processing. Assuming that there are more values, then processing continues in blocks 601, 605 and 607 until all of the values have been placed in the proper rows of the column. Once all of the values have been placed into the column, then processing returns to the calling program during block 611. 1. Detailed Example o Performing
  • FIGS. 17A, 17B, 18A, 18B, 18C and 18D a detailed example for reconstructing the column for Suppliers IDs in the Suppliers relation (FIG. 2) is now discussed. More particularly, it is assumed that only the binary representation of the column Suppliers IDs exist in the RDMS 10. The binary representation of the column may be from a result of a SELECT operation or it may have been previously stored after processing by the BBVP 14. In either case, the binary representation of the column Suppliers IDs exist in memory 18 and now the user or an applications program needs to display the actual values of the column. Although, this example is for reconstructing and displaying only one column of a supplier for the suppliers relation, the PROJECT operation could be sequentially performed to supply all of the columns of the supply relation.
  • Results Table for depicting the results of the RECONSTRUCT operation for reconstructing or supplying the Suppliers ID column (80, FIG. 2) of the suppliers relation (63, FIG. 2) is shown.
  • Each row (800-848) of the Results Table depicts a result of the routines shown in FIGS. 17A and 17B.
  • the Results Table is. separated into seven columns. From left to right, the first column of the Results Table shows the entity select vector for the selected column to be displayed. The second column shows the row use vector set associated with the specified column to be displayed. The third column is the row use vector currently being processed, and the fourth column is the row select vector specified by the user application program for determining which rows of the column are to be displayed.
  • the fifth column is the result of ANDing the row use vector and the row select vector together; the result is called vector Z.
  • the sixth column depicts the reconstruction of the index vector for displaying which row use vectors of the row use set have associated unique values in the column.
  • the last column is for the reconstruction of the column to be displayed.
  • FIGS. 17A and 17B the operation performed by the RPU 22 (FIG. 1A) for displaying and reconstructing the Suppliers ID column (80, FIG. 2) is now discussed.
  • the user or application program selects the various column or columns which are to be reconstructed and displayed as a result of the query operation.
  • the user has selected the Suppliers ID column (80, FIG. 2) .
  • the Suppliers ID column only exists in the secondary memory (18, FIG. 1A) in the form of a binary representation or row use set.
  • the DISPLAY/RECONSTRUCT routine (FIG. 17B) is called for finding the values and reconstructing the Suppliers ID column.
  • the RPU 22B during block 571, the RPU 22
  • FIG. 1A finds the entity select vector associated with the Suppliers ID column in the memory 18 (FIG. 1A) .
  • the entity select vector stored in memory 18 (FIG. 1A) is transferred via bus 30 (FIG. 1A) to RPU 22 (FIG. 1A) .
  • the RPU 22 finds the row use set associated with the Suppliers ID column in memory 18 and transfers via a bus to the RPU 22 (FIG. 1A) .
  • the RPU 22 obtains the first row use vector of the row use set for the Suppliers ID column (804, FIG. 18A) .
  • the user wishes to display the first four rows of the Suppliers ID column.
  • a row select vector having four binary bits set to "1" in the first four ordinal .positions, i.e. "1 1 1 1 0".
  • the row use vector (804, FIG. ISA) and the row select vector (806, FIG. 18A) are transferred to the RPU 22 (FIG. 1A) .
  • the row use vector (804, FIG. 18A) and the row select vector (806, FIG. 18A) are ANDed together to determine a resultant binary vector Z (808, FIG. 18A) .
  • the resultant binary bit vector Z depicts the rows of the Suppliers ID column in which the unique value associated with the row use vector (804, FIG. 18A) is to reside.
  • the resultant binary vector Z contains a binary bit set to "1" in the first position which means that the unique value associated with the row use vector (804, FIG. 18A) will be placed in only the first ordinal position of the Suppliers ID column.
  • the RPU 22 determines that the result of the AND operation is not "0", thus processing continues to block 585.
  • the RPU 22 via BBVP 14 sets the first binary bits of the index vector to "1" (810, FIG. 18A) to indicate that the unique value associated with the row use vector (804, FIG. 18A) exists at least once in the Suppliers ID column.
  • the resultant binary bit vector Z is stored in memory 18 for future reconstruction of the column.
  • the binary bit set in the resultant binary bit vector Z indicate the rows of the Suppliers ID column with the unique value SI associated with the row use vector (804, FIG. 18A) reside.
  • the binary bits set to "l" in the row select vector which match the binary bits in the resultant vector Z are set to "0" (812, FIG. 18A) .
  • the row select vector is a value determined if all of the binary bits have been set to "0".
  • the row select vector contains three more binary bits set to "1" (812, FIG. 18A) , thus it continues at block 593.
  • the RPU 22 obtains the next row use vector of the row use set (814, FIG. 18B) .
  • the row use vector (814, FIG. 18B) and the row select vector (816, FIG. 18B) are ANDed together during block 577.
  • the resultant vector Z is shown at 818 of FIG. 18B.
  • the resultant vector Z is evaluated to determine if all the binary bits of the resultant vector have been set to "0". Not all of the binary bits of the resultant vector are "0" (the resultant vector is "0 1 0 0 0" (818, FIG. 18B) ) , and thus during block 585, the second binary bit of the index vector is set to "1" (820, FIG. 18B) .
  • the RPU 22 (FIG. 1A) stores the resultant vector Z in memory 18 for future processing. Then during block 589, the binary bits of the row select vector which were set to "1" and matched the binary bit set to "1" in the resultant vector Z are set to "0"
  • the row select vector is evaluated to determine if all the binary bits have been set to "0".
  • the row select vector still has two binary bits set to "1" (821, FIG. 18C) , and during block 579, it is evaluated to determine if all the binary bits are set to "0".
  • the resultant vector Z contains one bit set to "1", and thus processing continues at block 585.
  • the RPU 22 sets the third binary bit of the index vector to "1" (828, FIG. 18C) .
  • the resultant vector Z is stored in memory 18 for future processing.
  • the binary bit of the row select vector which matched the binary bit of the row use vector is set to "0" (830, FIG. 18C) .
  • the row select vector is evaluated to determine if all the binary bits are set to "0".
  • the row select vector still has one binary bit set to "1" (834, FIG. 18C) , thus it continues at block 593.
  • the RPU 22 obtains the next row use vector of the row use set (832, FIG. 18C) .
  • the row use vector (832, FIG. 18C) and the row select_vector (834, FIG. 18C) are ANDed together during block 577.
  • the resultant vector Z is shown at block (836, FIG. 18C) .
  • block 579 it is determined that not all of the binary bits of the resultant vector are "0" (the resultant vector is "00010" (836, FIG. 18C) .
  • the fourth binary bit of the index vector is set to "1" (838, FIG. 18C) .
  • the resultant vector Z is stored for future processing.
  • the binary bit of the row select vector which were set to "1” and match the binary bit set to "1" in the resultant vector Z are set to "0" (840, FIG. 18D) .
  • the row select vector is evaluated to determine if all the binary bits had been set to "0".
  • All the binary bits of the row select vector are set to "0" (830, FIG. 18D) , and thus processing continues to block 597.
  • the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Specifically, the first, second, third and fourth binary bits of the index vector are set to "1". Thus the first, second, third and fourth row use vectors of the row use sets are associated with unique values which exist in the Suppliers ID column.
  • RPU 22 (FIG. 1A) determines the ordinal positions of the entity select vector which correspond with the row use vectors having corresponding binary bits set to "1" in the index vector.
  • the values associated with each ordinal position of the entity select vector are obtained from the value set of Supplier IDs. Specifically, the values SI, S2, S3 and S4 are obtained. Then, during block 605 and 607, the first resultant binary vector Z is obtained from temporary storage, and the value SI is placed in the proper ordinal position of the column for Suppliers IDs. And during block 609, the RPU 22 (FIG. 1A) determines whether there are any more values left for processing. There are three more values left for processing, and thus block 605 and 607 are performed. During block 605 and 607, the second resultant vector Z is obtained and the value S2 is placed in the second ordinal position in the column (844, FIG. 18D) .
  • block 609 it is determined that there are more values left for processing, and thus block 605 and 607 are performed.
  • the third resultant vector associated with the third resultant vector Z which is associated with the value S3 is obtained from memory.
  • the value S3 is placed into the third ordinal position of the relational column for Suppliers IDs (846, FIG. 18D) .
  • block 609 it is determined that there is still one more value to process, and thus block 605 and 607 are performed.
  • the fourth resultant vector Z stored in temporary memory is obtained and the value S4 is placed into the fourth ordinal position of the column for Suppliers IDs (848, FIG. 18D) .
  • the RPU 22 (FIG. 1A) determines that there are no more values for processing and thus returns to the calling RECONSTRUCT routine (FIG. 17A) at block 567.
  • the RPU 22 determines whether there are any more columns to be displayed to the user. For this example, it is assumed that only the Suppliers ID column is to be reconstructed and displayed. However, if more columns of a particular relation were to be displayed, then processing would continue at blocks 566 and 567 until all of the columns were displayed. Also, the same row select vector would be used each time the DISPLAY/RECONSTRUCT routine (FIG. 17B) was performed. Assuming that there are no more columns to be displayed, then processing returns to the calling program during block 568. E. JOIN
  • a JOIN is a SELECT over the Cartesian product of more than one relation of the relational database.
  • the data shown in Table A above comes from the two relations; suppliers 63 and parts 65 (FIG. 2) .
  • the names of the relations are listed in the FROM clause, and connection between the two relations (63, 65 FIG. 2) is listed in the WHERE clause (i.e., the fact that the city values must be equal) which is called the JOIN predicate.
  • the JOIN is used to combine relations based on equivalent values in the column if specified by the JOIN predicate. In this case, the specified columns are the "city" columns of each relation.
  • the JOIN pairs each of the N rows of a first relation; e.g., the SUPPLIERS relation (63, FIG.
  • the comparison operator in a JOIN predicate be equality.
  • the EQUIJOIN by definition produces a result containing two identical columns as shown in Table A. If one of these two columns is eliminated, the result is called NATURAL JOIN.
  • the JOIN operation is the restriction of the Cartesian product of two or more relations.
  • the Cartesian product of a set of N relations is a new relation consisting of all possible rows "r", such that ⁇ r r" is the concatenation of all rows from the participation of relations. Once the Cartesian product is generated, all rows that do not satisfy the "JOIN predicate" are eliminated from the Cartesian product. What is left is the EQUIJOIN result relation.
  • the complete table contains thirty rows. Now, all the rows' Cartesian product in which S.CITY is not equal to P.CITY are eliminated and what is left is the EQUIJOIN result as shown earlier.
  • the JOIN relation is efficiently represented by binary bit vectors and second, the JOIN relation is constructed without having to create a cross product relation.
  • FIG. 19 represents the depiction of a JOIN relation from the following query: SELECT S.ID#,S.STATUS,S.CITY,P.ID#
  • This query is a projection of the JOIN because the P.CITY is not mentioned in the SELECT clause of the query. Like the EQUIJOIN example discussed above, this query requires that data come from two relations, namely the suppliers relation (63, FIG. 19) and the Parts relation (65, FIG. 19) . Both relations are named in the FROM clause and the connection between the tables is through the CITY columns in the WHERE clause.
  • the result of the JOIN for displaying the columns CITY, SUPPLIERS ID#s, STATUS, and PART ID # « s is shown at 628 of FIG. 19.
  • the entity select vectors at 600 and 602 are binary bit vectors that indicate which rows of the particular relation associated with the entity select vector, participate in the JOIN relation. More particularly, each binary bit has an ordinal position, which corresponds to a row of the relation. Binary bits 601, 603, 605, 607 and 609 correspond to the five rows of the suppliers relation 63. When the binary bit is set to "1" the particular row associated with the binary bit participates in the JOIN relation.
  • binary bit 601 indicates that the first row of the SUPPLIERS relation 63 participates in the JOIN relation.
  • the entity select vector 602 having binary bits 611, 613, 617, 619 and 621, indicates that the first, second, fourth, fifth and sixth rows of the PARTS relation 65 participate in the JOIN relation.
  • Vectors 600 and 602 act just like the entity select vectors discussed in FIGS. 4, 5 and 6. . The only difference is that the ordinal positions of the bits in the entity select vectors 600 and 602 do not correspond to ⁇ unique values in a value set. Instead, the binary bits of entity select vector 600 and 602 refer to row locations in a particular relation. Like the entity select vectors in FIGS. 4, 5 and 6, an implied mapping correspondence exists between each binary bit in the entity select vector to and a particular row use vector in an associated row use set. The implied mapping scheme is illustrated by the dotted lines 608, 610, 612 and 614, which show that the binary bits of the entity select vector 600 indicate the correspondence of the rows of the suppliers relation to the row use vectors 615, 617, 619 and 621, respectively.
  • the binary bits of the entity select vector 602 which correspond to the row use vectors of the Parts relation 65, are mapped in an implied manner to the row use vectors 623, 625, 627, 629 and 631, as shown by the dotted lines 616, 618, 620, 622 and 624.
  • the row use sets of the JOIN relation perform a dual task of representing the values in the rows of the JOIN relation and for depicting more than one column, of the JOIN relation.
  • the row use set 604 represents the columns S.CITY 638, S. ID# 636 and S.STATUS 634.
  • the row use set 606 represents the P.ID#, 630, the column of the JOIN relation 628.
  • the columns 638, 636, 634 and 630 depict the result of the JOIN.
  • the entity select vector 600 and the row use set 604 represent all of the binary information necessary to construct the suppliers relation portion 626 of the JOIN relation, 628; namely, columns 638, 636 and 634. The following is a detailed discussion on how this representation is achieved.
  • a dotted line 608 maps the row use vector 615 to the binary bit 601 of the entity select vector 600.
  • the binary bits of the row use vector 615 indicate the row positions of the columns 638, 636 and 634, which contain a particular value in the first row of the suppliers relation 63.
  • Binary bit 601 cor ⁇ responds to the values London, 20, Smith and SI.
  • To build the S.CITY column of the JOIN relation only the value London is referenced.
  • the three binary bits set to "1" in the row use vector 615 represent three occurrences of the value London in the S.CITY column 638 of the suppliers portion 626 of the JOIN relation.
  • the first three bits set to "1" in the row use vector 615 indicate three occurrences of the value SI.
  • the first three binary bits set to "1" in the row use vector 615 represent occurrences of the value 20.
  • the first three rows of the suppliers portion 626 of the JOIN relation 628 are characterized by the first three bits of the row use vector 615.
  • the next three rows of the suppliers relation 626 are characterized by the row use vector 621 of the- row use set 604.
  • the replication of the three bits in row use vector 615 indicates that the three values [LONDON, SI, 20] of the first table of the Supplier relation occur in rows 1, 2, 3 of the JOIN relation.
  • the remaining values and the columns of the Suppliers portion 626 are indicated by the row use vectors 617 and 619, which contain binary bits set to "1" in the seventh, eighth, ninth and tenth rows of both row use vectors. It should be noted that although the columns S.CITY, S.ID# and S.STATUS of the Suppliers relation are indicated by the query, the column SNAME in the SUPPLIERS relation, could just as easily have been represented by the row use set 604.
  • the values of the Parts relation are mapped into the JOIN relation 628 by the row use set 606 and the entity select vector 602 in exactly the same fashion as for Suppliers.
  • FIGS . 20 and 21 represent a more detailed view of the JOIN relation 628 (FIG. 19) . More particularly, the SUPPLIERS relation 63 (FIG. 19) is depicted by its row use sets at 63 (FIG. 21) and the Parts relation 65 (FIG. 20) , as depicted by its row use sets 65 (FIG. 20) . In addition, the entity select vectors 600 and 602 are shown corresponding to the row use sets 630 and
  • FIGS. 20 and 21 depict the value sets referred to in the SUPPLIERS and PARTS relations with their associated entity select vectors.
  • FIG. 19 a detailed discussion of the SUPPLIERS relation portion 626 (FIG. 19) is now presented.
  • the row use set 604 for the JOIN relation (628, FIG. 19) is mapped back to the Suppliers relation by the entity select vector 600.
  • the entity select vector 600 is repeated for each column of the Suppliers relation which has one or more values in the suppliers portion of the JOIN relation.
  • the implied mapping from each row use vector of the row use set 604 is shown by the dotted lines 608, 610, 612 and 614.
  • the row use sets for representing the columns of the suppliers relation are shown in FIG. 4.
  • the first three binary bits set to "1" in the row use vector 615 of the row use set 604 are mapped to the binary bit 601 of the entity select vector 600.
  • the entity select vector 600 corresponds to the row use set 260.
  • the first binary bit 601 is set to "1" indicating that the value in the first row of the S.ID# column of the suppliers relation is present in the JOIN relation.
  • a Boolean AND operation is performed on each row use vector 184, 186, 188, 190 and 192 to determine which row use vector contains a corresponding "1" bit in the first position.
  • Row use vector 184 contains a binary bit set to "1" in the first position.
  • Row use vector 184 maps back to the first binary bit of the entity select vector 176 for the suppliers relation.
  • the first binary bit of entity select vector 176 corresponds to the value SI in the value set for suppliers identifiers 160.
  • the first three binary bits set to "1" in the row use vector 615 indicate that the value SI is in the first three rows of the S column in the JOIN relation.
  • the first three binary bits set to "1" in the row use vector 615 are mapped to the entity select vector 600 associated with the row use sets 264 and 266, corresponding to the S.STATUS and S.CITY columns of the suppliers relation.
  • the row use vector 615 represents the first three values of the S.CITY column 638 and the S.STATUS column 634 of the prior portion of the JOIN relation. Referring to FIG. 21, a more detailed view of the
  • Parts relation portion of the JOIN relation is shown. Specifically, the mapping of the part ID #'s to the P.ID# column 630 of the JOIN relation is shown.
  • Row use vector 623 of the JOIN relation contains binary bits set to "1" in the first and fourth positions. These binary bits correspond to the first and fourth positions of the P.ID# column 627 of the JOIN relation.
  • Row use vector 623 is impliedly mapped to the first binary bit 611 of the entity select vector 602.
  • the entity select vector 602 corresponds to the row use set 304.
  • a Boolean AND operation is performed on the entity select vector in each row use vector of the row use set 304 to determine which row use vector contains the binary bit set to "1" in the first position.
  • the left most row use vector contains a binary bit set to "1" and this row use vector maps back to the first binary bit position of entity select vector 284.
  • the first binary bit of the entity select vector 284 is set to "1", indicating that the first row of the value set 282 contains the unique value mapped into the Parts relation 65 and into the JOIN relation.
  • binary bit 611 indicates that the value PI is mapped into the row use vector 623 of the JOIN relation, specifically, that of the first and fourth rows.
  • the remaining rows of the P.ID# column 627 of the JOIN relation are depicted -by the row use vectors 625, 627, 629 and 631 of the row use set 666.
  • FIG. 22G a detailed discussion on the operations performed by the RPU 22 (FIG. 1A) for performing a JOIN operation and constructing the JOIN relation is now discussed.
  • the routine EQUI- JOIN which builds a binary representation of the JOIN relation
  • FIG. 22B is a flow diagram of the routine BUILD ROW USE SETS for constructing the particular row use sets associated with each column of the JOIN relation.
  • FIG. 22C is a flow diagram for the routine CONSTRUCT JOIN ROW USE VECTORS for controlling the overall construction of each row use vector of a row use set in the JOIN relation.
  • FIG. 22A the routine EQUI- JOIN, which builds a binary representation of the JOIN relation.
  • FIG. 22B is a flow diagram of the routine BUILD ROW USE SETS for constructing the particular row use sets associated with each column of the JOIN relation.
  • FIG. 22C is a flow diagram for the routine CONSTRUCT JOIN ROW USE
  • FIG. 22D is a routine EVALUATE ROW USE SETS for determining the number of occurrences of the unique values participating in the JOIN operation.
  • FIG. 22E is a routine called PRODUCTS for calculating a series of product terms which characterize the formation of bit patterns in each of the row use vectors for a particular row use set of the JOIN relation.
  • FIG. 22F is a routine called NUMS for determining the number of times a particular bit pattern repeats itself in a row use vector in the JOIN relation.
  • FIG. 22G is a routine called GENERATE BIT STRING for building the row use vector in the JOIN relation.
  • the binary representation of the one or more relations to be JOINed are found in the memory 18 of the RDMS 10.
  • the binary represented relations are stored until the RPU 22 is ready for processing.
  • the relations are sent via bus 48 to the BBVP 14.
  • the RPU 22 uses the relations in the BBVP 14, the RPU 22 performs the EQUI/NATURAL JOIN operation (FIG. 22A) to create a binary representation of the resultant JOIN relation.
  • S.CITY P.CITY
  • the RPU 22 performs a Boolean AND operation on the entity select vectors obtained to determine a resultant binary bit vector x.
  • the resultant binary bit vector indicates which values of the particular value set are common to all of the entity select vectors involved in the JOIN operation. Specifically, binary bits set to "1" in the resultant bit vector "x" indicate the values of the value set which are common to all the columns represented by the obtained entity select vectors.
  • the RPU 22 determines which binary bits of each entity select vector correspond to the binary bit set to "1" in the resultant bit vector x. For each binary bit set to "1" in the entity select vector that corresponds to a binary bit set to "1" in the resultant bit vector x, the RPU 22 obtains the row use vector in the associated row use set during block 658. Then, during block 660, for each row use set associated with each entity select vector, the Boolean operation OR is performed on the selected row use vectors of the associated row use set.
  • the resultant vectors are referred to as JOIN entity select vectors, which characterize values belonging to one or more JOIN columns in the JOIN relation.
  • a row use set corresponding to each JOIN entity select vector is constructed. Specifically, during block 664, the BUILD ROW USE SET routine (FIG. 22B) is called to construct each row use set corresponding to each JOIN entity select vector of the JOIN relation. When all of the row use sets for the JOIN relation have been constructed, processing returns to the calling routine in block 668.
  • routine BUILD ROW USE SETS The purpose of this routine is to construct the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation.
  • the RPU 22 selects the first unique value i in the resultant bit vector x. In other words, the RPU 22 selects the first value represented by the occurrence of a "1" bit in the resultant bit vector x. Then, during block 673, a variable "START ROW" is set equal to zero. This variable indicates the start position of the first row in the JOIN row use vector in the JOIN row use set being generated and will be discussed in more detail along with the description of FIG. 22G. Then, during block 674, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C is called.
  • This routine is performed by the RPU 22 to determine the characteristics of a particular row use vector of the JOIN relation corresponding to the value i and to build the row use vector or vectors associated with the value in the JOIN relation.
  • the RPU 22 determines if there are any more unique values which are indicated in the resultant vector x. If there are, then the next unique value, i, is obtained at block 678. Processing continues at the CONSTRUCT JOIN ROW USE VECTORS routine to build the JOIN row use vector(s) associated with the next unique value. Assuming . that there are no more unique values represented in the resultant bit vector x, then processing returns to the calling program (EQUI/NATURAL JOIN, FIG. 22A) at block 680.
  • the CONSTRUCT JOIN ROW USE VECTORS routine is now discussed. As stated above, the purpose of this routine is to determine the characteristics of the row use vectors of the JOIN relation and to build the row use vectors for a particular - row use set in the JOIN relation.
  • the routine EVALUATE ROW USE VECTORS is called. The purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation. A more detailed discussion of this routine will be presented shortly with reference to FIG. 22D. Then, during block 686 the PRODUCTS routine (FIG. 22E) is called.
  • the PRODUCTS routine calculates a series of product terms which characterize the formation of bit patterns in each of the row use vectors of a particular row use set of the JOIN relation. A more detailed discussion of this routine will be presented with reference to FIG. 22E. Then, during block 688 the NUMS routine (FIG. 27F) is called. The NUMS routine determines the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. A more detailed discussion will be shortly presented with reference to FIG. 22F. Assuming that all the calculations for determining the characteristic of a row use set have occurred, processing continues at block 690 during which the first input column "j" (where "j" is set equal to 1) is obtained. Then, in block 692, the row use set associated with the first input column j is obtained.
  • the GENERATE BIT STRING routine (FIG. 22G) is called for constructing the row use vectors associated with a particular value in the row use set of the JOIN relation.
  • the GENERATE BIT STRING routine (FIG. 22G) evaluates the calculations of the PRODUCTS (FIG. 22E) and NUMS (FIG. 22F) routines to determine the characteristics of the bit patterns in the row use vectors and constructs these bit patterns in the row use vectors of the JOIN relation.
  • the RPU 22 determines if there are any more input columns which need to be processed. Assuming that there are still other input columns to be processed, block 698 increments the variable "j" by 1.
  • FIG. 22D is a flow diagram of the EVALUATE ROW USE VECTORS routine, discussed below in more detail.
  • the purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation.
  • the RPU 22 selects the first column and obtains the row use set of the column.
  • the RPU 22 obtains the row use vector of the RUS (j) which corresponds to the unique value i of the bit vector x.
  • the row use vector is referred to as V j .
  • the number of binary bits set to "1" in the row use vector Vj is determined.
  • the number of binary bits set to "1" in Vj corresponds to the number of occurrences of the particular value i in the current column over which the JOIN is performed.
  • the number of occurrences calculated for this particular row use vector Vj is placed in the variable Cj .
  • Cj is used by the PRODUCTS routine (FIG. 22E) , to be discussed.
  • the RPU 22 determines whether there are any more input columns. Assuming that there are still more input row use vectors, processing continues at block 709, during which the variable j is incremented by l. Processing continues at block 705, 706 and 707 until all of the input columns are processed.
  • FIG. 22E is a flow diagram for the routine
  • PRODUCTS As stated earlier, the purpose of this routine is for calculating a series of product terms which characterize the formation of bit patterns in each of the row use vectors of the JOIN relation.
  • an array called PRODS is set equal to the series
  • PRODS ⁇ (l,C ⁇ * C 2 * C 3 * . . . * C n ) , (2,C 2 * C 3 * . . . * C n ), (3,C 3 * . . . * C n ), . . . (N - 1, C n - 1 * C n ) ,
  • Cj is equal to the number of occurrences of value i in column j participating in the JOIN operation. For example, suppose the JOIN operation is performed over the CITY columns for two relations and the value of London is found to be present in both columns. Assume that the CITY column in the first relation contains two occurrences of the value London and the CITY column in the second relation contains three occurrences of the value London. Then C ] _ is set equal to 2 and C 2 is set equal to 3. Therefore, PRODS(1) is equal to C- ⁇ * C 2 , which is equal to 6. This number is used by the GENERATE BIT STRING routine (FIG.
  • processing block 710 determines the characteristics of the bit patterns in the row use vectors of the JOIN relation.
  • processing block 710 processing continues at block 712, to return processing to the calling program or the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) .
  • the NUMS routine is for determining the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. Specifically, during block 716 the following series is calculated
  • NUMS (1, PRODS(1)/PRODS(1) )
  • this routine determines' the characteristics of a bit' pattern in a JOIN row use vector associated with a particular value i.
  • An OFFSET value is determined for specifying the number of binary "0" bits in the row use vector ahead of the first binary "1" bit in addition to the zero-bits specified by START- ROW.
  • the offset value is equal to an initial value of zero.
  • RPU 22 obtains the input bit vector associated with the variable Vj .
  • RPU 22 obtains the first binary "1" bit in the Vj bit vector.
  • a variable K is set equal to the ordinal position of the selected (726, FIG. 22G) bit of Vj .
  • Processing continues at block 730, during which the Create Output vector "W" is generated.
  • the output vector "W" at a position in the row use set corresponds to the bit position k, in the entity select vector (the destination entity select vector corresponding to column j) .
  • the characteristics of the output bit vector W are determined by calculating NUMS (j), PRODS (j + 1) and PRODS (j).
  • NUMS (j) indicates the number of repetitions of a bit pattern having PRODS (j + l) "l" bits.
  • PRODS (j) indicates the total number of bits in the bit pattern associated with the output bit vector W.
  • FIG. 19 depicts a JOIN operation for the following query:
  • the result of this JOIN operation for columns CITY, SUPPLIERS ID#s, STATUS and PART ID#s is shown at 628 of FIG. 19.
  • the binary representation for the new JOIN relation is shown at 604 and 606. More particularly, the SUPPLIERS relation portion of the JOIN relation 628 is shown at 604 and the PARTS relation portion of the JOIN relation is shown at 606.
  • the purpose of this discussion is to construct the binary representation of the JOIN relation for the query above.
  • Fig. 23 is a Results Table depicting the JOIN operation performed on the SUPPLIERS relation and the PARTS relation for the specific query above.
  • the second column is the entity select vector associated with the column of the first relation (or SUPPLIER relation) over which the JOIN operation is performed.
  • the third column is the entity select vector for the column in the second' relation (or PARTS relation) over which the JOIN operation is performed.
  • the fourth column is the resultant bit vector x generated by Boolean ANDing the entity select vectors for the two columns over which the JOIN operation is being performed.
  • the fifth column is the selected input row use vectors of the SUPPLIERS relation.
  • the sixth column is the selected input row use vectors for the PARTS relation.
  • the seventh column is the entity select vector 1, corresponding to the first column of the JOIN relation which also refers to the SUPPLIERS relation portion of the JOIN relation.
  • the eighth column is the entity select vector 2 corresponding to the second column of the JOIN relation which also refers to the PARTS relation portion of the JOIN relation.
  • the ninth column is the first JOIN row use set corresponding to entity select vector 1, and the tenth column is the second JOIN row use set corresponding to entity select vector 2.
  • the entity select vectors for the CITY columns of the SUPPLIERS relation e.g., the entity select vectors for the CITY columns of the SUPPLIERS relation
  • the second column of the Results Table of FIG. 23 depicts the entity select vector for the CITY column of the SUPPLIERS relation
  • the third column of the Results Table of FIG. 23 depicts the entity select vector associated with the CITY column of the PARTS relation.
  • the RPU 22 performs a Boolean AND operation on the entity select vectors for the SUPPLIERS and PARTS relations to obtain a resultant binary bit vector x (852, FIG. 23A) .
  • the resultant binary bit vector shown in the third column of Fig.
  • the RPU 22 determines which binary bits of each entity select vector correspond to the binary bits set to "1" in the resultant vector x.
  • the row use vectors corresponding to the values London (L) and Paris (P) are obtained (854, FIG. 23A) .
  • the row use vectors corresponding to the values London and Paris in • the CITY column for the PARTS relation are obtained (856, FIG. 23) .
  • the Boolean OR operation is performed.
  • the resultant vectors of the Boolean OR operations are entity select vectors for the output or JOIN row use sets of the JOIN relation being formed.
  • the entity select vector 1, for the Boolean OR of the bit vectors for London and Paris is shown at 858 of FIG. 23A.
  • the entity select vector 2 for the Boolean OR of the bit vectors for London and Paris is shown at 860 of FIG. 23A.
  • the routine BUILD ROW USE SETS is called for constructing the row use sets associated with the JOIN entity select vectors. Referring to FIG. 22B, the BUILD ROW USE SETS routine for constructing the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation is shown.
  • the RPU 22 selects the first value represented to be present by the first occurrence of a bit set to "1" in the resultant bit vector x.
  • the first unique value of the resultant relation which corresponds to a bit set to "1" is
  • variable START-ROW is set equal to zero.
  • CONSTRUCT JOIN-ROW USE VECTORS routine is called. Processing continues at block 684 of the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) .
  • the EVALUATE ROW USE VECTORS routine (FIG. 22D) is called.
  • the RPU 22 obtains the first row use set associated with the S.CITY column over which the JOIN operation is performed. Particularly, the RPU 22 obtains the row use set corresponding to the S.CITY column of the Supplier relation.
  • the variable j corresponding to this particular column is set equal to 1.
  • Processing continues in block 705, during which the row use vector of the row use set for S.CITY column is obtained for the unique value London which corresponds to the first binary bit set to "1" in the resultant binary bit vector x.
  • the variable V ⁇ _ is set equal to the row use vector "10010" (854, FIG. 23A) .
  • the RPU 22 sets the variable C ⁇ _ equal to 2, the -number of "1" bits in the bit vector V ⁇ .
  • the RPU 22 sets the variable C 2 equal to 3, the number of "1" bits in the bit vector V 2 .
  • the RPU 22 determines that there are no more input columns over which the JOIN operation is to be performed. Thus, processing returns during block 711 of FIG. 22D to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 686.
  • PRODUCTS routine (FIG. 22E) is called.
  • FIG. 22E a series of numerical products which characterize the formation of the bit patterns in each of the row use 1 vectors of the JOIN row use set is constructed.
  • NUMS 22F is called to determine the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation.
  • the NUMS determination is 10 a. series of divisions for calculating the number of repetitions of a particular pattern of "1" bits in a row use vector of the JOIN relation. Specifically, NUMS is equal to the following series:
  • NUMS (1, 1) (2, 2) . 15 Processing returns during block 718 to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 690.
  • 25 OFFSET is set equal to an initial value of zero.
  • the RPU 22 performs a count function on the first bit set to "1" in the binary bit vector Vn . Then, at
  • variable K is set equal to the ordinal position of the first binary bit set to "1" in V ] _; the ordinal position is 1. Processing continues at block
  • the output vector W corresponds to the bit
  • NUMS (1) indicates the number of repetitions of the bit string of the output vector W corresponding to the first occurrence of London in the bit vector V ⁇ _.
  • PRODS (2) indicates that there are three "1" bits in the bit pattern of the output vector W.
  • PRODS (1) indicates the total number of bits, six, in the bit pattern associated with the output bit vector W.
  • the position of the first binary bit set to "1" in the output bit vector W is zero.
  • the output vector W is "111000" (862, FIG. 23).
  • Processing continues in block 732, during which the RPU 22 determines that there is a second "1" bit in the bit vector V ⁇ .
  • the next bit of V ] _ is obtained.
  • the position of the selected bit of V ! is 4.
  • K is set to the 4. .
  • the variable OFFSET is set equal to the initial value of zero.
  • the bit vector V 2 is obtained. This is the row use vector corresponding to London in the P.CITY row use set.
  • the first bit set to "1" of the bit vector V 2 is selected.
  • the position of this selected bit of V 2 is determined to be 1.
  • the output vector W corresponding to the bit position 1 of the entity select vector 2 (Destination entity select vector corresponding to P.CITY column) is determined. The characteristics of the output vector W are determined by calculating NUMS (2), PRODS (3) and PRODS (2).
  • NUMS (2) is the number of repetitions of the generated bit string which is equal to 2.
  • the bit string consists of PRODS (2) bits which is equal to three bits.
  • the number of "1" bits in the bit pattern is given by PRODS (3) which is equal to one "1" bit.
  • PRODS (3) which is equal to one "1" bit.
  • This series commencing with the first bit set to "1” begins at position zero as indicated by the START ROW + OFFSET, where START ROW and OFFSET are both zero.
  • the bit pattern of output vector W is "100100" (866, FIG. 23).
  • Processing continues at block 732, during which the RPU 22 determines that there are still "1" bits in the bit vector V 2 to be evaluated. Specifically, bits at positions 4 and 6 of the bit vector V 2 need to be evaluated.
  • processing continues at block 736, during which the variable OFFSET is .determined to be equal to 1.
  • the output bit vector W is "010010" (868, FIG. 23) .
  • Processing continues at block 732, during which the RPU 22 determines that there is still one "1" bit in the bit vector V 2 to be evaluated.
  • processing continues at block 736, during which offset is incremented by the value PRODS (3) .
  • PRODS (3) is equal to 1 and thus the value of offset is equal to 2.
  • the next bit of the bit vector V is obtained and processing continues at block 728.
  • the position of the newly selected bit of the bit vector V 2 is 6 and the variable K is set to 6.
  • the output vector W corresponding to bit position 6 in the entity select vector 2 is determined.
  • PARTS relation 65 such that the SUPPLIER CITY follows the PARTS CITY in alphabetical order.
  • the command for this query is:
  • the command for the GREATER THAN JOIN operation requires that the data come from two relations, namely, the Suppliers relation (63, FIG. 2), and the Parts relation (65, FIG. 2) .
  • both relations are named in the FROM clause, and the express connection between the tables is the City column in the WHERE clause.
  • the result of the GREATER THAN JOIN is for displaying all the columns of the Suppliers relation with all the columns of the Parts relation.
  • the process for creating the binary representation of the GREATER THAN JOIN relation is the same as for the EQUIJOIN operation. Namely, the BUILD ROW USE set (FIG. 223) is called to physically construct the columns of the JOIN relation.
  • FIG. 24 is a flow diagram which depicts the steps for evaluating and preparing the data prior to performing the BUILD ROW USE SETS routine (FIG. 22B) , which constructs the binary representation of the JOIN relation.
  • a generic routine for performing the GREATER THAN JOIN operation is shown. Specifically, the JOIN operation is constructed for all the values of column A (i.e. all the values of the City column of the Suppliers relation) which are greater than all the values of column B (i.e. all the values of the City column for the Parts relation) . It should be noted that this routine could be modified for any other of the predicate JOINs such as LESS THAN, LESS THAN OR EQUAL TO, and GREATER THAN OR EQUAL TO.
  • a sort operation is performed on all the values of column A to order the values from the greatest value to the least greatest value.
  • a binary bit vector D is generated.
  • the binary bit vector D contains a number of binary bits equal to the number_ of values in the value set over which the GREATER THAN JOIN operation is performed.
  • the binary bit vector D has its binary bits, set to "1" at all the ordinal positions corresponding to the values of the value set, which are less than the value y of column- A.
  • the Boolean AND operation is performed on the binary bit vector D with the entity select vector corresponding to column B.
  • the resultant vector contains binary bits set to "1" at the ordinal positions corresponding to the values of column B, which are less than the value y of column A.
  • the resultant vector is checked to see if it is "0". If the resultant vector is "0", then processing returns to the calling routine during block 1012 because none of the values of B is less than the value of A. Therefore, the GREATER THAN JOIN operation cannot be satisfied, and thus, the user is notified. Assuming that the resultant vector is not "0", then processing continues at block 1014.
  • the ordinal positions of the binary bit set to "1" in the result vector calculated in block 1008 are determined.
  • the ordinal positions correspond to the ordinal positions of the entity select vector, which correspond to the values of column B, which, are less than the values of column A.
  • all of the row use vectors associated with the ordinal positions of column B, which are associated with the binary bits set to "1" in the result vector are obtained.
  • the row use vector for value y of column A is also obtained.
  • the row use vectors associated with column B are placed into a temporary area, along with the row use vector for column A.
  • the Boolean OR operation is performed between the row use vectors associated with column B to generate the entity select vector for the column B portion of the JOIN relation.
  • entity select vector for the column A portion of the JOIN relation is a row use vector associated with the value y of column A.
  • the entity select vector for the input column B depicts all of the values of B, which are less than the value y of column A.
  • the BUILD ROW USE SETS routine (FIG. 22B) will construct the first several rows of the JOIN relation, which will consist of the value y of column A and all of the values of column B, which are less value y.
  • the JOIN operation is performed over two relations and a resultant JOIN relation is generated.
  • the resultant JOIN relation (FIGS. 19, 20 and 21) was a binary representation.
  • the system can automatically perform the DISPLAY/RECONSTRUCT FOR A JOIN operation routine (FIG. 25A) to generate the particular columns of the JOIN relation for the user to ascertain.
  • the DISPLAY/RECONSTRUCT FOR JOIN operation FIG.
  • FIG. 25A a row use vector corresponding to the rows of the JOIN relation to be constructed is created. Then, during block 1104, an index vector associated with the JOIN relation is created.
  • the index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the JOIN row use set. Each bit indicates whether the unique value associated with the row use vector exists in- the JOIN column, which is to be displayed.
  • the unique value associated with the row use vector does not exist in the JOIN column to be displayed.
  • the index vector contains a quantity of binary bits equal to the number of row use vectors of the JOIN row use set.
  • the . row use set for the JOIN column to be displayed is obtained.
  • the first row use vector associated with the JOIN column row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18 (FIG.
  • the RPU 22 via BBVP 14 performs a Boolean AND operation on a row use vector with the row select vector.
  • the row select vector is a binary bit vector, and each binary bit corresponds to a row of the column or columns to be displayed by the system.
  • a binary bit set to "1" in the row select vector indicates that the corresponding row needs to be displayed by the system.
  • the result of the AND operation is a binary bit vector Z, which depicts the rows of the column which contain a particular value associated with the current row use vector.
  • the results of the AND operation are sent via bus 48 back to memory 18 (FIG.
  • the RPU 22 determines whether the resultant vector is "0". More particularly, the resultant vector Z contains all binary bits set to "0". If the binary bit vector is "0", then during block 1112, a "0" is placed in the first binary bit position of the index vector to indicate that the unique value associated with the row use vector does not exist in the JOIN column to be displayed. However, if the resultant bit vector Z is not equal to "0", then processing continues to block 1114. The RPU 22 via BBVP 14 sets the binary bit of the index vector, which is associated with the current row use vector, to "1". In block 1116, the resultant bit vector Z is stored in memory 18. The resultant bit vector Z is identified for this particular row use vector presently being processed, and it will later be used in the reconstruction process.
  • the RPU 22 via BBVP 14 clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z.
  • the purpose of this step is. to "shortcut" the processing of the row use vectors in the row use set. Stated differently, when the JOIN row select vector is cleared, all of the values in the rows to be displayed for the column can be determined.
  • the RPU 22 (FIG. 1A) determines whether the JOIN row select vector has had all its binary bits set to "0". If the JOIN row select vector contains only binary bits set to "0", then processing continues at block 1126.
  • processing continues at block 1122.
  • the RPU 22 gets the next JOIN row use vector in the JOIN row use set currently being processed.
  • the RPU 22 determines whether the end of the JOIN row use vector has been reached. If the end of the JOIN row use vector has been reached, then processing continues at block 1126. However, assuming that the end of the row use vector has not been reached, then processing continues at blocks 1108, 1110, 1114, 1116, 1118, 1120, 1122 and 1124 until all of the JOIN row use vectors of the JOIN row use set have been completely processed.
  • processing continues at block 1126.
  • the RPU 22 determines whether the present column for display references a value set or a relation.
  • the DISPLAY/RECONSTRUCT routine for the JOIN operation starts with the row use set of the JOIN column and then references the relation corresponding to the row use set in the JOIN relation.
  • the column to be displayed is a JOIN column, and it does not reference a value set. Therefore, processing continues at block 1112, during which the REFERENCE RELATION routine (FIG. 25B) is called.
  • the RPU 22 obtains the entity select vector associated with the JOIN column.
  • the entity select vector for the JOIN column is used as a row select vector for obtaining values from the referenced relation. Processing returns to the DISPLAY/RECONSTRUCT FOR JOIN operation routine (FIG. 25A) at block 1104.
  • an index vector for the referenced relation is created. More particularly, the index vector contains a number of binary bits is equal to the number of row use vectors of the reference relation.
  • the RPU 22 (FIG. 1A) obtains a row use set for a column of the relation to be displayed. The first row use vector associated with the row use set is obtained.
  • the RPU 22 instructs the BBVP 14 to perform a Boolean AND operation on the row use vector obtained in block 1106 with the row select vector obtained during the REFERENCE RELATION routine (FIG. 6B) .
  • the result of the AND operation is a vector Z which displays the rows of the column which contain a particular value associated with the row use vector.
  • the RPU 22 determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 1112, a "0" is set in the first binary bit position of the index vector. Processing continues at block 1122, during which the next row use vector associated with the column in the reference relation is obtained. Returning to block 1110, if the result of the Boolean operation performed in block 1108 is "0", then processing continues at block 1114. During block 1114, the RPU 22 via BBVP 14 (FIG. 1A) sets the binary bit in the index vector, which is associated with the current row use vector to "1". During block 1116, the resultant bit vector Z is stored in memory 18 (FIG. 1A) .
  • the RPU 22 instructs the BBVP 14
  • the RPU 22 instructs the BBVP 14 to determine whether the row select vector has been completely cleared. Stated differently, whether all the binary bits have been set to "0". The row select vector contains only binary bits set to "0", and processing continues at decision block 1126. Otherwise, assuming that all of the binary bits in the row use vector are not set to "0", then processing continues at block- 1122. During block 1122, the RPU 22 (FIG.
  • processing continues at block 1126.
  • the RPU 22 determines whether the column currently being evaluated references a relation or a value set. The currently column presently being evaluated references a value set, and thus, processing continues at block 1130.
  • the REFERENCE VALUE SET routine (FIG. 25C) is called.
  • the REFERENCE VALUE SET routine obtains the values from the value set and places the values in the proper rows of the JOIN relation column. More particularly, referring to block 1142, the RPU 22 instructs BBVP 14 (FIG. 1A) to determine the ordinal positions of the binary bits set to "1" in the index vector for the referenced column. Each binary bit set to "1" indicates which row use vectors reference unique values to be displayed in the referenced column. In addition, for each row use vector which has a corresponding binary bit set to "1" in the index vector, the ordinal position of the corresponding binary bit set to "1" in the entity select vector associated with the value set is determined.
  • This process is performed for each of the row use vectors which has a corresponding binary bit set to "l" in the index vector for the referenced column.
  • the values associated with the ordinal positions obtained in the block 1142 are retrieved from the referenced value set.
  • the RPU 22 finds the appropriate location in the index vector associated with each of the values retrieved in block 1144.
  • the appropriate resultant Z vector associated with each of the row use vectors is retrieved.
  • the resultant Z vector indicates which rows of the referenced column contain the unique value associated with the row use vector.
  • the RPU 22 (FIG. 1A) determines whether any more values are left for processing.
  • processing continues at blocks 1146 and 1148 until all the values from the value set have been placed in the proper rows of the referenced column. Assuming that all of the values have been placed in the proper rows of the column, then processing continues at block 1152.
  • the corresponding JOIN row use set is determined. More particularly, the corresponding binary bit set to "1" in the index vector for the JOIN column is determined.
  • the appropriate vector Z temporarily stored at an earlier step is obtained. The resultant vector Z indicates the row positions of the JOIN column where the value from the referenced column should be placed.
  • the RPU 22 determines whether any more values from the referenced column need to be processed. If more values need to be processed, then processing continues at blocks 1152, 1153 and 1155 until all the values of the referenced column have been processed. Assuming that all the values have been processed, then processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) . During block.1132, processing returns to the calling program. 1. Example of the DISPLAY/RECONSTRUCT Operation For A JOIN Relation
  • FIGS. 20, 25A, 25B, 25C, 26A, 26B, 26C, 26D and 26E a detailed example for performing the DISPLAY/RECONSTRUCT program on the JOIN column for Suppliers IDs in the JOIN relation (604, FIG. 29) is now discussed. More particularly, it is assumed that only a binary representation of the JOIN column for Suppliers IDs exists in the RDMS 10. The binary representation of the JOIN relation is a result of the JOIN operation, or it may have been previously stored after processing of the BBVP 14 (FIG. 1A) . In either case, the binary representation of the JOIN column exists in memory 18 and now the user or applications program needs to display the actual values of the JOIN column.
  • this example is for reconstructing and displaying only one column of the JOIN relation (604, FIG. 20), the remaining columns of the JOIN relation could be reconstructed by sequentially performing the PROJECT operation.
  • the PROJECT routine (FIG. 17A) calls the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 20A, 20B and 20C) .
  • FIGS. 21A, 21B, 21C, 21D and 21E a results table for depicting the results of the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 25A, 25B and 25C) for reconstructing and supplying the Suppliers ID column of the JOIN relation (604, 626 of FIG. 20) is shown.
  • Each row of the results table depicts a result of the routines shown in FIGS. 25A, 25B and 25C.
  • the results table is separated into 12 columns from left to right.
  • the first column of the results table shows the row select vector associated with the JOIN column to be displayed.
  • the second column shows the row use set associated with the JOIN column to be displayed.
  • the third column is the row use vector of the row use set which is currently being processed, and the fourth column is the index vector associated with the row use set corresponding to the JOIN column.
  • the fifth column is a resultant binary bit vector Z.
  • the resultant binary bit vector Z is determined by ending a JOIN relation row use vector and the row select vector.
  • the sixth column depicts the entity select vector associated with the referenced relation.
  • the seventh column is an index vector associated with the row use set corresponding to the referenced relational column.
  • the eighth column depicts the row use set corresponding to the referenced relational column.
  • the ninth column is for the index vector associated with the row use set for the referenced relational column.
  • the tenth column is for the resultant bit vector Z.
  • the resultant bit vector Z is determined by ending a row use vector of the row use set with the entity select vector.
  • the eleventh column is a depiction of the creation of the referenced relational column.
  • the last column depicts the JOIN relational column, which is to be displayed.
  • the referenced relation obtains the entity select vector (600, FIG. 20) associated with the JOIN column.
  • the entity select vector is used as a row select vector for specifying the rows of the referenced relation which have values in the JOIN column (1184, FIG. 26C) .
  • processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) at block 1104.
  • Block 1104 creates an index vector for the row use sets corresponding to the Supplier ID column (1181, FIG. 20) .
  • the row use set associated with the Supplier ID column is retrieved (1186, FIG. 26C) .
  • the first binary bit vector of the row use set (1186, FIG. 26C) is ANDed with the entity select vector to form a resultant vector Z (1192, FIG. 26C) .
  • the resultant vector is evaluated to determine if all the binary bits are set to "0".
  • One bit in the resultant vector Z is set to "1", and thus, processing continues at block 1114.
  • the binary bit in the index vector associated with the current row use vector is set to "1" (1194, FIG.
  • the resultant vector Z associated with the current row use vector is stored in memory 18 (FIG. 1A) .
  • the binary bit set to "1" in the row select vector that matched the binary bit set to "1" in the resultant vector Z are cleared (1196, FIG. 26D) .
  • the next row use vector of the row set for the Supplier ID column is obtained (1198, FIG. 26D) .
  • the RPU 22 (FIG. 1A) determines if the end of the row use set has been reached. The end of the row use set has not been reached, and thus, processing continues at block 1108.
  • sets S 1234 and T 1236 define the set of all ordered pairs (S, T) such that s is an element of S and t is an element of T.
  • a "mapping" exists from S 1234. to T 1236. More than one element of set S 1234 may map onto unique elements of T 1236; however, there will never exist more than one element of T into which maps the elements of S.
  • the set S 1234 is the "domain"
  • the set T 1236 is the "range”.
  • FIG. 27 shows the many-to-one mapping. Specifically, si and s2 map to the value tl. The purpose of the entity use vector is to facilitate this relationship between ordinal positions of elements in one set to another set.
  • the entity use vector is a vector whose elements are values expressed in ordinal positions of elements within another set.
  • the entity use vector constitutes a function (or a mapping) between either a value in a value set or a relational row in another or the same relation.
  • the mapping may apply specifically to a relational column, whether it be a single physical column or a physical column consisting of many conceptual columns.
  • the implied ordinal position within the entity use vector corresponds to a row number in the "domain”.
  • the associated value of the element is the entity location (ordinal position of the value in the value set or row position of the relational row being referenced) of the "range”.
  • the value 5 resides. Essentially, the value 5 maps the value in the first row of the city column 174 to the fifth row of the domain 166. Referring to the dotted line 1249, the mapping of the first row of the city column 174 to the fifth row of the value set is shown. Likewise, the fourth row of the city column 174 is also mapped -to the fifth row position of the value set 166. Thus, the first row and the fourth row of the city column 174, both map to the same row of the value set 166. In this way, the many-to-one mapping can be achieved via the entity use vector.
  • the second row and the third row of the city column 174 map to the same row of the value set 166 may be implied mapping at 1247.
  • the fifth row of the city column 174 maps back to the first row of the value set 166.
  • ___ column 174 corresponds to the value 5 in the entity use vector at 1237 which corresponds to the value London.
  • the second row of the city column 174 corresponds to the entity use vector value 8 which corresponds to the value Paris.
  • the third row of the city column 174 corresponds to the value 8 of the entity use vector and to the value Paris.
  • the fourth row corresponds to the value 5 in the entity use vector and to the value London.
  • the fifth row of the city column 174 corresponds to value 1 in the entity use vector which corresponds to the value Athens.
  • the entity use vector will be used specifically during the DISPLAY/RECONSTRUCT operations for the RDMS 10.
  • the entity use vector facilitates the steps for determining the values in the particular columns of a relation and hence facilitates the DISPLAY/RECONSTRUCT process.
  • FIG. 29 a depiction of how entity use vectors may be used in the setting of a JOIN relation is shown. More particularly, the entity use vectors 1238, 1240, 1242, 1244, and 1246 have been added to the binary representation of the JOIN relation of FIG. 20.
  • the entity use vectors 1238, 1240, 1242 and 1244 are identical to the entity use vectors of the Supplier relation in FIG. 28.
  • the new entity use vector of FIG. 29 is at 1246.
  • This entity use vector maps the rows of the JOIN columns to the rows of the input relations. More particularly, the values of the entity use vector correspond to the row numbers of the referenced relation.
  • the first three rows of the entity use vector, 1250, 1251, 1252, respectively, correspond to row 1 of the Supplier relation.
  • the first three rows of the JOIN relation will contain values from the first row of the city column.
  • the city column has associated with it an entity use vector 1244 which has a value 5 in the first row corresponding to the first row of the city column.
  • the value 5 maps back to the fifth row of the value set 166.
  • the fifth row of the value set 166 contains the value London. Therefore, the first three rows of the entity use vector, 1250, 1251, and 1252, correspond to the value London as shown by the physical representation of the S.CITY column of the JOIN relation.
  • the entity use vectors associated with the JOIN relation provide a mapping between the columns of the Supplier relation and the columns of the JOIN relation. As stated earlier, the columns are mapped via the entity use vectors to the value sets of the relational database.
  • the entity use vector will be used during the DISPLAY/RECONSTRUCT process of the JOIN relation.
  • the entity use vector will substantially improve the processing time for generating the actual representation of the JOIN relation by not having to perform the multitude of steps associated with the DISPLAY/RECONSTRUCT operations as discussed earlier. Again, a more detailed discussion on the DISPLAY/RECONSTRUCT will be presented shortly.
  • the entity use vectors will be stored in byte or multi-byte form in memory 18.
  • the RPU 22 causes the appropriate entity use vectors to move from the memory 18 via bus 46 to MVP 15.
  • the RPU 22 evaluates the entity use vectors via the DISPLAY/RECONSTRUCT routines (FIGS. 30 and 31).
  • the relation has been reconstructed and it is ready for display to the external device 12 • for storage or to display 3 (FIG. 1) for the user to ascertain.
  • the entity use vectors are constructed during the load operation performed by the BINARY REPRESENTATION routine at block 356.
  • the system loads the file representation of the relations into external device 12, where they reside until summoned by the RDMS system 10.
  • a particular column is retrieved by referring to its system identifier as discussed earlier and in more detail in part VII.
  • the first value of the particular column is brought to the RPU 22; as discussed earlier, a row use vector associated with the value is built by the BBVP 14. Additionally, a bit is set to "1" in the first position of the row use vector to indicate that the value occupies the first row of the first column of the relation.
  • the first value is brought to the RPU 22, it is also evaluated by the MVP 15.
  • an entity use vector associated with the column is built.
  • the first position of the entity use vector is assigned- a numerical value to indicate the row of the value set corresponding to the input value.
  • the MVP 15 As each value is input from the column, the MVP 15 generates a corresponding value associated with the row position in the value set which contains the particular value.
  • a new entity use vector is created. This process continues until all of the columns input from the external device have associated entity use vectors. As the entity use vectors are constructed, they are sent via bus 46 to memory 18, where they reside until they are called for processing via the DISPLAY/RECONSTRUCT routine at FIGS. 25A, 25B and 25C.
  • FIG. 30A a flow diagram for the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS is shown.
  • the purpose of this routine is to efficiently reconstruct and display the values of a specified column or columns. More particularly, referring to block 1270, the RPU 22 (FIG. 1A) causes the entity use vector associated with the particular column to be displayed to move from memory 18 via bus 46 to MVP 15. Then during block 1272, the RPU 22 determines whether the entity use vector associated with the column references a value set or another relation. Assuming that the entity use vector references another relation, then processing continues at block 1273, during which the REFERENCE RELATION (FIG. 3OB) is called.
  • REFERENCE RELATION FIG. 3OB
  • the REFERENCE VALUE SET routine (FIG. 3OB) is called.
  • the REFERENCE RELATION routine (FIG. 30C) is called when the column to be displayed is, for example, a column of the JOIN relation.
  • the REFERENCE VALUE SET routine (FIG. 30C) is called when the column to be displayed is a column of a base relation (one which references one or more value sets directly) . In either case, when the column has been reconstructed and is ready for display, processing returns to the calling program at block 1276.
  • FIG. 3OB a flow diagram for the REFERENCE RELATION routine is shown.
  • the entity select vector associated with the column is obtained.
  • This entity select vector is used as a row select vector for obtaining the values from a REFERENCE RELATION.
  • the entity select vector 600 (FIG. 29) which is associated with the JOIN column depicting the row use set of the JOIN relation 604 (FIG. 29) could be such a REFERENCE RELATION.
  • Processing continues at block 1282, during which processing returns to the DISPLAY/RECONSTRUCT- WITH ENTITY USE VECTORS routine (FIG. 29A) .
  • a flow diagram for the REFERENCE VALUE SET routine is now discussed. More particularly, during block 1286, the values from the value set ' are obtained corresponding to the ordinal positions expressed by the entity use vector elements. Then, during block 1288, a value is placed in the appropriate row of the column. The ordinal position of the value in the column corresponds to the ordinal position of the corresponding element in the entity use vector. During block 1290, the next value of the value set is obtained and in block 1292, the RPU 22 (FIG. 1A) determines whether all the values have been processed.
  • processing continues at block 1288, 1290, 1292, until all of the values are placed in the appropriate row positions of the column. Once all the values have been placed in the column, processing continues at block 1294. During block 1294, it is determined whether there is another column to go through. More particularly, whether there is a JOIN column which needs to be reconstructed, for example, the CITY column 638 (FIG. 29) of the JOIN relation. Assuming that a JOIN relation needs to be reconstructed, then processing continues at blocks 1288, 1290, 1292, 1294 until all the values associated with the JOIN column have been placed into the proper rows of the column.
  • processing continues at block 1296-, during which processing returns to the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine (FIG. 30A) at block. 1276. During block 1276, processing returns to the calling program and the display of the column has been completed.
  • FIG. 31 a representation of a relational database is shown.
  • the top portion of FIG. 31 depicts seven domains of unique values referenced by two relations (Supplier and Parts) .
  • the third relation, labeled JOIN relation references both the Supplier and Parts relations as shown. All three relations depicted in FIG. 31 were thoroughly discussed in the previous sections of the specification. Specifically, the Supplier relation is detailed in FIG. 4, the Parts relation is detailed in FIG. 5, and the JOIN relation is detailed in FIG. 19.
  • the three relations are assumed to be in their binary representations and stored in memory 18 (FIG. 1A) .
  • the RDMS 10 maintains track of all of the columns of the relational database (FIG. 31) via an identification scheme, called the System relation.
  • the System relation logically connects all of the necessary information for describing a column of any of the relations.
  • the System relation for the Supplier/Parts relational database (FIG. 31) is shown.
  • the System relation is broken up into four columns. From left to right, the column identifier (CID) , the relation identifier (RID) , the attribute identifier (AID) , and the domain identifier (DID) .
  • the CID is a number which distinguishes a column from all of the other columns of the relational database.
  • the RID identifies the relation in which a column logically resides. For example, column 80 (FIG. 2) resides in relation 63 (FIG. 2) .
  • the AID defines the order or position in which the column resides in a particular relation. For example, column 80 (FIG.
  • each element of the Supplier/Parts relational database is identified by a CID number.
  • the Supplier/Identifier domain is characterized by a CID number "1" and an RID number "1". Domains do not have AID or DID numbers because the relation only has a single column, and it is itself a domain. Normally, a single column in the relational database would have associated with it an AID number of 1; however, in the preferred embodiment, the domain is always considered to have an AID of 0.
  • the Parts Identifier domain is characterized by a CID number "2" and an RID number "2". The domains are more fully depicted in FIG. 33 with their associated CID and RID numbers.
  • the CID and RID numbers for the domains are shown in the first seven rows of the system relation (FIG. 32) .
  • the CID and the RID number any of the domains for the Supplier/Parts relation (FIG. 31) can be referenced and obtained.
  • the Supplier relation is identified by a special row of the System relation, which generates a virtual column (as- seen by the user) for identifying the Supplier relation.
  • Each relation of the relational databases has a virtual column. The user never sees this column. This column
  • the virtual column contains row numbers associated with the rows of the relation. This column by convention does not have associated with it an AID or a DID number. Thus, in the eighth row of the system relation (FIG. 32) , the AID and DID numbers are set to 0.
  • the first column (which the user sees) of the Supplier relation is the Supplier ID column and it has associated with it a CID number 9.
  • the Supplier ID column is part of the Supplier relation, and thus, its RID number is 8.
  • the Person Name column of the Supplier relation has a CID number 10 and it, too, is part of the Supplier relation, and thus, has an RID number of 8.
  • the Status column of the Supplier relation has a CID number 11 and it is part of the Supplier relation, and thus, has an RID number 8.
  • the last column of the Supplier relation is the City column and it has a CID number of 12 and an RID number 8.
  • the Supplier ID column has an AID number of 1, which identifies the Supplier ID column as the first column of the Supplier relation.
  • the Supplier ID column is associated ' with the Supplier Identifier domain, and thus, the DID number associated with the Supplier column is 1.
  • the CID number for a Supplier Identifier domain is equal to 1, and this corresponds to the DID number for the Supplier ID column.
  • the Person Name column of the Supplier relation has an AID number of 2 and a DID number of 3. More particularly, the AID number 2 means that the Supplier Name column is the second column of the Supplier Relation and the DID number 3 means that the Person Names column is associated with the Person Names domain.
  • the Status column of the Supplier Relation has an AID number 3 and a DID number 7.
  • the AID number 3 means that the Status column is the third column of the Supplier relation and the DID number 7 means that the Numbers domain is associated with the Status column.
  • the City column has AID number 4 and DID number 5 associated with it.
  • the AID number 4 means that City column is the fourth column of the system relation and the DID number 5 means that the City domain is associated with the City column of the Supplier relation.
  • the Parts relation also has a virtual column which is identified by the CID number 13. Additionally, the JOIN relation is identified by a virtual column having a CID number 19. The columns of the Parts relation are identified by the CID, RID, AID and DID numbers in the same way the columns are identified for the Supplier relation.
  • the CID number 20 corresponds to a JOIN column which references the Supplier relation having RID 8 (shown by the dotted lines) . More particularly, the DID number associated with the JOIN column (CID 20) is 8, which identifies the Supplier relation (RID 8) as the domain to the JOIN column.
  • the AID number associated with the JOIN column having CID 20 is 1, which means that this ' column is the first column of the JOIN relation.
  • the second column of the JOIN relation having CID number 21 references the Parts relation. More particularly, the DID number associated with the JOIN column having CID 21 is 14, which is the RID number for the Parts relation.
  • FIGS. 34, 35, 36, 37A, 37B, 37C and 37D four constructs for developing each column referenced by the CID numbers in the system relation (FIG. 32) are shown. Specifically, a set containing domains and entity select vectors (FIG. 34) , an entity use set (FIG. 35) , a row select set (FIG. 36) , and a row use vector set (FIGS. 37A, 37B, 37C and ' 37D) are all used to build a particular column of the relational database. Essentially, each construct is a "set of sets".
  • the set contains 21 elements, and each number below each element corresponds to a different CID number or row of the System relation (FIG. 32) .
  • the first element, 1, of the set corresponds to the first row of the System relation (FIG. 32) in which the CID number is 1, RID number is 1, AID number is 0, DID number is 0.
  • the first element of this set characterizes all of the elements of the value set for Supplier IDs in a sorted fashion.
  • FIG. 34 shows the domains of the relational database depicted in FIG. 34 sorted by a lexical ordering scheme. However, any ordering scheme may be used, such as, order by entry, or by size (number of letter or number or both) etc.
  • FIG. 35 shows the set of entity use vectors for mapping each value in each column of each relation
  • FIG. 31 back to the ordinal position of the associated unique value in the domain.
  • Elements 1 through 7 (FIG. 34) of the entity select set are lexically ordered which do not require entity use vectors for mapping because the domains are ordered to begin with. If the values in the domains (elements 1 through 7, FIG. 25) are in order, entity use vectors can be used for mapping the values in the column in the relations (elements 9-12, 14-18, 20 and 21) .
  • the column having CID 9, RID 8, AID 1 and DID 1 is the Supplier ID column of the Supplier relation.
  • This column is characterized by the entity select set (FIG. 34) , entity use set ' (FIG. 35) , row select set (FIG. 36) and row use set (FIGS. 37A, 37B, 37C and 37D) .
  • the entity select vector associated with the Supplier ID column is referenced by the vector element 9.
  • the vector element 9 corresponds to the entity select vector "1010111000", which characterizes a subset of the Supplier identifier domain comprising those values which appear one or more times in the Supplier ID column.
  • the entity use vector associated with the Supplier ID column is referenced by the ninth element of the set.
  • the entity use vector for the ninth element characterizes a mapping between the values of the Supplier Identifier domain and the Supplier ID column.
  • the values of the column are shown in the order they occur in the domain and thus, the ordering shown in the entity use vector represents a direct ordinal numbering scheme.
  • the entity use vector represents a mapping between the row positions of the column and the values of the Supplier Identifier domain.
  • the row select vector associated with the Supplier ID column is referenced by the ninth element of the vector.
  • the ninth element of the vector corresponds to the row select vector "11111" which means that all of the rows of the column are valid.
  • FIGS. 37A and 37C the row use set associated with the Supplier ID column is referenced by
  • the row use set corresponding to the ninth element also represents a mapping from a unique value from the Supplier's domain to a set of rows of the Supplier ID column.
  • the set of rows for the Supplier ID column is characterized by the row use vector.
  • the row use set contains a series of row use vectors for providing a mapping between the sorted value set and the unsorted value set.
  • the function performed by the row use vector associated with the first element of the vector (FIG. 37A) is the same as the entity use vector associated with the first element of the vector in the entity use set (FIG. 35) .
  • the entity use vector is more efficient than the row use vector for mapping an unsorted value set to a sorted value set in the row use set.
  • the row use set is more efficient than the entity use vector for mapping the sorted value set to the unsorted value set.
  • the relational database system could perform with either one, the row use set or the entity use set; however, for efficiency, both are maintained by the preferred embodiment of the invention.
  • the ninth elements of the row use set and the entity use set are shown in (FIG. 37C) and (FIG. 35) , respectively.
  • the column at rows 8, 13 and 19 of the System relation is also characterized by the constructs depicted in FIGS. 34, 35, 36, 37A, 37B, 37C and 37D. More particularly, referring to FIG. 34, the eighth element of the set corresponds to the entity select vector for the column having CID number 8. The entity select vector referenced by the eighth element of the vector has five binary bits set to "1", which indicates that the relation contains five valid rows. Referring now to FIG. 35, the entity use vector associated with the column having CID 8 is referenced by the eighth element of the set. The entity use vector associated with the eighth element is empty because a mapping would not add any useful information for characterizing the column. Referring to FIG.
  • the eighth element of the row select set is depicted for representing the valid rows of the column. Note that there is the binary bit vector having five binary bits set to "1" corresponding to the five binary bits set to "1" in the entity select vector (FIG. 34) .
  • the row use vector associated with the column is referenced by the eighth element of the set as shown in FIG. 37B.
  • the row use vector is empty because a mapping would not add any further useful information for characterizing the column.
  • the row use set is more efficient for mapping a unique value from a value set to one or more rows of a column than the entity use vector and vice versa.
  • the system maintains both constructs to create a maximum efficiency for mapping rows of a column to and from a value set.
  • the RPU 22 creates an empty row in the system relation (FIG. 32) for the domain identifiers during block 1288. Then, during block 1290, .the RPU 22 (FIG. 1A) sets the CID number equal to the row number of the system relation. For example, the first domain identified by the RPU 22 (FIG. 1A) would have a CID number of 1 because it occupies the first row of the system relation (FIG. 32). During block 1292, the RPU 22 sets the RID number equal to the CID number, and during block 1294, the AID number and DID number are set equal to 0.
  • the RPU 22 In block 1296, the RPU 22 generates an element associated with the entity select set (FIG. 34) , the entity use set (FIG. 35) , the row select set (FIG. 36) , and the row use set (FIG. 37A, 37B, 37C and 37D) . During a later step in this routine, the necessary information for characterizing each of the columns is loaded during block 1332 (FIG. 38D) .
  • the RPU 22 determines whether any more domains need to be identified by the system.
  • seven different "create domain" commands have been specified for identifying each of the seven domains in the relational database.
  • processing will return to blocks 1286, 1288, 1290, 1292, 1294, 1296 and 1298, during which the RPU 22 will identify each of the domains of the relational database. Assuming that all of the domains have been identified with their respective CID, RID, AID, DID and element locations in the vectors, processing will continue at block 1300.
  • the application identifies each table of the relational database. Specifically, the application provides the system with the following commands: CREATE TABLE - SUPPLIER
  • Part ID Part Name; Color; Weight; City
  • the RDMS enters another row into the system relation (FIG. 32) for identifying the relation with a column.
  • the CID number is set equal to the row number of the system relation, and during block 1306, the AID number and a DID number are set equal to 0.
  • a new element is entered to each of the sets corresponding to the entity select set (FIG. 34) , the entity use set (FIG. 35) , the row select set (FIG. 36) , and the row use set vector (FIG. 37) .
  • the RPU 22 sets the variable CURRENT AID NUMBER equal to 1.
  • the RPU 22 (FIG. 1A) generates a new row in the system relation (FIG. 32) .
  • the CID number is set equal to the row of the system relation
  • the RID number is set equal to the RID number of the column for identifying the relation corresponding to the current column.
  • the AID number is set equal to the "current AID number” and during block 1322, the "current AID number” is incremented by 1.
  • the DID number for the current column is set equal to the CID number of the remaining reference by the current column. Processing continues at block 1326 (FIG. 39D) where new elements are added to the entity select set (FIG.
  • the RPU 22 determines whether any more columns need to be created for the particular relation specified during block 1300. If more columns need to be identified by the RPU 22 (FIG. 1A) , then processing continues at blocks 1310, 1312, 1314, 1316, 1318, 1320, 1322, 1324, 1326 and 1328 until all the columns of the specified relation have been identified. Assuming that all the columns of the specified relation have been identified, then processing continues at block 1330.
  • the RPU 22 determines whether any more relations need to be identified. Assuming that more relations need to be identified, then processing returns to block 1300, where the next relation to be created by the next CREATE TABLE instruction, listed above, is specified. Blocks 1300 through 1330 are performed until all of the relations in the relational database have been identified and the columns associated with each of the tables also have been identified. Assuming that all the tables and their related columns have been identified, then processing continues at block 1332. During block 1332, the RPU 22 instructs the file associated with each table of the relational database to be transferred from external device 12 to the RDMS 10 via 30 to the RPU 22.
  • the RPU 22 instructs the BBVP 14 to generate the entity select vector (FIG. 34) , the row select vector (FIG. 36) and the row use set (FIG. 37A) associated with each column. Simultaneously, the RPU 22 instructs the MVP 15 to generate the entity use vector (FIG. 35) associated with each of the columns.
  • the entity select vector, entity use vector, row select vector, and row use set are retrieved by the RPU 22 via the system identifiers (i.e. CID, DID, AID, RID identifiers). Then, during block 1334, the RPU 22 determines whether any more files associated with the relations need to be loaded into the RDMS 10.
  • processing returns to block 1332; otherwise, processing continues at block 1336. Assuming that all the files have been loaded into the RDMS 10, then processing returns to the calling program at block 1336. At this time, all of the columns and the relations of the relational database have been identified and all of the characterizing vectors associated with each of the columns are also organized.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An apparatus and/or method utilizing a computer, for creating a relational database (63, 65 and 67), the relational database contains a plurality of relations and each of the relations contains one or more colums and rows. A column has one or more values, which all have a common characteristic. Each value of the column corresponds to one of the rows of the relation. Each row contains some or more values in each value is from a different column. The values in each row have one or more characteristics. The creation of a relational database occurs in three steps. First, for each characteristic of the relational database, a set containing a plurality of unique values is formed. Second, for each relation of the relational database, one or more subsets of each set containing unique values of the relation is formed. Each of the subsets contains one or more of the unique values of one of the sets. Third, the relations of the relational database are formed. More particularly, for each subset associated with a particular relation, one of the columns of the relation is formed. Each column contains one or more of each unique value in the subset, and each unique value of the column occurs in one or more rows of the relation.

Description

A RELATIONAL DATABASE REPRESENTATION
WITH RELATIONAL DATABASE OPERATION CAPABILITY
Background of the Invention Field of the Invention This invention relates to a computer apparatus and/or method for creating a relational database. This invention also relates to a computer apparatus and/or method for efficiently processing relational operations on one or more relations. Prior Art
A database system is basically a computerized record-keeping machine — that is, a system whose overall purpose is to maintain data and to make that data available. The data maintained by the system can be any data deemed to be of significance. The basic purpose of a database system is to assist in the process of accessing data.
Almost all database systems developed over the past few years are called relational type databases. The concept is described in Date, An Introduction to Database Systems, 4th Edition, 19-20 (1986) . "A relational database is a database that is perceived by users as a collection of tables and nothing but tables." Id. at 96. The relational database system enables the user to generate new tables from old tables and to extract and combine subsets of information from one or more tables.
Although the current database systems are user friendly and efficient for maintaining and manipulating relatively small quantities of data, they are not amenable to larger data requirements such as for large company employee records, government agency statistics, etc. Typically, such relational databases require an enormous amount of memory to be maintained by the underlying systems because the smallest data value representation is in the form of a "byte". Additionally, due to complex indexing structures, a large amount of overhead software is necessary for maintaining the relational database. As a result, data manipulation is inefficient, and additional memory must be allocated to accommodate the software. Most importantly, unique data values may be stored redundantly by the system. The redundant storage of values causes even more unnecessary memory allocation. Consequently, current relational databases are not cost effective and they do not lend themselves to efficient manipulation of data, particularly when data requirements are large.
Summary of the Invention
The present invention introduces a computer apparatus and/or method for efficiently accessing, representing and manipulating data in a relational database. More particularly, the present invention introduces a computer apparatus and/or method for creating and representing a relational database and an apparatus and/or method for efficiently processing relational operations on one or more relations.
Briefly, one aspect of the preferred embodiment of the present invention includes an apparatus and/or method, utilizing a computer, for creating a relational database. The relational database contains a plurality of relations and each of the relations contains one or more columns and rows. A column has one or more values, which all have a common characteristic. Each value in the column corresponds to one of the rows of the relation. Each row contains one or more values, and each value is from a different column. The values in each row thus have one or more characteristics. The creation of a relational database occurs in three steps. First, for each characteristic of the relational database, a set, called a "value set" or "domain" is formed containing all the values entered in the data base having the same characteristic. Second, for each relation (table) of the relational database, one or more subsets of each value set containing values unique to the particular relation is formed. Each of the subsets contains one or more of the unique values of one of the value sets. Third, the relations (tables) of the relational database are formed. More particularly, for each subset associated with a particular relation, one of the columns of the relation is formed. Each column contains one or more of each unique value in the subset, and each unique value of the column occurs in one or more of the rows of the relation.
Typically, the first step, discussed above, involves an additional step of forming the set of unique values in some desired order of occurrence (e.g., numerical, lexical, or order of entry into the system) . The second step, discussed above, may involve a step of representing a subset in the form of an entity select vector of binary bits, where each of the binary bits of the binary bit vector has an order of occurrence which corresponds to the order of occurrence of the unique values in the set. The binary bits of the bit vector represent the presence or absence of each of the unique values in the subset, i.e., the vector selects which entities or values in the value set are in the associated subset forming the column of the relation. The third step, discussed above, may also involve the additional step of forming a binary representation of each relation in the relational database. In this step, for each subset (column) associated with one of the relations (tables) of the relational database, a vector of binary bits is created for each unique value in the subset. Each binary bit of this binary bit vector corresponds to one of the rows of the column, and the binary bits can be formed in an order of occurrence corresponding to the order of occurrence of the rows in the column or in some other appropriate order. Also, each binary bit represents the presence or absence of each unique value in one of the rows of the column.
Significantly, by forming subsets of each set of unique values associated with a particular relation, all relational operations are facilitated in terms of speed and total functionality. In addition, a substantial savings in memory occurs. This savings occurs because an entire value set, which has one or more values in the relation, need not be directly employed within the relation. Instead, only the entity select vector for each column of a relation need be referenced. As a result, all unique values within a domain are stored in conventional format only once in memory from which any number of subsets may be formed. Additionally, by forming binary bit vector representations of each subset associated with each relation, and by forming binary bit vector representations of each relation, an even further savings in memory occurs. The binary bit vector representations are also compressible. The techniques for compressing the bit vectors are thoroughly discussed in a co-filed, application to Glaser et al, entitled "A BIT STRING COMPRESSOR WITH BOOLEAN OPERATION PROCESSING CAPABILITY", which is herein incorporated by reference. Most importantly, the binary representation of the sets and relations creates a content-addressable system. In the past, content addressability has been implemented as a very expensive memory hardware solution, or as a computationally intensive inverted tree approach. These approaches have allowed processing to occur, but in a conventional and inefficient way. The present invention, on the other hand, is the first implementation that uses conventional memory, but imposes an efficient and unique processing solution by representing the relations in a binary bit vector representation.
The present invention also has the capability of efficiently performing database operations on one or more relations. Binary bit vector represented relations need not be decompressed in order to perform database operations. Computer time is saved by not having to decompress the data, and processing is made more efficient, because only a relatively small amount of compressed data needs to be considered while performing the relational operations. In the present invention, the relational operations are reduced to operations on bit strings. This aspect of the present invention involves a method and/or apparatus using a computer, for efficiently performing a relational operation on one or more relations of a relational database in order to produce a binary representation of a resulting relation. Typically, the relational operations performed are in the form of "SELECT," "JOIN" ("EQUIJOIN" or "NATURAL JOIN") and "PROJECT."
In the preferred embodiment, the SELECT operation determines which one or more rows of a relation corresponds to one' or more selected unique values. The selection process occurs in two steps. First, the binary bit vector(s) associated with the one or more selected unique values are retrieved. (Each binary bit of the binary bit vector indicates the one or more rows of the binary represented relation which contain the selected unique value) . Second, when there is more than one selected unique value, a Boolean OR operation is performed on the retrieved binary bit vectors to determine the binary representation of a resultant relation. More particularly, the resultant relation is represented by a binary bit vector which indicates the one or more rows of the relation which contain the selected unique values from one column of the relation. The quantity of the rows of the relation which contain the one or more selected unique values can be quickly determined by counting the binary bits of the resultant relation which indicate the presence of the selected values. The preferred embodiment also has the capability of performing SELECT on more than one selected unique value which occur in more than one column of the relation. The selection process occurs in three steps. First, for each column, the binary bit vectors associated with the selected unique values are retrieved. Each binary bit of the binary bit vector indicates a row of the binary represented column which contains the selected unique values. Second, for each column, when there is more than one selected unique value from the column, a Boolean OR operation is performed on the retrieved binary bit vectors to determine a SELECT binary bit vector indicating the one or more rows of the relation which contain the selected unique values. This bit vector characterizes a subset of the rows of the original relation. Third, a Boolean operation (i.e., AND, OR, etc.) is performed on the SELECT binary bit vectors to determine a resultant relation.
In the preferred embodiment of the present invention, the step for performing a relational operation also may include the step for performing a JOIN on one or more rows of a first binary represented relation with one or more rows of a second binary represented relation. More particularly, the JOIN operation may be specified to occur only when a column of the first relation contains values having a particular characteristic (i.e., belonging to particular domain) and a column of the second relation contains values having the same characteristic, which bear a desired relationship with values from the column of the first relation. That is, they are greater than, less than, equal to, equal to or greater than, equal to or less than the corresponding values of the first relation. (The second relation need not be distinct from the first.) In the preferred embodiment, performing a PROJECT operation is also included. The PROJECT operation comprises generating the binary bit vectors associated with any subset of columns of a particular relation. The columns selected can be reconstructed for a user to interpret.
Any relational operation which creates a second resultant relation can be implemented with the present invention. Although only SELECT, PROJECT and JOIN are thoroughly discussed, one skilled in the art of database systems understands that there are comparatively few practical problems that cannot be solved by SELECT, PROJECT and JOIN alone. Additionally, by following the strategy set forth in the specification for implementing SELECT, PROJECT and JOIN, one skilled in the art of relational databases will see that the other relational operations such as PRODUCT, UNION and DIFFERENCE can be implemented by the present invention as well. Lastly, functions such as INSERT and DELETE for updating rows of a binary represented relation and for updating unique values to sets of unique values, are also implemented by the present invention.
The present invention significantly enhances a computer's capability of processing relational operations by performing the relational operations directly at the binary bit level. Additionally, the Boolean operations can also be performed more efficiently by a special Boolean Logic Unit discussed in a co-filed application to Glaser et al. , entitled "A BIT VECTOR PROCESSOR WITH BOOLEAN OPERATION PROCESSING CAPABILITY," filing date September 24, 1987.
The aspects of the present invention, discussed above, together create a relational database system which can more efficiently access, represent and manipulate data than any other relational database system presently available.
Brief Description
FIG. 1 depicts a computer system equipped with a Relational Database Management System (RDMS) for creating a binary representation of a relational database and/or for efficiently processing relational operations on the one or more binary represented relations in accordance with the present invention;
FIG. 1A depicts the RDMS on which software programs and/or hardware are performed for representing the relations and for performing relational operations;
FIG. 2 represents a typical relational database provided by a user of the system depicted in FIG. 1;
FIG. 3A represents an ordered set S where the ordering is a lexical ordering; FIG. 3B represents an (ordered) subset of the ordered set represented in FIG. 3A;
FIG. 3C represents an empty or null set A; FIG. 3D depicts a binary bit vector expression a for characterizing the subset depicted in FIG. 3B; FIG. 3E is a binary bit vector a1 for characterizing the ordered set shown in FIG. 3A;
FIG. 3F depicts a binary bit vector representation of a null subset a" shown in FIG. 3C;
FIG. 4 depicts a binary representation of the Suppliers relation as shown in FIG. 2;
FIG. 5 depicts a binary representation of the Parts relation as shown in FIG. 2;
FIG. 6 depicts a binary representation of the Shipments relation as shown in FIG. 2; FIG. 7 is a flow diagram of the BINARY
REPRESENTATION routine;
FIGS. 8A and 8B are a table depicting the results of executing a series of commands for setting up the columns of the relational database shown in FIG. 2; FIGS. 9A, 9B. and 9C are a table depicting the results of constructing the binary representation of the Suppliers relation (FIG. 2) by the BINARY REPRESENTATION routine; FIG. 10A is a flow diagram of the INSERT routine;
FIG. 10B is a flow diagram of the INSERT VALUE INTO VALUE SET routine;
FIG. IOC is a flow diagram of the UPDATE SUBSET routin ; FIG. 10D is a flow diagram of the ADD VALUE TO
COLUMN routine;
FIG. 11A is a flow diagram of the DELETE routine; FIG. 11B is a flow diagram of the DELETE VALUE FROM COLUMN routine; FIG. 11C is a flow diagram of the DELETE VALUE FROM
SUBSET routine;
FIG. 11D is a flow diagram of the DELETE VALUE FROM VALUE SET routine;
FIG. 12 is a results table depicting the operations performed by the INSERT routine (FIG. 10A) ;
FIG. 13 is a table depicting the results of the operations performed by the DELETE routine (FIG. 11A) ; FIG. 14 is a flow diagram of the SELECT routine; FIGS. 15A and 15B depict a table of the results of the operations performed by- the SELECT routine for a two-column SELECT for two values;
FIGS. 16A and 16B depict a table of results of the operations performed by the SELECT routine (FIG. 14) on two columns for multiple values; FIG. 17A is a flow diagram of the PROJECT routine;
FIG. 17B is a flow diagram of the DISPLAY/RECONSTRUCT routine;
FIGS. 18A, 18B, 18C and 18D depict a table of the results of the operations performed by the PROJECT routine (FIG. 17A) ; FIG. 19 depicts a binary representation of a JOIN relation;
FIG. 20 depicts, a more detailed version of the binary representation of the Suppliers portion of the JOIN relation (FIG. 19) ;
FIG. 21 depicts a more detailed version of the Parts relation portion of the JOIN relation (FIG. 19) ;
FIG. 22A is a flow diagram of the EQUIJOIN routine; FIG. 22B is a flow diagram of the BUILD ROW USE SETS routine;
FIG. 22C is a flow diagram of the EVALUATE ROW USE SETS routine;
FIG. 22D is a flow diagram of the CONSTRUCT JOIN ROW USE VECTORS routine; FIG. 22E is a flow diagram of the PRODUCTS routine;
FIG. 22F is a flow diagram of the NUMS routine; FIG. 22G is a flow diagram of the GENERATE BIT STRING routine;
FIGS. 23A, 23B, and 23C represent a table of the results of the operations performed by the EQUIJOIN routine (FIG. 22A) ;
FIG. 24 is a flow diagram of the GREATER THAN JOIN routine;
FIG. 25A is a flow diagram of the DISPLAY/RECONSTRUCT FOR JOIN routine;
FIG. 25B is a flow diagram of the REFERENCE RELATION routine;
FIG. 25C is a flow diagram of the REFERENCE VALUE SET routine; FIGS. 26A, 26B, 26C, 26D, 26E, 26F and 26G depict a table of results for the operations performed by the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) ;
FIG. 27 represents a mapping of the elements of Set S into Set T; Set S is the "domain" and Set T is the "range." FIG. 28 represents the binary representation of
Suppliers (FIG. 4) , with entity use vectors added;
FIG. 29 depicts a binary representation of the Suppliers portion of the JOIN relation (FIG. 20) with entity use vectors added;
FIG. 30A is a flow block diagram of the DISPLAY/RECONSTRUCT WITH ENTITY USE VECTORS routine;
FIG. 3OB is a flow block diagram of the REFERENCE RELATION routine; FIG. 30C is a flow block diagram of the REFERENCE
VALUE SET routine;
FIG. 31 depicts a relational database; FIG. 32 depicts a SYSTEM RELATION;
FIG. 33 depicts the domains of the relational database shown in FIG. 31;
FIG. 34 depicts the ENTITY SELECT SET associated With the SYSTEM RELATION (FIG. 32) ;
FIG. 35 depicts the ENTITY USE SET, which is associated with the SYSTEM RELATION (FIG. 32) ; FIG. 36 depicts the ROW SELECT SET associated with the SYSTEM RELATION (FIG. 32) ;
FIGS. 37A, 37B, 37C, and 37D depict the ROW USE SETS associated with the SYSTEM RELATION (FIG. 31) ;
FIGS. 38A, 38B, 38C, and 38D are flow block diagrams of the DATABASE IDENTIFICATION routine.
TABLE OF CONTENTS
DETAILED DESCRIPTION
I. Hardware Level of the Preferred Embodiments
II. A Detailed Discussion on Relational Databases
III. Binary Bit-Vector Technology
IV. Binary Representation of a Relational Database
A. Example of Generating a Binary Representation of a Relation
B. Example of Building a Binary Represented Relation V. Operations Performed on Binary
Representations of Relations A. INSERT
1. Detailed Example for the INSERT Function B. DELETE
1. Detailed Example for the DELETE Operation
C. SELECT
1. Detailed Example of a Two- Column SELECT for Two Values.
2. Detailed Example of Two Column
SELECT for Multiple Values
D. RECONSTRUCT
1. Detailed Example of Performing PROJECT Operation
E. JOIN
1. Binary Representation of a JOIN Relation 2. Constructing a Binary
Representation of a JOIN Relation
3. Detailed Example For Constructing a Binary
Representation of the JOIN Relation
4. Constructing a BINARY REPRESENTATION of a GREATER THAN JOIN
F. DISPLAY/RECONSTRUCT For JOIN Operation
1. Example of the DISPLAY/RECONSTRUCT Operation For A JOIN Relation
VI. ENTITY USE Vectors
VII. Database Identification
A. Performing the Database Identification Scheme
Detailed Description
I.. Hardware Level of the Preferred Embodiments
FIGURE 1 depicts a computer system having a programmable computer and computer programs for creating a relational database and for processing operations on one or more relations (also called tables) of a relational database. The system includes programmable computer 2, display 3, entry device 11 for the computer and external device 12 such as a disk for storage of data. Hardware/software for representing the relations and hardware/software for performing relational operations are housed in a Relational Database Management System (RDMS) 10 (shown in phantom lines) , which is connected within the computer 2. The RDMS 10 coordinates the various activities related to representing relations in the relational database and to performing relational operations on one or more relations. Conventionally, RDMS 10 is a programmable computer on a printed circuit board which can be easily employed within most standard computers, including per¬ sonal, mini-, and mainframe computers. It is envisioned that RDMS 10 may be a special purpose computer formed by one or more integrated chips.
More particularly, referring to FIG. 1A, RDMS 10 includes an optional Binary Bit Vector Processor (BBVP) 14, an optional Bit Vector Encoder (BVE) 16, an optional Map Vector Processor (MVP) 15, an optional memory 18, a Relational Processing Unit (RPU) 22, including a Boolean Logic Unit (BLU) 24, and a Command Interpreter 28. When software programs for generating binary represented relations, for processing relational operations, and for coordinating data transfer between components are loaded into the RDMS 10, the RDMS 10 is formed and ready for processing operation. A detailed discussion of the specific components of the RDMS 10 is now presented. External device 12 is a permanent or buffered storage, typically in the form of a hard disk, for storing information used in relations which are represented in expanded form where each value is typically no smaller than a byte. The contents of the external device are typically maintained in records which are divided into fields where the n^1 field of each record corresponds to a specific type of data» The contents of the external device 12 is loaded via bus 30 to the RDMS 10 and specifically to RPU 22. The RPU instructs BBVP 14 to convert each relation stored on the external device into a binary representation (to be more thoroughly discussed in PART IV) . Bus 32 then transfers the binary representation of each relation to optional BVE 16. The BVE 16 compresses the binary representation, the techniques of which are discussed in a co-filed patent application to Glaser et al entitled "A BIT-STRING COMPRESSOR WITH BOOLEAN OPERATION PROCESSING CAPABILITY", having Serial No. 07/100,761, filed September 24, 1987, which is herein incorporated by reference. More particularly, the BVE 16, employed within the RDMS 10, evaluates uncompressed bit-string representations of each relation and separates the bit- strings into one or more "impulses." An impulse is a run, which is a string of one or more bits of a same binary value or a polarity (e.g., "O's" or "l's"), and an ending bit which has a polarity opposite the polarity of the run. Software programs executed by the bit- vector encoder encode the bit vectors into one of several different compressed impulse formats. The compressed bit vectors are then sent via bus 32 back to the BBVP and then in turn to the optional memory 18. Memory 18 may be a memory component included in the host computer such as an external device or a memory component included within the RDMS 10 (as shown in FIG. 1A) . Memory 18 holds the compressed binary representations of relations before processing the relations at the RPU 22 or stores the relations after processing at RPU 22. RPU 22 via the BBVP 14 performs relational type operations (e.g., SELECT, PROJECT, JOIN, INSERT, DELETE, etc.) . on one or more relations by processing unique software programs at the RPU 22 via a microcontroller (e.g., Intel 80386). If the relations are in the form of encoded bit strings, and if Boolean operations are required to be performed by the relational operation, then the Boolean operations can be performed by the hardware or software embodiments of the BLU 24 discussed in Glaser et al (Serial No. 07/100,761). Even though the relational operations in the preferred embodiment may be implemented primarily by software, the RPU 22 can perform relational operations more efficiently in terms of storage, speed, etc. , than presently known techniques for performing relational operations on one or more relations. At the hardware level, BLU 24 can take full advantage of the unique properties of the latest components, such as 32-bit microprocessors, CPU'S, etc.
The MVP 15 generates map vectors, called entity use vectors, and the map vectors are used by the RPU .22 to map each row of a column of a relation to a unique value of the relational database. The purpose of the entity use vectors is for facilitating the reconstruction and display of the information represented by the binary representations of the relations. The MVP 15 is optional because the RDMS 10 can reconstruct and display the relations without having entity use vectors. However, the entity use vectors are used to perform the DISPLAY/RECONSTRUCT process more efficiently. Although there are many ways by which the system could be interconnected with users or programs, the preferred embodiment contains a command interpreter 28 which interprets instructions for processing data in the relational database. The instructions used could be those found in Structured Query Language (SQL) which has become an industry standard language for enabling users to communicate with relational databases. The operation of the components in the RDMS 10 are controlled by software programs implemented by RPU 22. Data lines 30, 31, 41, 44, 46, 48, 50, and 52 of FIG. 1A depict the data flow between the various components of the RDMS 10. The command interpreter 28 need not be a part of the RDMS 10; instead, it could be loaded into the host computer 2. In the preferred embodiment, however, the command interpreter 28 is located within the RDMS 10.
In certain circumstances, relations stored in the external device 12 need to be updated by inserting or deleting values in the different domains, etc. The information can be updated by transferring data, via bus 30, to RPU 22, which then inserts or deletes necessary values, in the form of bytes, in the domain, etc., and restores the domain back to external device 12 via bus 30. This capability enables the RDMS 10 to add new and unique values to the relational database.
II. A Detailed Discussion on Relational Databases
The following discussion is an explanation of the relational database depicted in FIG. 2. FIG. 2 is an example of a relational database, namely, the "SUPPLIERS and PARTS" relational database, and it has been adopted from Date, An Introduction to Database Systems, 4th Ed. , Chapters 4 & 11 (1986) , which is incorporated herein by reference. FIG. 2 depicts three relations, a table for suppliers 63, a table for parts 65, and a table for ' shipments 67. "Relation" is just a mathematical term for the word "table", and these two words are used interchangeably in the specification.
Table 63 represents the information on the suppliers for a particular company. Each supplier has a Supplier ID column 80, a Supplier Name column 82, a Status column 84, and a City column 86, indicating where the supplier is located. As shown in table 63, each row of the table depicts information on a different supplier, and each column 80, 82, 84, 86 represents a different characteristic of each supplier. For example, row 69 of table 63 depicts supplier "SI" having the name "Smith" with a Status "20" and located in "London."
Table 65 represents the parts the suppliers may sell. Each part has a Part ID 88, a Part Name column 90, a Color column 92, a Weight column 94, and a City column 96. As shown by each row of the table, it is also assumed that the part only comes in one color and it is stored in a warehouse in exactly one city. For example, row 77 of table 65 corresponds to a single part having part having part ID "PI", a part name "NUT", a color "RED", a weight "12", and a location or city, "LONDON."
Table 67 represents the shipments of parts (table 65) made by suppliers (table 63) . Table 67 is really a connection between tables 63 and 65 (to be discussed in Part V) . The first row 87 of table 67 connects a specific supplier (SI) with a specific part (PI) ; stated differently, it represents a shipment of parts of kind PI, by supplier SI, and the shipment quantity is 300. Thus, each shipment is uniquely described by supplier ID, corresponding to column 98, a part ID, corresponding to column 100, and a quantity, corresponding to column 102. It is assumed that at most, one shipment at any given time for a given supplier can be made for a given part. For example, the combination of SI and PI having a quantity shipment of 300 at row 87 is unique with respect to the set of the shipments appearing in table 67. The supplier, part and quantity, together in each row of table 67, create a unique "entity". The Shipments table is a "relationship" between a particular supplier and a particular part. The "SUPPLIERS and PARTS" relational database (FIG. 2) describes, in reality, a very elementary database. Databases are likely to be much more involved, containing many more entities and relationships. This database, however, is sufficient to illustrate what a relational database is and to illustrate the novel features of the present invention.
A couple of properties regarding each relation are worth noting. First, each of the data values depicted in the tables 63, 65, 67 are "atomic". That is, at every row and column position in every table, 63, 65, 67, there is always exactly one data value, never a set of values. For example, in table 63, at row 69 and column 89, the status for the supplier Smith is a single value "20" and not a set of statuses. Also, note there are no links connecting one table to another table. In the example of FIG. 2, there is a relationship between the supplier row 69 of table 63 and the part row 77 of table 65, because supplier SI supplies PI as shown by the existence of row 87 of table 67, in which SI has sold 300 Pi's. However, there are no links extending from table 63 to table 65 to show this unique relationship. By contrast, in non-relational systems, this information is typically represented by some kind of "link" that is visible to the applications programmer. To this point, we have discussed a high level theoretical construct of how relational databases can be defined. Although relations at the external level in systems today create this construct for users to work with, the internal levels of the relational systems use a variety of structures, which are not in the form of relations or tables. The idea of a relational database construct only applies to the external levels of the relational system, and not to the internal level of a present-day relational database. Stated differently, the relational models as disclosed by the prior art (e.g., Date) represent database systems "at a level of abstraction that is somewhat removed from the details in the underlying machine." Whereas in the present invention, although the relational database at the external level remains the same, the internal level of the relational database is actually depicted in the form of columns, as discussed in PART IV.
Referring to FIG. 2 at 60, a set of domains is depicted. A domain is a set of unique values, and each domain has only one characteristic. For example, the domain for supplier identifiers 66 is the set of all possible unique supplier identifiers which are referenced in the system. Likewise, the domain for person names 70 is the set of all unique names, the set of numbers 78 is the set of all integers greater than 0 and less than 10,000 (for example). Domains are pools of unique values, from which actual values appearing in the columns of the tables 63, 65 and 67 are drawn. Typically, there will be values in a given domain that do not concurrently appear in one of the columns of one of the relations. For example, the value S8 or S10 may appear in the domain of supplier identifiers 66, but note that no supplier S8 or S10 actually appears in relation 63. They may appear in some other column of some other relation. Furthermore, each column of a relation corresponds to one of the domains. For example, supplier identifier domain 66 corresponds to column 80 of table 63. When a particular relation is ready to be taken from the external device 12, RPU 22 acts as a file manager to retrieve each of the records associated with the particular columns. The columns will then be sent via bus 30 to the RDMS 10 or for further processing. In reality, the RDMS 10's view of the database on the external device 12 is a collection of stored columns (in byte form) , and that view is supported by a file manager and disk manager (not shown) which may or may not be a part of the RDMS system. Assuming that the RDMS 10 system via RPU 22 summons a relation stored in the external device 12 over the bus 30, the relation will be converted into a binary representation via the BBVP. This unique tabular representation of the relation is one aspect of the invention and it will be thoroughly discussed in part IV. However, before proceeding to the binary representation model of a relation, some background regarding bit-vector technology must be discussed. Part III is devoted entirely to creating a fundamental framework in which a binary representation of a relation can be developed.
III. Binary Bit-Vector Technology
One aspect of the present invention is the representation of a relational database by bit strings which do not contain the raw data values of the relational database. Instead, each value of the database is represented by a single binary bit within an ordered set of binary bits. To more fully understand how these concepts are achieved, a detailed discussion is presented regarding binary bit-vector charac¬ terizations of sets. (For an excellent discussion on this matter see "Elements of Set Theory," Academic Press 1977, by Enderton, which is herein incorporated by reference.)
The basic building block for representing a relation of a relational database, in the present invention, relies on the concept of ordered sets. Ordered sets consist of ordered pairs, <M, X>, where each ordered pair represents an element of a set, where M is a symbol that defines the location or position of the value X within the set. In the preferred embodiment of the invention, M is a non-negative integer. Stated differently, every element of the ordered set constitutes a function between the integer describing the ordinal position M and the corresponding value X in the set. Typically, the value M increases by one for each forward progression of one element in the set. An ordered set with unique values can be defined as a set having elements X and Y which have ordinal positions M and N, respectively, and for all <M, X>, <N, Y> in the set, M is equal to N if, and only if, X is equal to Y. Two elements of an ordered set can only be equal when they are at the same ordinal position in the set, as all ordinal positions in the ordered set are unique. The ordering of values within ordered sets (ordering rule) , namely, the way in which each value matches an ordinal position, is arbitrary. It depends on the system's or user's choices and requirements. A "vector" consists of an ordered set of elements.
However, the ordering of values (a value X of the ordered pair <M, X>, where M gives the ordinal position) is implied by the position in the vector. Thus, a vector consisting of a set "S" of elements is a one- dimensional array, where each value corresponds to a vector element whose ordinal position over the vector is implied. More particularly, a vector may define an ordering to a set of elements in an ordered or non- ordered set. Given the ordered set "S" 148 of FIG. 3A, a "binary-bit vector" a can uniquely define a subset "A" (FIG. 3D) of "S" 148 (FIG. 3A) , by representing the elements of "S" in "A". Only two data values are required for representing the elements of "S" in subset "A". More precisely, each data value must represent whether an element of set "S" is in subset "A" or is not in subset "A" and, thus, a binary bit string, where each bit corresponds to an ordinal position of set "S", can represent subset "A". In the preferred embodiment, binary bit-vectors contain l's and O's, a "1" indicating a set element is present, and a "0" indicating that it is not present. Each position of each binary bit within a binary bit-vector corresponds to an ordinal position of an element in subset "A" (FIG. 3A) . Thus, a binary bit-value, combined with its implied ordinal position within the vector, is an "element" of the vector. The presence, of a "1" bit in the vector is equivalent to representing a value of the ordered set by only its ordinal position. In this way, representations of data can be operated on without operating on the data values themselves.
As mentioned above, binary bit-vectors characterize a set by identifying which elements of the set exist in a subset; where each binary "1" bit corresponds to an ordinal position of the set. For example, given the ordered set of states, (<1, California>) , (<2, Colorado>) , (<3, Hawaii>) , (<4, Maine>) , (<5, New Mexico>) , (<6, New York>) , (<7, Oregon>) , (<8, Texas>) , we can characterize all elements in this set by the binary bit-vector a' = (11111111) (FIG. 3E) . Likewise, to represent only a subset of set "S", for example, the subset "A" (FIG. 3B) , containing (<2, Colorado>) , (<3, Hawaii>) , (<5, New Mexico>) , (<8, Texas>) , only the binary bit vector a = (01101001) (FIG. 3D) is necessary. Lastly, to represent an empty subset (FIG. 3C) of set "S", only the bit-vector a11 = (00000000) (FIG. 3G) is needed.
Assuming that a relational database comprises ordered sets, binary bit-vector analysis discussed above can be used to represent relations of the relational database. A more detailed discussion on the procedure in the "Binary Representation of a Relational Database" is discussed in PART IV of the specification.
IV. Binary Representation of a Relational Database FIG. 2 depicts a relational database for suppliers, parts and shipments. As stated in PART II, the relational database consists of tables 63, 65, and 67, which depict the Suppliers, Parts and Shipments relations, respectively. In addition, domain 60 contains the unique values which are referenced by each of the tables 63, 65 and 67. As shown, the domain 60 contains unique values for supplier identifiers 66, part identifiers 68, person names 70, part names 72, cities 74, colors 76, and numbers 78. One aspect of the present invention is to create a binary representation of a relational data base, such as the one shown in "FIG. 2. Specifically, the binary- representation for the relational database shown in FIG. 2 is shown in FIGS. 4, 5 and 6. More particularly, FIG. 4 shows the binary representation of the suppliers table 63 (FIG. 2) , FIG. 5 is a binary representation of the parts table 65 and FIG. 6 is a binary representation of the Shipments table 67 (FIG. 2) .
FIG. 4 can be broken up into two portions as shown. The top portion of the figure represents the value sets or domains associated with the suppliers relation 63 of the relational database in FIG. 2. More particularly, domain 160 (FIG. 4) is associated with suppliers identifier domain 66 of FIG. 2. Domain 162 corresponds to the Person Names domain 70 of FIG. 2. Domain 164 corresponds to the Numbers domain 78 (FIG. 2) , and domain 166 corresponds to the Cities domain 74 of FIG. 2. The values of domains 160, 162, 164 and 166 have been arbitrarily chosen and formed in particular orders of occurrence, namely, lexical ordering for domains 162 and 166 and numerical ordering for domains 160 and 164. A subset of each of the domain 160, 162, 164 and 166 is represented by corresponding binary bit-vectors 176, 178, 180, and 182. Each binary bit-vector, which is used for the purpose of characterizing a domain, is called herein an entity select vector. In other words, the binary bit-vector is used to represent a selection (subset) of entities from a set of entities, where an entity is defined to be a unique value or grouping of values. For example, an entity may be a value in a value set or a row in a relation.
The entity select vector has a quantity of binary bits equivalent to the quantity of unique values in a given domain. For example, domain 160 for supplier identifiers contains ten supplier identifiers for ten corresponding binary bits. Each binary bit of the entity select vector has a position which directly corresponds to the position of each of the unique values in the supplier identifier domain 160. Additionally, each of the binary bits represents the presence or absence of each of the unique values in the subsets represented by the entity select vector 176. More particularly, the binary bit 236 of entity select vector 176 corresponds to the unique value SI at 216 of domain 160. The binary bit 236 has a value of "1" which indicates that the unique value SI is present in the subset represented by entity select vector 176. Likewise, binary bits 238, 240, 242 and 244 also indicate that the unique values S2, S3, S4, and S5 are present in the subset. Binary bits 246, 248, 250, 252 and 254 are set to "0", indicating that unique values S6 226, S7 228, S8 230, S9 232, and S10 234, respectively, are not in the subset. The binary bit vector represen¬ tation of the unique values of the domain 160 is a short-hand way of representing a subset of the domain 160 by indicating those unique values which are present in the subset. This aspect of the present invention is an important one. By only representing a subset of a domain, all values of a domain need hot be associated with a column. More particularly, only the entity select vector which corresponds to a particular column of a relation, and not the domain, needs to be associated with the column.
For each domain containing unique values in the Suppliers relation, a different entity select vector
176, 178, 180, and 182 is used to represent subsets of the domains. Each subset representation is directly associated with a column of the relation. More particularly, entity select vector 176 is associated with column 168, entity select vector 178 is associated with column 170, entity select vector 180 is associated with column 172, and entity select vector 182 is associated with column 174. Columns 168, 170, 172 and 174 correspond to the columns of the suppliers relation 63 Of FIG. 2.
Referring to the set of binary bit-vectors at 260, each binary bit-vector 184, 186, 188, 190 and 192 corresponds to one of the values indicated to be present in the subset by the entity select vector 176. In the preferred embodiment, the binary bit-vector is called a row use vector because it indicates the presence or absence of a unique value in one or more rows of a column in the relation. The row use vectors are combined into a row use set 260 to form the binary representation of the column 168. The order of row use vectors occurs in the same order as the order of values characterized by the entity select vector. More particularly, the lower order binary bits of the entity select vector have been arbitrarily assigned to correspond to the left most row use vectors of the row use set, and the higher order binary bits of the entity select vector correspond to the right most row use vectors of the row use set. The implied mapping between the bits set to "1" of the entity select vector could have just as. easily been reversed. For example, the lower order binary bits may correspond to the right-most row use vectors of the row use set.
Binary bit 236 of the entity select vector 176 corresponds to row use vector 184 of the row use set 260. Likewise, binary bit 238 of the entity select vector 176 corresponds to the next row use vector 186 of the row-use set 260. The system interpreting the entity select vectors and the row use set is pre-programmed so that the binary bits indicate unique values present in the subset. The implied ordering scheme is illustrated by the dotted lines 268, 270, 272, 274 and 275, which show that the binary bits in entity select vector 176 correspond to the row use vectors 184, 186, 188, 190 and 192, respectively. Likewise, the binary bits in the entity select vector 180, which represent the unique values "10'.', "20", and "30", are mapped in an implied manner to the row use vectors 204, 206 and 208, respectively.
It is important to note that each row use vector can indicate the presence or absence of a unique value in one or more rows of a column of the relation. For example, in row use vector 206, corresponding to the unique value "20", "1" bits indicate the presence of the value "20" in the first and fourth rows of column 172. Because the binary representation of each of the columns is represented by an entity select vector to unique values, the columns 168, 170, 172 and 174 of the relation need not be represented as a set of actual values in the database memory. Only the row use sets and the corresponding entity select vectors need be stored in the memory of RDMS 10 (e.g., memory 18, FIG. 2) . The actual values are stored in a more permanent memory such as a hard disk (e.g., external device 12, FIG. 2) . Thus, for each relation in the database, a series of entity select vectors representing subsets of the various domains pertaining to the particular relation, are stored along with their corresponding row use sets which depict the columns of the relation. FIG. 5 and FIG. 6 represent the entity select vectors and their corresponding row use sets for the Parts Relation 65 (FIG. 2) and the Shipments relation' 67 (FIG. 2) .
Referring to FIG. 5, domains 282, 286, 290, 294 and 298 correspond to the domains for part identifiers 68 (FIG. 2) , part names 72 (FIG. 2) , colors 76 (FIG. 2) , numbers 78 (FIG. 2) , and cities 74 (FIG. 2) respectively. Note that the numbers domain 294 and the city domain 298 of the Parts relation (FIG. 5) are identical to the numbers domain 164 and the city domain 166 of the suppliers relation (FIG. 4) . An efficiency of the present invention is that the unique values of the domains need only be represented once. In other words, the unique values in the numbers domain 164 (FIG.
4) and the numbers domain 294 (FIG. 5) are the same
'domain of unique values of numbers. Likewise, the city domain 166 (FIG. 4) and the city domain 298 ,(FIG. 5) are the same domain of unique values of cities. The representation of the subsets for these domains is different. For example, the entity select vector 180 (FIG. 5) represents that the unique values "10", "20" and "30" of the numbers domain are in the relation of suppliers (FIG. 4), whereas the unique values "12", "14", "17" and "19" are in the relation for parts (entity select vector 296, FIG. 5) . Thus, instead of having to store two complete sets of unique values 164 and 294, which are identical, the system only stores one version of the numbers domain and two entity select vectors, namely, 180 and 296, to represent subsets of the same domain. Likewise, the system need not retain two unique sets of values for cities, 166 and 298, instead, the system maintains only one version of the city domain and two entity select vectors, 182 and 300, to identify the subsets of cities in the associated relations. In fact, any number of entity select vectors may be associated with a particular domain. For example, in FIG. 6, the domain for numbers 328 is also referenced; however, the entity select vector 330 depicts a different subset of unique values from the entity select vectors 180 (FIG. 4) and 296 (FIG. 5) . Entity select vectors are efficient subset representations of domains because the same domain need not be represented more than once and the actual values of each domain can be uniquely represented by the entity select vector's binary bits.
The columns of the Parts relation 302, 304, 308, 312, 316 (FIG. 5) are represented in the relational database by the row use sets 304, 306, 310, 314, 318, respectively. As in the description for the suppliers relation (FIG. 4) , the row use sets contain row use vectors which are each associated with a unique value indicated to be present in a subset of a domain by an entity select vector. Likewise, FIG. 6 depicts a binary representation for shipments.
The domains associated with each of the relations with their unique sets of values are stored on the external . device 12 (FIG. 1A) , and the binary representation of each of the relations (FIGS. 4, 5 and 6) which make up the relational database of FIG. 2 are stored in memory 18 of the RDMS 10 (FIG. 1A) . When the system performs a relational operation on the relational database, only the binary representation of the relations need be referred to. A detailed discussion of this process will follow shortly.
FIG. 7 is a flow block diagram of the steps which are performed by the RDMS 10 for creating a binary representation of a relational database.
Referring to FIG. 7, a detailed discussion of the BINARY REPRESENTATION routine is now provided. Specifically, during block 344, each domain necessary for specifying unique values in the relations of the relational database is identified. Input commands, are interpreted by the command interpreter 28 (FIG. 1A) for creating domain identifiers in the memory 18 (FIG. 1A) of the RDMS 10 (FIG. 1A) . Additionally, the system reads each of the instructions above, and an empty value set is created for each of the domains listed in the instruction. The actual unique values associated with each domain are loaded into memory in a later step of this routine. During block 346, the system determines whether any more domains for the database need to be specified.
Assuming that all the domains to be referenced by the database are specified, the system, during block 348, identifies a table of the relational database, and the RDMS 10 builds an entry in the system relation (see Part VI) . During block 350, the system identifies a column, as with the table identified in block 348. In addition, the RDMS 10 builds on entries in the system relation identifying each column, and the RPU 22 via the BBVP 14 creates empty entity select vectors associated with each of the domains. The entity select vectors will have their binary bits set when the columns of the particular relations have their values loaded into the system. During block 352, it is decided whether to identify another column for the particular table iden- tified in block 348. Processing continues at blocks 350 and 352 until all of the relational columns for the particular table identified during block 348 have been identified. During block 354, the system asks for the next table or relation of the database. Assuming that a number of different tables exist in the relational database, blocks 348, 350, 352 and 354 are processed until all of the tables and their related columns have been specified. During blocks 348, 350, 352 and 354, system identifiers, identifying each column of each relation, are constructed. These identifiers are discussed more thoroughly in part VII.
During block 356, the system loads a file associated with each table of the relational database. The file of the table is assumed to exist in the external device 12 of the system depicted in FIG. 1. When the relational database has been completed, the system loads the file representations of the relations into external device 12, where they reside until summoned by the RDMS system 10. During block 358, RPU 22 instructs the external device 12 to transfer the first column associated with a particular table of the relational database via bus 30 to the RPU 22. The particular column is retrieved by referring to certain system identifiers (i.e., AID, RID identifiers, Part VII) . When the first value of a particular column is brought into the RPU 22, a row use vector associated with the value is built by the BBVP 14. Additionally, a bit is set to "1" in the first position of the row -use vector to indicate that the value occupies the first row of the first column of the relation. For each unique value of the column, a new row use vector is created and a binary bit associated with a particular row of the column, where the unique value resides, is set to "1". Additionally, as each unique value of a particular column is brought into the RPU 22, the binary bits of the entity select vector, associated with the column, are set to "1" to indicate the unique values of the domain referenced in the column.
If the same value appears in a second, third or fourth row, etc. of the column, the corresponding bits of the row use vector associated with the unique value are set to "1" indicating the presence of the unique value in these rows of the column. Because the entity select vector has previously had the binary bit associated with the unique value set to "1", it need not be set again. This process of converting the column into a row use set continues until all of the values of the column have been represented by binary bits. When the row use set for a column is completely constructed, the RPU 22 commands the external device 12, to bring the next column, via bus 30, into the RDMS 10. This process continues for all of the columns associated with a particular table until the row use sets for the table are completely constructed. During block 360, the system determines if any other tables need to be binary represented by the RDMS 10, assuming a single input file corresponds to a single table.' If other tables are to be constructed, then blocks 356, 358 and 360 are performed until all the columns of the next relation are represented in their binary form. After all of the binary bit vectors (i.e., row use vectors and entity select vectors) are constructed, the RPU 22 may summon the bit vectors to bit vector encoder BVE 16 of the RDMS 10 to encode the bit vectors into compressed impulse format. The steps for compressing the bit vectors into compressed impulse formats are thoroughly discussed in Glaser et al. RPU 22 then instructs the bit vectors to be sent via bus 48 to the memory 18 where they are stored. Eventually, all of the relations of the relational database stored in external device 12 will be converted into the binary represented form, and this form is stored in the memory 18 of the RDMS 10 in the compressed impulse format or in the uncompressed form. During block 362, the system returns to the program which called the BINARY REPRESENTATION routine (FIG. 7) .
Once the binary representation of the relations is completed, the relations can be updated by two utility functions called INSERT and DELETE, which will be discussed in PART V of the specification. The INSERT function, perhaps with appropriate modification to enhance efficiency, can also be used to load additional rows into the relation. Additionally, the relations in their binary represented form, can be manipulated via the relational operations SELECT, JOIN and PROJECT, also to be discussed in PART V.
A. Example of Generating a Binary Representation of a Relation Referring to FIGS. 1, 1A, 2, 4, 5, 6, 7, 8A, 8B,
9A, 9B and 9C, a detailed example for generating a binary representation of a relation is now discussed. Referring to FIG. 2, a hypothetical relational database is shown. After the BINARY REPRESENTATION routine (FIG. 7) is performed by RPU 22 (FIG. ' 1A) , a binary represented database is formed as shown in FIGS. 4, 5 and 6. This example has been designed to emphasize the steps of the BINARY REPRESENTATION routine for constructing the binary representation of a relational database in FIGS. 4, 5 and 6. In this example, the assumption is made that the reader understands instruction formats. For background in this area, please refer to Date, Introduction to Database Systems, Vol. 1, 4th Ed., 100-107 (1986), which is herein incorporated by reference.
Referring to FIG. 2, the system of FIG. 1 creates a relational database as shown in the tables 63, 65 and 67 (FIG. 2) . These tables are constructed as inputs to the system. Once the input process is complete, the system calls the BINARY REPRESENTATION routine (FIG. 7) . The system, during block 344, creates the domains referenced by the relational database. Input instructions, specify the following domains; suppliers identifiers 66, the parts identifier 68, person names 70, part names 72, city names 74, colors 76, and numbers 78 (FIG. 2) . The input' instructions in pseudo code look like:
(A) Create Domain Supplier Identifiers;
(B) Create Domain Parts Identifiers;
(C) Create Domain Person Names; (D) Create Domain Part Names;
(E) Create Domain City;
(F) Create Domain Colors;
(G) Create Domain Numbers.
As the system reads each of the instructions above, an empty value set is created for each of the domains listed. The creation of the empty value set for the various domains specified in the above instructions are depicted in the results table of FIGS. 8A and 8B. Each row of the results table represents a new value set created as a result of one of the input commands. When the command interpreter 28 of the RDMS 10
(FIG. 1A) interprets the instruction (A) in block 344, RPU 22 creates an empty value set (364, FIG. 8A) . Next, in block 346, the system determines whether there are any other domains to be identified by the system. In fact, there are other domains to be identified as specified by the instructions listed above; thus, block 344 is called. During block 344, the command interpreter 28 evaluates statement B and RPU 22 creates an empty value set to reference the domain of part identifiers (366, FIG. 8A) . In block 346, the system determines that there are still more domains to be identified, and thus, block 344 is called. During block 344, the system interprets the next instruction (C) , which is for the "person names" domain. The RPU 22 (FIG. 1A) generates an empty value set (368, FIG. 8A) . BBVP sets all the vectors to "0" to indicate the empty entity select vector (368, FIG. 8A) . In block 346, the system determines that there are still more instructions for identifying domains. Processing continues at block 344, and an empty value set for part names is generated (370, FIG. 8A) . Block 346 determines that there are still more domain instructions to be processed and processing continues at block 344. During block 344, an empty value set corresponding to the city domain is created (372, FIG. 8A) . During block 346, the system determines that there are still more commands for identifying domains, so processing returns to block 344. During block 344, the system generates an empty value set corresponding to the domain "colors" (374, FIG. 8B) . Block 346 determines that there is one more instruction left (G) for creating a domain, and thus, block 344 is called. During block 344, the RPU 22 creates an empty value set for the "numbers" domain (376, FIG. 8B) . When processing is completed for all of the commands illustrated above, processing continues at block 348, during which a table to be created in the relational database is identified. Specifically, the system is presented with the following command:
CREATE TABLE SUPPLIER (SUPPLIERS ID; PERSON NAME; STATUS; CITY) ; which indicates that a table for suppliers is to be identified; specifically, an identifier for the suppliers table is stored in memory 18 of the RDMS 10. Then, during block 350, the system generates an identifier for the first column of the relation, which is also stored in memory 18. Specifically, it interprets the command above, and a suppliers ID identifier is stored in memory. (Identifiers to be discussed in PART VI). During block 352, the system determines that the command requires that other columns be identified for the relation, so processing continues at block 350. The system identifies the column for "person names" and stores an identifier indicating such in the memory 18. Additionally, an empty entity select vector is created, which is associated with the "person names" domain. In block 352, the system determines that there are still other columns to be identified, and thus, processing continues in block 350. In block 350, the system stores an identifier for the next column associated with "status" into the memory 18. Also, an empty entity select vector associated with the "status" domain is created. In block 352, the system determines that there is still a column remaining to be identified; specifically, during block 350, the system stores an identifier for the column associated with "city". During block 350, an empty entity select vector associated with the "city" domain is created. During block 352, the system determines that no more columns need be identified, and thus, processing continues at block 354, which determines whether there are any more tables to be identified for the relational database.
The following instructions can be specified:
CREATE TABLE PARTS (PART ID; PART NAME; COLOR; WEIGHT; CITY) ;
CREATE TABLE SHIPMENT (SUPPLIER; PART ID; QUANTITY) ;
which are for identifying the relations and their columns for both the PARTS and SHIPMENTS tables. For purposes of this example, however, we will assume that the system processes blocks 348, 350, 352 and 354 to properly store the identifiers in the memory 18 for both the PARTS and SHIPMENTS relations, and to construct empty entity select vectors for the columns listed in the instructions above.
Assuming that all the identifiers for the tables, domains and columns have been specified, the RPU 22 during block 356, instructs the external device 12 to download the files associated with each table of the relational database via bus 30 to the RDMS 10 (FIG. 1A) . Specifically, during block 358, the byte values of each column of the relation are sent to the RPU 22 and the appropriate vectors of the row use sets are built via BBVP 14, and the binary bits of the entity select vectors are set there as well. The following section is a detailed discussion on the construction of the row use vectors, entity select vectors, and value sets of the relational database in FIG. 2.
B. Example of Building a Binary Represented Relation
FIGS. 9A, B-and C are results tables depicting the formation of the binary representation of the suppliers relation of FIG. 4. Each row of the FIGS. 9A, B and C depicts an additional row use vector associated with one column in the relation.
Referring to row 378 of the results table (FIG. 9A) , the first value SI (69, 80 FIG. 2) is inserted into the first ordinal position of the value set for suppliers identifiers. Additionally, the first binary bit of the entity select vector 417 associated with the "suppliers identifier" domain is inserted and set to "1" to indicate that the unique value SI is referenced in the suppliers ID column. The column for the suppliers identifier requires a new row use vector 381 to be generated nd the first bit of the row use vector 381 is set to "1" to indicate that SI occupies the first row of the column. The process of inserting bits into a row use vector, creating row use vectors, and setting bits in an entity select vector is controlled by a routine called INSERT which will be discussed more thoroughly in part V of this specification. This process is repeated for all the values S^ to S10 in the value set for supplier identifiers, as indicated in rows 380-386 of
FIG. 9A. A new row use vector is added for each value present in the subset, as indicated at 383-389, and a new bit is added to the entity select vector 417. Additionally, five more bits, set to "0", are added to the ordinal positions of the entity select vector corresponding to the new values of the value set. Referring to FIG. 4, the row use set 260 and the entity select vector 176 have now been generated by the BBVP 14 via the BINARY REPRESENTATION routine (FIG. 7) . The next column of the suppliers table is evaluated by the BBVP 14. The next column in the relation is the "person names" column (82, FIG. 2) , and the first value of the "person names" column is Smith (69, 82, FIG. 2 ) . The value Smith is inserted to the first position of the "person names" value set. Additionally, the entity select vector 419 associated with "person names" has a first binary bit inserted and set to "1" to indicate that the unique value for Smith is referenced in the column for person names. A row use vector for Smith has not been created, and thus, a new row use vector 401 is generated (388, FIG. 9A) . This process is repeated, as indicated in rows 390-396, for the next four names added to the "names" value set, with a new row use vector added for each name, as indicated at 403-409, and a "S" bit added to the entity select vector 419.
The order of the row use vectors 409, 405, 407, 403 and 401 corresponding to Adams, Blake, Clark, Jones and Smith is in the order of occurrence of the binary bits set "1" in the entity select vector 419. The values Baker, Fabel, Rahn, Ross and Young are added to the "person names" value set. Specifically, the values are added in an order corresponding to the ordering (396, FIG. 9B) . Binary bits set to "0" are also inserted into entity select vector 414 in the ordinal positions corresponding to the newly added unique values. The new binary bits are set to "0" to indicate the remaining values do not occupy the column. The row use vectors 409, 407, 405, 403 and 401 correspond to the row use set 262 of FIG. 4 and row use vector 262 corresponds to entity select vector 178 of FIG. 4, which is the same as the entity select vector 419.
The next column of the input suppliers relation is for the "status" (84, FIG. 2), and the first value of the column is "20" (64, 84 FIG. 2) . The value "20" is inserted into the first position of the value set for status numbers 421. Additionally, a first bit is inserted to entity select vector 421 and set to "1", indicating that unique value "20" is referenced in the relational column. A new row use vector 411 is created, and the first binary bit of the row use vector is set to "1" indicating the presence of value "20" in the first row of the column (398, FIG. 9B) . The process is repeated, as indicated in rows 400-406 of FIG. 9B, for each value in "status" set, resulting in row use vectors 411-415.
Note that at row 404 the next value in the column (73A, 84, FIG. 2) is 20. The value 20 already exists in the value set for numbers, and thus, it need not be added again. Additionally, the entity select vector 421, for numbers, need not be set because a binary bit corresponding to unique value 20 has already been set to "1" in a previous step. This value also corresponds to an already existing row use vector 411. Binary bits set to "0" are added to row use vectors 413 and 415 to indicate that the values 10 and 30 do not occupy the fourth row of the relational column for status. A binary bit set to "1" is added to the row use vector 411 to indicate that the unique value 20 is also in the fourth position of the relational column for status. The last column of the relation for suppliers is the "cities" column. The first value of the "cities" column is London (69, 89 FIG. 2) . The value London is placed into the first position of the value set for "cities". Additionally, the entity select vector 423 as with the names of cities has a first binary bit inserted and set to "1" to indicate that the city London is referenced in the relational column for cities. A row use vector 417 is created and a binary bit set to "1" is added to the row use vector to indicate that London occupies the first row of the column for cities (408, FIG. 9C) , "Paris" is added (row 410) in the same way. Since "Paris" and "London" occur twice, the row use vectors have bits added, as indicated, at rows 412 and 114. The last value in the column for cities is Athens
(75, 86, FIG. 2) . The value Athens does not exist in the cities value set, and so it is added. Specifically, Athens is placed into the first position of the value set corresponding to the lexical ordering of the city names. The entity select vector 423 for cities has a third binary bit inserted and set to "1" in the first ordinal position of the entity select vector to indicate that Athens is referenced in the column for cities. A new row use vector 421 is added to the row use set. Binary bits set to "0" are added to the row use vectors 417 and 419 to indicate that the unique values London and Paris do not occupy the fifth row of the column. The new row use vector contains five bits and the fifth bit is set to "1" to indicate that the fifth row of the column contains the value Athens. The values Cleveland, Fresno, Harrisburg, Los Angeles, New York, Rome, and San Francisco are added to the value set from the input column. The values are arranged in the value set according to a lexical ordering. For each additional value, a corresponding binary bit set to "0" is inserted at a corresponding ordinal position of the entity select vector. The binary bits set to "0" indicate that the values are not referenced by the column (418, FIG. 9C) . The row use vectors 421, 417 and 419, corresponding to Athens, London and Paris, are in the order of occurrence corresponding to the binary bits set in the entity select vector 423. Additionally, the row use vectors 421, 417 and 419 correspond to the row use set 266 of FIG. 4, and entity select vector 423 is associated with the entity select vector 182 (FIG. 4) .
Referring to FIG. 7, when all of the relations of the relational database have been generated, processing continues at decision block 360 in which the RPU 22 determines whether there are more files associated with the PARTS and the SHIPMENTS tables of the relational database. Processing continues at blocks 356 and 358 until the binary • representation for the PARTS and SHIPMENTS tables are constructed (418, FIG. 9C) . The binary representations for the PARTS and SHIPMENTS tables are generated in the same fashion as the SUPPLIERS table (FIGS. 9A, 9B and 9C) discussed above. Processing returns during 362 to the calling routine of the.BINARY REPRESENTATION routine (FIG. 7) .
V. Operations Performed on Binary Representations of Relations
FIGS. 10A, 10B, IOC, 10D, 11A, 11B, 11C, 11D, 12, 13, 14, 15A and 15B, 16, 17, 18, 19, 20, 21, 22A, 22B, 22C, 22D, 22E, 22F, and 22G depict flowcharts of operations performed on relations in their binary represented form. Specifically, FIGS. 5A, B, C and D are flow diagrams of the utility function called INSERT. FIGS. 11A, B, C and D are flow diagrams of the function DELETE. FIG. 14 is a flow diagram of the relation operation called SELECT. FIG. 16 is a flow diagram of the relational operation called PROJECT, and FIGS. 22A, 22B, 22C, 22D, 22E, 22F, and 22G are flow block diagrams of the operation JOIN. The functions INSERT and DELETE are basically for maintaining and manipulating data within the relations. The relational operations, SELECT, PROJECT and JOIN are for generating resultant relations, and in the preferred embodiment, in a binary represented form. Only three relational operations SELECT, PROJECT and JOIN are discussed in order to simplify this disclosure and to provide a basic understanding on how relational operations are preformed on binary representations of relations. For example, the operations PRODUCT, UNION, INTERSECTION, DIFFERENCE and DIVIDE, which are described in Date, "An Introduction' To Database Systems," Vol. 1. (4th ed. , 1986) , and which is herein incorporated by reference, are not discussed in this specification, due to their complexity. However, one skilled in the art will readily understand their operation on binary representations of relations after reading the section on SELECT, PROJECT and JOIN. This part of the specification is broken up into five subsections, each dealing with a separate operation as discussed above. For purposes of this example, it is assumed that one or more relations have been loaded into the RDMS 10. Here the relations are encoded by the RPU 22 via BBVP 14 into the binary representations of the relations. Then the binary represented relations are either sent directly to memory 18 for storage, or they are sent to the BVE 16 where the bit strings of the binary representations are encoded into compressed impulse formats as discussed in Glaser et al. The resulting compressed bit strings are then stored to memory 18. They stay in memory 18 until a request to perform a relational operation (i.e., INSERT, DELETE, SELECT, PROJECT, JOIN) is initiated and is interpreted at the command interpreter 28. In their uncompressed form, the binary representations of the relations of the rela- tional data base stored in memory 18 are shown in FIGS. 4, 5 and 6. It is also assumed that the bit vectors of the binary represented relation could also be in compressed impulse formats. However, for ease of understanding, the bit vectors are processed in the uncompressed form. When the RPU 22 is ready for processing, the binary represented relation(s) are brought to the RPU 22 via buses 48 and 31. Here the specified relational operation is performed on the relation(s) . If any Boolean operations need be performed by the relational operation, then the steps for processing compressed bit strings as set forth in Glaser et al. are preformed by the BLU 24. Once processing is completed at the RPU 22, the RPU 22 outputs a new binary represented relation. The new output relation is sent to memory buses 31 and 48, where the resultant relation resides until it is sent back to RPU 22 for further processing.
A. INSERT INSERT is a function which adds one value at a time to a relation. The necessity for performing an INSERT operation occurs in three categories of cases. First, a unique value needs to be added to a domain or value set, and it needs to be added to a column of a relation. Second, a unique value already exists in the value set, and it needs to be added into a column of a relation. Third, a value already exists in a column and it needs to be added again to the column. Multiple values may be added to a value set or to a column; however, the INSERT subroutines must be processed more than once corresponding to each time a value is added. In all of the situations above, an assumption is made that the binary representation of the one or more relations to perform the INSERT are in RPU 22 ready for processing. Additionally, the function INSERT can be used to add values to more than one column of a relation. The INSERT function is separately performed each time a value is added to the relation.
The flow diagrams in FIGS. 10A, B, C and D depict routines for processing any one of the three situations
- discussed above. Specifically, the flow diagram in FIG.
10A is a routine for adding a unique value to a value set, and adding the unique value to a column of a relation; the routine is called INSERT. FIG. 10B is a flow diagram of a subroutine for adding a unique value to the value set, and this routine is called INSERT VALUE INTO VALUE SET. FIG. IOC is a routine for updating an entity select vector to reflect the addition of a unique value into a particular subset; this routine is called UPDATE SUBSET. FIG. 10D is a routine for updating a column of a relation with a new occurrence of a value; this routine is called ADD VALUE TO COLUMN. The routine in FIG. 10D would by itself be used to insert a value into a column of a relation when the value already existed in a column. The flow diagrams of FIG. IOC and 10D are combined for inserting a value already existing in the value set into a subset of values of the value set and adding the value to a column of a relation. Referring to FIG. 10A, a more detailed description of the INSERT routine is now discussed. At block 422, the system calls the INSERT VALUE INTO VALUE SET routine (FIG. 10B) to add a unique value to the value set. Once the new unique value has been added to the value set, block 424 is called to call the subroutine UPDATE SUBSET (FIG. IOC) . This routine updates an entity select vector corresponding to the value set so that the unique value is represented in an associated subset. During block 426, the system performs the ADD VALUE TO COLUMN subroutine (FIG. 10D) to add the value into a specified column of the relation. Processing is completed; a value is added to the value set and to a column and the system returns at block 428 to the calling program.
Referring to FIG. 10B, a more detailed description of the INSERT VALUE INTO VALUE SET routine is now discussed. In block 430, the RPU 22 via the BBVP 14 determines the ordinal position in the value -set at which the new, unique value is to be inserted. Essentially, the value set is stored in a structure which is traversed to find the new value. The structure contains all of the unique values presently stored in the value set, and is built as such to minimize access time for finding values. The system locates a node of the structure, which corresponds to the ordinal position at which the new, unique value should be placed. During block 432, the system determines whether the value already exists. If the value already exists, then during block 434, the system returns to the calling program. Assuming that the value is not in the struc- ture, then the new value is added to the value set by adding a node assigning an already existing node in the structure to incorporate the unique value. During block 438, processing returns to the calling program.
Referring to FIG. IOC, a more detailed discussion of the UPDATE SUBSET routine is now discussed. In block 440, the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the ordinal position of the unique value in the value set. In decision block 441, the RPU 22 determines whether the unique value has been added to the value set. If the unique value has been added, block 442 is called. During block 442, the system inserts a bit to the entity select vector at the ordinal position corresponding to the new unique value. Processing continues at block 444. During block 444, the new bit added to the entity select vector is set to "1" to indicate the unique value in the subset. Returning to block 441, if the unique value already exists, block 443 is called. During block 443, the RPU 22 via the BBVP 14 determines whether the bit in the entity select vector has been set to "1". The binary bit set to "1" indicates that the column contains this unique value. If the bit has been set to "1", the processing returns to the calling program during block 445. However, if the bit is not set, then processing continues in block 444. In block 444, the new bit added to the select vector is set to "l" to indicate that the unique value is in the subset. During block 446, processing returns to the calling program. Referring to FIG. 10D, a more detailed discussion of the ADD VALUE TO COLUMN routine is now discussed. During block 448, the RPU 22 via BBVP 14 counts the number of binary bits set to "1" in the entity select vector up to and including the bit at the ordinal position corresponding to the unique value inserted. This number is called "count". The "count" of binary bits set to "1" corresponds to the location of the row use vector in the row use set. During block 450, the system inserts the new row use vector at the position in the row use set corresponding to "count." During block 452, the system appends a binary bit set to "0" to all of the row use vectors of the row use set. In block 454, the system sets the last bit of the new or selected row use vector to "1" to indicate that the new value is added to the last row of the column. During block 456, processing returns to the calling program. Returning to block 449, if a new row use vector is not required to be built by the RPU 22, then processing continues at blocks 452. During block 452, bits set to "0" are appended to the existing row use vectors and during block 454, the last bit of the new row use vector is set to "1".
1. Detailed Example for the INSERT Function
Referring to the suppliers relation depicted in FIG. 4, a detailed example for adding a name to the value set of names (162, FIG. 4) and to a column (170, FIG. 4) corresponding to names in the relation is now discussed. In this example, the requirement exists to insert the name "Zeus" to the value set depicted at 162 (FIG. 4) , and to the end of the column at 170 (FIG. 4) . This example has been constructed to illustrate all of the routines for inserting a value to a value set and to a binary representation of a relation. FIG. 12 is a detailed results table depicting the various steps performed by the INSERT routine (FIG. 10A) . The results table of FIG. 12 is broken up into three columns. From left to right, the first column depicts the existing value set, the second column is the entity select vector characterizing a subset of the value set, and the third column is the row use set representing the names column of the relation. Each column of the results table (FIG. 12) depicts a change in either the value set, entity select vector, or row use set as the INSERT routine (FIG. 10A) is performed. The subroutines depicted in FIGS. 10A, B, C and D are transparent to the application. Only the following type of instruction is required:
INSERT
Into (Relation-Suppliers) Value (Zeus)
This instruction is interpreted by- the command interpreter 28 (FIG. 1A) to add a new and unique value Zeus to the value set for names and to add the word Zeus to the binary representation of the relation for suppliers, currently stored in memory 18. Pursuant to the instruction, the RPU 22 brings the binary representation of the suppliers relation from memory 18 to the RPU 22. Additionally, the value set is brought from the external device 12 via bus 30 to the RPU 22. Because the unique value Zeus is not part of the value set names, the RPU 22 calls the routine (FIG. 10A) to insert the value Zeus into the column of names in the relation and also to add the unique value Zeus to the value set of names, referring to the INSERT routine (FIG. 10A) . During block 422, the RPU 22 calls the subroutine INSERT VALUE INTO VALUE SET (FIG. 10B) . Referring to FIG. 10B, at block 430 of the INSERT VALUE INTO VALUE SET routine, the system determines the ordinal position in which the value Zeus is to be inserted. During block 432, the RPU 22 determines that Zeus does not exist in the value set, and in block 436, the RPU 22 adds Zeus to the proper node in the structure for the value 494 (FIG. 12) . If a node did not exist in the structure, then a new node would be added and set to Zeus. At block 438, the RPU 22 returns to block 424 of the INSERT routine (FIG. 10A) .
During block 424 (FIG. 10A) , the RPU 22 calls the subroutine UPDATE SUBSET (FIG. 10C) . Referring to FIG. 10C, at block 440, the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the unique value Zeus in the value set. The ordinal position is the last position of the entity select vector. In block 441, RPU 22 determines that the value Zeus had been previously added to the value set of names, and processing continues at block 442. During block 442, the system inserts a bit to the entity select vector (494, FIG. 12) to indicate that a value exists in the last ordinal position of the value set. During block 444, the RPU 22 sets the new bit to "1" in order to indicate that the value Zeus is added to the subset (496, FIG. 12). Then, in block 446, the RPU 22 returns to block 426 of the INSERT routine (FIG. 10A) .
Referring to block 15A, during block 426, the RPU 22 calls ADD VALUE TO COLUMN routine (FIG. 10D) . During block 448 of the ADD VALUE TO COLUMN (FIG. 10D) , the RPU 22 determines the "count" of binary bits which are set to "1", up to and including the bit which corresponds to ordinal position corresponding to the new value Zeus. "Count" is equal to six because there are six binary bits set to "1" in the entity select vector; the bit corresponding to the value'Zeus is the sixth binary bit set to "1". With the "count", the RPU 22 determines if there presently resides a row use vector which corresponds to the value Zeus. A row use vector does not exist at position six as determined previously at block 440 (FIG. IOC), and during block 450, the RPU 22 inserts a new row use vector at the sixth position of the row use set. During this step, the RPU 22 counts, from left to right, six row use vector positions in the row use set. The RPU 22 adds a new row use vector having binary bits set to "0". During block 452, the RPU 22 appends binary bits set to "0" to the end of all of the row use vectors in the row use set (500, FIG. 12). Then, during block 454, the last bit of the newest and sixth position row use vector is set to "1" to indicate that Zeus now occupies the last row of the column for names in the relation for suppliers (502, FIG. 12).
B. DELETE
DELETE is an operation which removes one value at a time from a binary representation of a relation and possibly from a value set. The DELETE operation occurs in three categories of cases. First, a unique value need not be removed from a subset; however, the unique value needs to be removed from the column. Second, a unique value exists in a column, and it needs to be deleted from the relation and from a subset; however, it does not need to be removed from the value set. Third, a unique value already exists in a relation and it needs to be deleted from the column from a corresponding subset and from a value set. Multiple values may be removed from a value set or a column; however, the DELETE routines must be processed more than once according to the number of times a value or values need to be deleted. As in the case of INSERT, the binary representation of the relation and the value set are in RPU 22, ready for processing.
The flow diagrams in FIGS. 11A, B, C and D depict routines for processing any one of the three situations discussed above. Specifically, the flow diagram in FIG. 11A is a routine for removing a unique value from a value set and for removing the unique value from a column of a relation; this routine is called DELETE. FIG. 11B is a flow diagram of a routine for removing a value only from a column. This routine is called DELETE VALUE FROM COLUMN. This routine by itself could be used to DELETE a value, one or more times, from a particular column. The flow diagram of FIG. 11C is a routine for updating an entity select vector to reflect the removal of a unique value from a particular subset; this routine is called DELETE VALUE FROM SUBSET. The flow diagram of
FIG. 11D is a routine for removing a unique value from a
.value set; this routine is called DELETE VALUE FROM VALUE SET. The flow diagrams in FIGS. 11B and 11C can be used together to remove a value already existing in a column, and to remove the value from a subset of a value set without removing the value from the value set.
Referring to FIG. HA, a more detailed description of the DELETE routine is now discussed. During block
460, the RPU 22 calls the DELETE VALUE FROM COLUMN
routine (FIG. 11B) to DELETE a value from a column of a relation. Once all of the occurrences associated with a unique value have been removed from a column, the row use vector associated with the unique value contains binary bits set to "0". If not all of the bits of the row use vector are set to "0", then during block 461, the RPU 22 determines that processing is complete and returns processing to the calling routine at block 461. On the other hand, if all the bits are set to "0", then 1 block 461 determines that processing is incomplete and processing continues at block 462. Because the unique value is no longer referenced in the column of the s relation, it can be removed from the subset (or entity
5 select vector) which depicts the column. Specifically,
* the entity select vector associated with the unique value of the column can be updated via a call to the
DELETE VALUE FROM SUBSET (FIG. 11C) during block 462.
Once the entity select vector has been updated, the RPU
10 22 calls block 462(a) to determine whether there is "1" bit set in the ordinal position corresponding to the value being deleted in any of the entity select vectors for the relational database. If there are, then the value is presently being used in other relations and,
15 processing returns to the calling program at block 462(b). However, if there are no "1" bits set at the ordinal position corresponding to the unique value being deleted, then processing continues at block 464. During block 4.64, the DELETE VALUE FROM VALUE SET (FIG. 9D) is
20 called to remove the unique value from the value set. The value is removed from the value set only when none of the other relations in the relational database reference the unique value. When processing is completed, the system returns at block 466 to the ' . 25 calling program. The DELETE routine (FIG. 11A) can be called successively to DELETE one or more value of a row of one relation.
Referring to FIG. 11B, a more detailed description of the DELETE VALUE FROM COLUMN routine is now
30 discussed. During block 470, the RPU 22 determines which row use vector of the row use set is associated with the particular value to be removed from one row of the column. This operation can be conducted by performing successive Boolean AND operations on the row
35 use vectors and a binary bit only having a single bit set to one at the ordinal position corresponding to the row position of the column. The bit is changed from "1" to "0", in the appropriate row, indicating the absence of the value in the particular row of the column. During block 472, the RPU 22 via the BBVP 14 determines whether all of the bits of the row use vector have been set to "0". If not all of the bits of the row use vector are set to "0", then a "DONE" signal is returned to the calling routine at block 474. If all of the binary bits of the row use vector are set to "0", then processing returns at block 478 to the calling routine.
Referring to FIG. 11C, a more detailed discussion of the DELETE VALUE FROM SUBSET routine is now discussed. During block 480, the RPU via BBVP 14 determines the ordinal position in the corresponding entity select vector associated with the unique value whose row use vector has been deleted from the row use set. During block 482, the RPU 22 via the BBVP 14 sets the binary bit in the entity select vector, associated with the unique value to "0" to indicate the absence of the unique value in the subset. In block 485, processing returns to the calling program.
Referring to FIG. 11D, a more detailed discussion of the DELETE VALUE FROM VALUE SET routine is now discussed. During block 488, the system removes the value from the value set by removing the value in the value set structure. Additionally, the binary bit corresponding to the deleted value in all entity select vectors is also removed to account for the reduced size of the value set.
1. Detailed Example for the DELETE Operation
Referring to FIG. 13, the names of the suppliers of the suppliers relation depicted in FIG. 4 are shown along with the row use set corresponding to suppliers name column in the relation. Essentially, this example starts where the "Insert Operation Example," left off.
Zeus had been added to the names value set, and Zeus had been added to the column of names in the suppliers relation. In this example, it is required to DELETE the name Zeus from the value set and to DELETE the name Zeus from the column (504, FIG. 13). FIG. 13 is a detailed results table depicting the various tasks performed by the DELETE routine (FIG. 11A) . This example has been chosen to illustrate all of the routines for deleting a value from a value set and from a binary representation of a relation.
The results table of FIG. 13 is broken up into three parts. From left to right, the first column depicts the existing value set, including the unique value Zeus; the second column is the entity select vector representing the subset of the value set for names corresponding to column for names, including the name Zeus; and the third column is a row use set which represents the "names" column of the suppliers relation.
Each row of the results table (FIG. 13) depicts a change in either the value set, entity select vector, or row use set as the DELETE routine (FIG. 11A) is performed. The routines depicted in FIGS. 11A, B, C and D are all transparent to the application. The application provides only the following instruction to the system:
DELETE From (Relation-Suppliers) Where Name = Zeus This instruction is interpreted by the command interpreter 28 (FIG. 1A) to DELETE a value from the column of names in the suppliers relation. Because the value Zeus only appears once in the column for names in the relation and in the database, the system will call the routine DELETE (FIG. 11A) to remove the value Zeus from the column of names and also from the value set of names. Specifically, during block 460 (FIG. 11A) , the
RPU 22 calls the subroutine DELETE VALUE FROM COLUMN
(FIG. 11B) to remove the value Zeus from the names column of the relation. Referring to FIG. 11B, during block 470, the RPU 22 via the BBVP 14 changes the binary bit set to "1" to "0" in the row use vector to indicate that the value Zeus is no longer in the column for names
(506, FIG. 13) . In block 472, the system determines whether all of the bits of the row use vector have been set to "0". Zeus only appeared in the column once, and thus, by changing the one binary bit to "0", Zeus is no longer represented in the column; all of the binary bits of the row use vector are set to "0". Thus, in block 478, processing returns to block -462 of the DELETE routine (FIG. 11A) .
In block 462 (FIG. 11B) , the DELETE VALUE FROM SUBSET routine (FIG. 11C) is called to remove the unique value Zeus from the subset depicted by the entity select vector. Specifically, during block 463, the row use vector associated with the unique value Zeus is removed from the row use set. Processing continues at block 480, during which the ordinal position of the binary bit associated with the value Zeus in the entity select vector is determined. Specifically, the RPU 22 via the BBVP14, during block 480 determines that the ordinal position of the value Zeus in the entity select vector is the tenth position. Then, in block 482, the system sets the tenth binary bit from "1" to "0" to indicate the absence of the value Zeus' from the subset (510, FIG. 13). During block 487, processing returns to the DELETE routine (FIG. 11A) at block 462(a). In block 462(a), the RPU 22 determines whether there are any "1" bits set at the ordinal positions corresponding to Zeus in any entity select vector. The entity select vectors corresponding, to the value set of names for the entire relational database are evaluated to determine if the unique value is referenced in any other subset of the relational database. Because the other tables (i.e., parts (FIG. 5) and shipments (FIG. 6)) do not reference the "names" value set, there is only one entity select vector and the bit corresponding to the ordinal position of Zeus has been set to "0". Processing continues in block 464 which calls the DELETE VALUE FROM VALUE SET routine (FIG. 11D) .
Referring to FIG. 11D, during block 488, the RPU 22 removes Zeus from the value set and the corresponding bit in the entity select vector is also removed (512, FIG. 13) . Processing continues at block 490, during which the RPU 22 returns to the DELETE routine (FIG. 11A) at block 466, where processing returns to the calling program.
C. SELECT To this point, we have discussed the operations
INSERT and DELETE, which are basically functions for updating binary representations of relations. The next operations, are query functions for finding relevant information about a relation or groups of relations. The operations include SELECT, JOIN and PROJECT, and are principally used for determining a resultant binary relation. The resultant binary relations can then be converted into their byte value form for users to under¬ stand. This section concentrates on the operation SELECT which generates a resultant binary relation for depicting which row or rows of a relation contain
selected values. Stated differently, given one or more value sets and a relation, SELECT determines the rows of the relation which correspond to a particular value or values, in one or more columns of a relation and the result is depicted in a binary representation: a binary bit vector called a "select vector". A typical example of a SELECT operation .might be for determining which suppliers (i.e., Smith, Jones, Blake, Clark and Adams) are located in Athens (63, FIG. 2). A more detailed discussion on this query will be presented shortly.
Referring to FIG. 14, a flow diagram of the SELECT operation is depicted. During block 516, the RPU 22 via the BBVP 14 (it is assumed after this point that any time processing needs to be done on a bit vector that the RPU calls the BBVP 14 for processing) determines the ordinal positions of one or more selected unique values, which are in one column of the relation, in a particular value set. In block 518, the RPU 22 determines whether the selected unique values are found in the value set. If the selected unique values are not found in the value set, then processing returns to the calling program during block 520. Assuming that the selected unique values are found in the value set, then in block 522 a binary bit vector displaying the ordinal positions of the selected values within the value set is generated. Specifically, the binary bit vector contains bits set to "1" at the ordinal positions corresponding to those of the selected values and the remaining bits of the bit vector are set to "0". In block 524, the bit vector generated in block 522 is "ANDed" with the entity select vector, corresponding to the column in which the values reside, to determine whether the selected unique values are referenced in the column. In block 526, the RPU 22 determines whether the resultant bit vector has all bits of the resultant vector set to "0"; i.e., if the corresponding set is empty. If resultant bit vector is all zeros, then the selected unique values are not in the column, and thus, no select vector can be generated and processing returns at block 528 to the calling program.
Assuming that the resultant bit vector, from the AND operation step, is not empty (e.g., some bits are set to "1" in the bit vector) , processing continues at block 530 in which the RPU 22 determines count; the number of binary bits, in the entity select vector, which are set to "1" up to and including the ordinal position of each selected value. For each unique value, the count is then used to determine which row use vectors of the row use set correspond to the selected unique values. During block 532, the row use vectors, corresponding to "count", are retrieved. Processing continues at block 536, in which the RPU 22 determines whether one or more unique values were selected from the particular column over which this part of SELECT is processed. Assuming that only one unique value from a particular column was selected by the application, the RPU 22 returns the one row use vector, corresponding to the unique value, retrieved at block 532. Processing returns to the calling program at block 542. However, if more than one unique value from a particular column was selected by the application for this operation, .then during block 537 the Boolean OR operation is performed on the selected row use vectors to determine a resultant relation. The Boolean OR operation is performed by the BLU 24 (FIG. 1A) . The steps for performing Boolean operations for compressed bit string is fully discussed in Glaser et al, which was referenced earlier. During block 538, RPU 22 determines whether the values selected are from only one column of the relation. If all of the selected values are from one column, then the system returns the resultant select bit vector to the calling routine at block 542. If, during block 538, it is determined that other values are selected from other columns of the relation, then processing continues at block 539. During block 539, RPU 22 determines whether any more row use vector for values need to be selected from other columns. If more values need to be processed, then processing continues at blocks 516, 518, 522, 524, 526, 530, 532, 536, 537, 538 and 539 until all of the row use vectors are processed and the entity select vector for each resultant column is generated. When there are no more columns from which to select values, then during block 540, a Boolean operation specified by the SELECT instruction is performed on the entity select vectors. For example, the SELECT instruction might require the determination of whether one value, in one column of the relation, is associated with another value in a different column of the relation. The select vectors for the two values would be ANDed together to determine whether both values reside in the same row of the relation. It should be noted that any of the Boolean operations (i.e. OR, XOR, etc.) could be used to calculate the desired results. For purposes of discussion, however, the operation is assumed to be AND. (For a more detailed discussion, refer to detailed ex¬ amples.) In summary, the flow diagram of FIG. 14 depicts the SELECT operation for returning a resultant entity select vector (e.g. binary bit vector) for depicting which rows of a relation contain one or more selected values.
Once the resultant select vector has been determined, the rows of the relation corresponding to the selected values can be displayed to the user. By having the row positions of each column of the relation, which contain the selected values, the RPU 22 determines which row use vector of the corresponding row use set contains a binary bit set to "1" in the ordinal position corresponding to the selected row position. An indexing function is performed, which determines the position of the selected row use vector in the corresponding row use set. Specifically, RPU 22 counts the number of row use vectors in the row use set up to and including the selected row use vector. This number corresponds to the ordinal position of the binary bit set to "1" in the entity select vector, which references the unique value of the relation. The unique value is retrieved from the value set. For each column of the relation, the value in the selected row is determined and displayed for the user.
In another embodiment of the SELECT operation, a step is added for selecting the unique value according to whether the selected value is greater than, less than, equal to, not equal to, equal to or greater than, or equal to or less than a prespecified value selected by the application program or user.
In another embodiment to be discussed in PART VI, the actual values in the selected rows of the relation are determined and displayed to the user via a mapping function through a vector called the "entity use vector". For each column of the relation, an entity use vector is maintained for identifying a value in the value set which corresponds to the value at a particular row of the column.
1- Detailed Example of a Two-Column SELECT for Two Values. This example relies principally on the suppliers relation of FIG. 2. Specifically, it is assumed for purposes of this example that the suppliers relation (63, FIG. 2) is in its binary represented form as depicted in FIG. 4 and this representation of the relation resides in memory 18 to be processed by RPU 22. The SELECT operation in this example involves two columns, namely, Suppliers ID column (168, FIG. 4) and the City column (174, FIG. 4) . Specifically, the query entered by the user is to determine all information by a supplier whose number is S5 and whose location is Athens. The instruction for this query is: SELECT ID#,CITY FROM S WHERE (ID# = S5) AND (CITY = ATHENS)
This query is interpreted by the command interpreter 28 (FIG. 1A) and RPU 22 retrieves the row use vectors associated with the unique values S5 and Athens, and then performs a Boolean AND operation to determine the resultant relation. FIG. 15 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14) . The results table of FIGS. 15A and B are broken up into six columns. From left to right, the first column depicts the value set associated with the selected unique value, the second column is a bit vector corresponding to the ordinal position of the selected unique value within the value set, the third column is the entity select vector associated with the column in which the value resides, the fourth column is a resultant bit .vector determined by ANDing the ordinal position bit vector with the entity select vector, the fifth column is the row use set associated with the column in which selected unique value resides, and the last column is the select vector determined by performing a Boolean OR operation on all of the row use vectors corresponding to the selected unique values from one column.
The query in this example is a simplified SELECT query to minimize the explanation and steps required to perform the operation. However, generally, the query will be over several value columns of the relation (see the next example for select) . This example could easily be expanded to determine which of the suppliers IDs (i.e., SI through S5) is located in Athens. The row use vector corresponding to Athens indicates the rows of the relation which contain the supplier IDs associated with Athens. For simplicity, in this example we are concerned with only one of the supplier IDs, namely S5, and whether it is associated with Athens. Referring now to FIGS. 14, 15A and B, a detailed example of the query for determining whether supplier ID S5 is located in Athens is now discussed. Specifically, during block 516, the system determines the ordinal position of S5 in the value set for suppliers IDs. Essentially, the system traverses a structure associated with the suppliers ID value set and determines the specific node in the structure where the S5 value resides. During block 518, the system determines whether or not the value S5 has been found in the value set. The value S5 is located in the structure, and thus, it is within the value set for the suppliers IDs. In block 522, the system creates a binary bit vector for representing the ordinal position within the value set associated with value S5 (544, FIG. 15A) . As shown at row 544 of FIG. 15A, the fifth ordinal position in the new binary bit vector is set to "1" corresponding to the ordinal position of the value S5 in the value set. During block 524, the new binary bit vector is ANDed with the entity select vector associated with the suppliers identifiers in the suppliers relation (546, FIG. 15A) . In block 526, the resultant bit vector from- the AND operation is evaluated and it is determined that bit vector is not an empty set. The resultant bit string contains a binary bit set to "1" (548, FIG. 15A) . In other words, the unique value S5 is located in the subset referenced by the entity select vector of the suppliers column, and thus, it is in the relation of suppliers. In block 530, a count is performed on the entity select vector to determine the number of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique value S5. The RPU 22 determines that there are five binary bits set to "1", and thus, the unique value S5 is associated with fifth row use vector of the row use set associated with suppliers IDs (260, FIG. 4) . In block 532, the row use vector associated with unique value S5 is retrieved from the row use set (260, FIG. 4) . The row use vector for S5 is a binary bit vector containing four binary bits set to "0" and a fifth binary bit set to "1", indicating that the value S5 resides in the fifth row for the column for the suppliers IDs (550, FIG. 15A) . During block 536, RPU 22 determines that there is only one value selected from the column for suppliers IDs. Processing continues in block 538, in which the RPU 22 determines that more than one column is involved in this select operation, i.e., the supplier ID and city columns. In block 539, the RPU 22 determines that the value Athens has also been selected by the user in this query, and thus, processing returns to block 516. In block 5-16, the RPU 22 determines the ordinal position of Athens in the value set for cities. Essentially, the system traverses the value set for cities, and during block 518, the RPU 22 determines that Athens is in the value set for cities. During block 522, a binary bit vector is constructed to indicate which ordinal position of the value set for cities contains the city Athens (552, FIG. 15A) . Specifically, the system creates the binary bit vector, which shows a binary bit set to "1" in the first ordinal position (552, FIG. 15A) . During block 524, the Boolean AND operation is performed between the new binary bit vector and the entity select vector associated with cities for the suppliers relation (556, FIG. 15A) . In block 526, the RPU 22 determines that the resultant vector does not contain all "O's" (558, FIG. 15B) , and thus, the value Athens is determined to be in the suppliers relation. If the value Athens was not located in the suppliers relation, then the RPU 22 would return at block 528 to alert the user that the selected value for Athens, although found in the value set, is not within the suppliers relation. Processing continues at block 530, in which the RPU 22 does a count of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique number value Athens. RPU 22 determines that Athens is associated with the first binary bit set to "1" in the entity select vector, and thus, the unique value Athens corresponds to the first row use vector in the row use set (266, FIG. 4). In block 532, the RPU 22 retrieves the row use vector (560, FIG. 15B) associated with
Athens. The row use vector for Athens contains four binary bits set to "0", followed by a binary bit set to "1", indicating that the value Athens occupies the fifth row of the column associated with cities. During block 536,- the RPU 22 determines that there are no more unique values selected in the column of cities, and thus, processing continues at block 538, in which the RPU 22 determines that more than one column, i.e., suppliers and city, was selected by the user. In block 539, the RPU 22 determines that no more value need to be selected. In block 540 the row use vectors, associated with S5 and Athens, are ANDed together to generate a resultant select vector, which represents the rows of the relation which satisfy the query (562, FIG. 15B) . As shown, at 562 (FIG. 15B) the resultant binary bit vector contains four binary bits set to "0" followed by a binary bit set to "1", indicating that the fifth row of the relation contains the supplier ID S5 and the city
Athens. The actual row of the relation can be reconstructed and displayed to the user in one or two ways. First, for each column, the system can use the entity use vectors and associated row use vectors to map the row number determined by the select vector to the ordinal position in each value set. A more detailed discussion on the entity use vector approach will be discussed in Part VI. Second, the RPU 22 could trace back from the row use vectors to the entity select vectors and then back to the value set to determine the unique values in the fifth row of relation.
2. Detailed Example of Two Column SELECT for Multiple Values As in the previous example, this example relies principally on the suppliers relation of FIG. 2. Specifically, it is assumed for the purposes of this example that the suppliers relation (63, FIG. 2) is in its binary represented form, as depicted in FIG. 4 and that this binary representation of the relation resides in memory, to be processed by RPU 22. Again, the SELECT operation in this example involves two columns, namely, suppliers names column (170, FIG. 4) and the city column (179, FIG. 4) . Specifically, the query entered by the user is to determine whether suppliers Smith or Blake are located in London or Paris. The standard instruction for the query is: SELECT S#, CITY FROM S
WHERE (SNAME = 'SMITH') OR (SNAME =. 'BLAKE') AND (CITY = 'LONDON') OR (CITY = 'PARIS') This query is interpreted by the command interpreter (FIG. 1A) and RPU 22 retrieves the row use vectors associated with the unique values Smith or Blake which are, in turn, ANDed with the row use vectors for the unique values London or Paris. FIG. 16 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14) . The results table of FIGS. 16A and B are broken up into seven columns. From left to right, the first column depicts the value set associated with the unique values, the second column is a bit vector corresponding to the ordinal positions of the selected unique values within the value set, the third column is the entity select vector associated with the column in which the selected values reside, the fourth column is a resultant bit vector determined by ANDing the ordinal position bit vector with the entity select vector, the fifth column is the row use set associated with the column in which the selected values reside, the sixth column is the select vector determined by performing a Boolean OR operation on all the row use vectors corresponding to selected unique values from one column, and the last column is the resultant vector determined by performing a Boolean AND operation on the select vectors determined for the selected values from more than one column.
Referring now to FIGS. 14, 16A and B, a detailed example of the query for determining whether the suppliers Smith or Blake are located in London or Paris is now discussed. Specifically, during block 516, the RPU 22 determines the ordinal positions of the suppliers names Smith and Blake in the value set for suppliers names. Essentially, the system traverses a structure associated with the supplier name value set and determines the specific nodes in the structure where the Smith and Blake values reside. .During block 518, the RPU 22 determines whether or not the values Smith and . Blake have been found in the value set. The values Smith and Blake are located in the structure; thus, they are x^ithin the value set for the suppliers names. In block 522 the RPU 22 creates a binary bit vector for representing the ordinal positions within the value set associated with the values. Blake and Smith are as shown in row 565 of FIG. 16A, the third and ninth ordinal positions. The new binary bits are set to "1" corresponding to the ordinal positions of the values in the value set for suppliers names. During block 524, the new binary bit vector is ANDed with the entity select vector associated with the suppliers names in the suppliers relation (567, FIG. 16A) . In block 526, the resultant bit vector from the AND operation is evaluated and it is determined that the resultant bit vector is not all zeros. The resultant bit string contains binary bits set to "1" (569, FIG. 16A) . In block 530, a single "count", with two ordinal positions as input, is performed on the entity select vector, up to and including the binary bit associated with the unique value Smith, the last value characterized in the new entity select vector. The RPU 22 determines that there are two binary bits set to "1", one for the count corresponding to Blake and that there are five binary bits set to "1" corresponding to the count for Smith. Therefore, the unique values, Blake and Smith, are associated with the second and fifth row use vectors of the row use that is associated with suppliers names (262, FIG. 4) . In block 532, row use vectors associated with the unique values Smith and Blake are retrieved from the row use set (262, FIG. 4) . The row use vector for Blake is a binary bit vector containing five binary bits in which the third binary bit is set to "1", indicating that the value Blake resides in the third row of the column for the suppliers names (570, FIG. 15) . Likewise, the row use vector for Smith is a binary bit vector containing five binary bits and the first bit is set to "1", indicating that the value Smith resides in the first row of the column for suppliers names (570, FIG. 16A) .
During block 536, RPU 22 determines that there is more than one value selected from a column for suppliers names, thus processing continues at block 537. During block 537, the row use vectors associated with Smith and Blake (570, FIG. 16A) are ORed together to form the select vector as shown at 571 of FIG. 16A. During block 538, the RPU 22 determines that more than one column is involved in this select operation, i.e., the suppliers names column and the city column. It continues at block 539, during which the RPU 22 determines that the values London and Paris are selected in the separate column cities. In block 516, the RPU 22 determines the ordinal positions for London and Paris in the value set for cities. Essentially, the system transverses the value set for cities, and during block 518, the RPU determines that London and Paris are both located in the value set for cities. During block 532, a binary bit vector is constructed to indicate which ordinal positions the value set for cities are associated with the cities London and Paris (573, FIG. 16A) . Specifically, the system creates a binary bit vector which shows binary bits set to "1" in the fifth and eighth ordinal positions (573, FIG. 16A) . During block 524, the Boolean AND operation is performed on the new binary bit vector with the entity select vector associated with cities for the suppliers relation (575, FIG. 16B) . In block 526, the RPU 22 determines that the resultant vector does not contain all zeros (579, FIG. 16B) . London and Paris are determined to be in the suppliers relation. If the values London and Paris were not located in the suppliers relation, then the RPU would return at block 528 to alert the user that the selected values were not found in the suppliers relation. Processing continues at block 530, in which the RPU determines the number of bits set to "1" up to and including the binary bit associated with the unique value London, and the same is done for Paris. RPU 22 determines that London is associated with the second binary bit set to "1" in the entity select vector; thus, the unique value London corresponds to the second row use vector in a row use set (266, FIG. 4) . Additional¬ ly, the RPU determines that Paris is associated with the third binary bit set to "1" in the entity select vector and, thus, Paris corresponds to the third row use vector in the row use set (266, FIG. 4). In block 532, the RPU 22 retrieves the row use vectors associated with London and Paris (561, FIG. 16B) . The row use vector for London contains five bits, the first and fourth bits of the row use vector containing binary bits set to "1". The row use vector for Paris contains five binary bits, the second and third binary bits set to "1". Essentially, the row use vectors for London and Paris indicate that the first through fourth rows of the column are associated with cities London, Paris, Paris, London.
During block 536, the RPU determines that more than one unique value was selected in the column of cities and, thus, processing continues. In block 537, the RPU performs the Boolean OR operation on the row use sets for London and Paris. A select vector is generated (563, FIG. 16B) , which has five binary bits and bits one through four are set to "1". In block 538, the RPU determines that more than one column of the relation was involved in the SELECT, i.e., suppliers names and cities. In block 539, the RPU determines that no more values in other columns need to be selected from the relation. In block 540, the select vectors associated with the suppliers names (571, FIG. 16A) and cities (563, FIG. 16B) are ANDed together to generate a resultant select vector, which represents the rows that satisfy the query (565, FIG. 16B) . As shown in 565 (FIG. 16B) , the resultant entity select vector indicates that the first and third binary bits are set to "1", indicating that the first and third rows of the suppliers relation contain information on whether Smith is associated with Paris and/or London and whether Blake is associated with Paris and/or London. As in the last example, the actual rows in the relation can be reconstructed to display to the user in one of two ways. First, the system can use the entity use vectors associated with each column of the relation to map the row numbers determined by the resultant select vector, to the ordinal positions in the appropriate value sets. Second, the RPU 22 could trace back from the various row use vectors to the entity select vectors and back to the appropriate value sets to determine the actual unique values in the first and third rows of the relation. A more detailed discussion on the entity use vectors approach will be discussed in Part .VI.
D. RECONSTRUCT
The purpose of the RECONSTRUCT operation is to generate the values associated with a particular column of a relation for the user to ascertain. Typically, the binary representation of a relation is constructed and stored in memory 18. If the user of the system wishes to see the actual relation and the values depicted in the relation, then the RECONSTRUCT operation can be per- formed for reconstructing and displaying the relation to the user. The flow diagrams in FIGS. 17A and 17B are for reconstructing and displaying various columns specified by the user or an applications program. Typically, the user will specify one or more columns of a relation to be displayed by the system. The user might request the Suppliers ID column of the relation for suppliers at 63 of FIG. 2, which is currently stored in its binary representation in memory 18 (FIG. 1A) (260, FIG. 4) . Referring to FIG. 17A, the first step in performing the RECONSTRUCT operation is in block 565, in which the user specifies various columns of a relation to be reconstructed. As stated above, an applications program may also specify particular columns of the relation to be projected. -For example, when the SELECT operation is performed, the resultant binary representation can then be reconstructed and displayed to the user via the RECONSTRUCT operation (FIG. 17A) . In the next step, block 566 calls the routine DISPLAY/RECONSTRUCT (FIG. 17B) to reconstruct one of the specified columns. The DISPLAY/RECONSTRUCT routine (FIG. 17B) essentially performs the necessary steps for obtaining the values and for placing the values in the proper rows in the column. In block 567, the RPU determines whether there are any more columns that need to be displayed. If there are more' columns to be displayed, then processing continues at block 566. Blocks 566 and 567 are performed until all of the columns specified by the user or applications program have been reconstructed. If all of the columns have been reconstructed, then processing continues at block 568, in which the RPU returns to the calling program.
Referring to FIG. 17B, a flow ^diagram of the DISPLAY/RECONSTRUCT routine is depicted. During block 571, the RPU 22 obtains 'the entity select vector associated with the particular column to be displayed. During block 575, the first row use vector associated with the row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18, is transferred to the RPU 22. Then during block 577, the RPU 22 performs a Boolean AND operation on the row use vector obtained in block 577 with a row select vector. (The row select vector is created by performing a query operation on the relation thereby selecting which rows of the relation the user or application program wishes to display.) The row select vector is a new binary bit vector and each binary bit corresponds to a row of the column or columns to be displayed. A binary bit set to "1" indicates that the corresponding row needs to be displayed. The result of the AND operation is a new vector Z which depicts the rows of the column which contain a particular value associated with the row use vector. The results of the AND operation are sent to memory 18 for future processing. Then, in block 579, the RPU 22 determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 581 a "0" is placed in the first binary position of a new vector called the index vector. The index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the row use set. Each bit indicates whether the unique value associated with the row use vector exists in the relational column to be displayed. If a binary bit in the index vector is set to "0", the unique value associated with the row use vector does not exist in the column to be displayed; Whereas, if the binary bit is set to "1" in the index vector, then the unique value associated with the row use vector exists one or more times in the column. Processing continues at block 593, during which the next row use vector of the row use set is obtained.
Returning to block 579, if the result of the Boolean operation performed in block 577 is non-zero, then processing continues at block 585. During block 585, the RPU 22 sets the binary bit in the index vector, which is associated with the current row use vector, to "1". In block 587, the resultant bit vector Z is stored in memory 18. The resultant vector Z is later used in the reconstruction process.
During block 589, the RPU 22 clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. The purpose of this step is to shortcut the processing of the row use vectors in the row use set. Stated differently, when the row select vector is cleared, all of the values in the rows of the column have been determined. The row use vectors which have been processed with the row select vector contain all of the values to be displayed in the column. Then during block 591, the RPU 22 (28, FIG. 1A) determines whether the row select vector has been completely cleared, or stated differently, all of the binary bits have been set to "0". If the row select vector contains only binary bits set to "0", then processing continues at blocks 597, 601, 603, 605, 607, 609 and 611 to reconstruct the column with the values associated with the row use vectors in the row use set. However, if not all of the binary bits in the row use vector are set to "0", then processing continues at block 593. During block 593, the RPU 22 gets the next row use vector in the row use set currently being pro¬ cessed. During block 595, the RPU 22 determines whether the end of the row use set has been reached. If the end of the row use set has been reached, then processing continues at blocks 597, 601, 603, 605, 6077 609 and 611 to reconstruct the column. However, assuming that the end of the row use set has not been reached, then processing continues at blocks 577, 579, 585, 587, 589, 591, 593 and 595 until all of the row use vectors of the row use set have been processed.
Assuming that all of the row use vectors of the row use set have been processed or the row select vector contains binary bits set to "0", then processing continues at block 597. During block 597, the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Each binary bit set to "1" indicates which row use vectors reference unique values which are to be displayed in the column. During block 601, for a row use vector which has a corresponding binary bit set to "1" in the index vector, the ordinal position of the binary bit set to "1" in the entity select vector is determined. Then, during block 603, the value associated with the ordinal position obtained in block 601 is obtained from the value set. During block 605, the RPU 22 finds the appropriate location in the index vector associated with the value. Then during 607, the appropriate resultant Z vector stored during step 587 is retrieved. The resultant vector Z indicates which rows of the column contain the unique value associated with the row use vector, and the unique value is placed in the column at the appropriate row locations. During block 609, the RPU 22 determines whether any more values are left for processing. Assuming that there are more values, then processing continues in blocks 601, 605 and 607 until all of the values have been placed in the proper rows of the column. Once all of the values have been placed into the column, then processing returns to the calling program during block 611. 1. Detailed Example o Performing
RECONSTRUCT Operation
Referring to FIGS. 17A, 17B, 18A, 18B, 18C and 18D, a detailed example for reconstructing the column for Suppliers IDs in the Suppliers relation (FIG. 2) is now discussed. More particularly, it is assumed that only the binary representation of the column Suppliers IDs exist in the RDMS 10. The binary representation of the column may be from a result of a SELECT operation or it may have been previously stored after processing by the BBVP 14. In either case, the binary representation of the column Suppliers IDs exist in memory 18 and now the user or an applications program needs to display the actual values of the column. Although, this example is for reconstructing and displaying only one column of a supplier for the suppliers relation, the PROJECT operation could be sequentially performed to supply all of the columns of the supply relation.
Referring to FIGS. 18A, B, C and D, a Results Table for depicting the results of the RECONSTRUCT operation for reconstructing or supplying the Suppliers ID column (80, FIG. 2) of the suppliers relation (63, FIG. 2) is shown. Each row (800-848) of the Results Table depicts a result of the routines shown in FIGS. 17A and 17B. The Results Table is. separated into seven columns. From left to right, the first column of the Results Table shows the entity select vector for the selected column to be displayed. The second column shows the row use vector set associated with the specified column to be displayed. The third column is the row use vector currently being processed, and the fourth column is the row select vector specified by the user application program for determining which rows of the column are to be displayed. The fifth column is the result of ANDing the row use vector and the row select vector together; the result is called vector Z. The sixth column depicts the reconstruction of the index vector for displaying which row use vectors of the row use set have associated unique values in the column. The last column is for the reconstruction of the column to be displayed.
Referring to FIGS. 17A and 17B, the operation performed by the RPU 22 (FIG. 1A) for displaying and reconstructing the Suppliers ID column (80, FIG. 2) is now discussed. Specifically, during block 565 (FIG. 17A) the user or application program selects the various column or columns which are to be reconstructed and displayed as a result of the query operation. For this example, the user has selected the Suppliers ID column (80, FIG. 2) . Currently, the Suppliers ID column only exists in the secondary memory (18, FIG. 1A) in the form of a binary representation or row use set. During block 566, the DISPLAY/RECONSTRUCT routine (FIG. 17B) is called for finding the values and reconstructing the Suppliers ID column. Referring to FIG. 22B, during block 571, the RPU 22
(FIG. 1A) finds the entity select vector associated with the Suppliers ID column in the memory 18 (FIG. 1A) . The entity select vector stored in memory 18 (FIG. 1A) is transferred via bus 30 (FIG. 1A) to RPU 22 (FIG. 1A) . Then, during block 573, the RPU 22 finds the row use set associated with the Suppliers ID column in memory 18 and transfers via a bus to the RPU 22 (FIG. 1A) . During block 575, the RPU 22 obtains the first row use vector of the row use set for the Suppliers ID column (804, FIG. 18A) . For purposes of this example, the user wishes to display the first four rows of the Suppliers ID column. Constructed earlier in the system is a row select vector having four binary bits set to "1" in the first four ordinal .positions, i.e. "1 1 1 1 0". During block 577, the row use vector (804, FIG. ISA) and the row select vector (806, FIG. 18A) are transferred to the RPU 22 (FIG. 1A) . Here the row use vector (804, FIG. 18A) and the row select vector (806, FIG. 18A) are ANDed together to determine a resultant binary vector Z (808, FIG. 18A) . The resultant binary bit vector Z depicts the rows of the Suppliers ID column in which the unique value associated with the row use vector (804, FIG. 18A) is to reside. The resultant binary vector Z contains a binary bit set to "1" in the first position which means that the unique value associated with the row use vector (804, FIG. 18A) will be placed in only the first ordinal position of the Suppliers ID column. During block 579, the RPU 22 determines that the result of the AND operation is not "0", thus processing continues to block 585. During block 585, the RPU 22 via BBVP 14 (FIG. 1A) sets the first binary bits of the index vector to "1" (810, FIG. 18A) to indicate that the unique value associated with the row use vector (804, FIG. 18A) exists at least once in the Suppliers ID column. In block 587, the resultant binary bit vector Z is stored in memory 18 for future reconstruction of the column. Specifically, the binary bit set in the resultant binary bit vector Z indicate the rows of the Suppliers ID column with the unique value SI associated with the row use vector (804, FIG. 18A) reside. Then during block 589, the binary bits set to "l" in the row select vector which match the binary bits in the resultant vector Z are set to "0" (812, FIG. 18A) . During block 591, the row select vector is a value determined if all of the binary bits have been set to "0". The row select vector contains three more binary bits set to "1" (812, FIG. 18A) , thus it continues at block 593. During block 593, the RPU 22 obtains the next row use vector of the row use set (814, FIG. 18B) . The row use vector (814, FIG. 18B) and the row select vector (816, FIG. 18B) are ANDed together during block 577. The resultant vector Z is shown at 818 of FIG. 18B. During block 579, the resultant vector Z is evaluated to determine if all the binary bits of the resultant vector have been set to "0". Not all of the binary bits of the resultant vector are "0" (the resultant vector is "0 1 0 0 0" (818, FIG. 18B) ) , and thus during block 585, the second binary bit of the index vector is set to "1" (820, FIG. 18B) . In block 587, the RPU 22 (FIG. 1A) stores the resultant vector Z in memory 18 for future processing. Then during block 589, the binary bits of the row select vector which were set to "1" and matched the binary bit set to "1" in the resultant vector Z are set to "0"
(820, FIG. 18B) . In block 591, the row select vector is evaluated to determine if all the binary bits have been set to "0". The row select vector still has two binary bits set to "1" (821, FIG. 18C) , and during block 579, it is evaluated to determine if all the binary bits are set to "0". The resultant vector Z contains one bit set to "1", and thus processing continues at block 585.
" During block 585, the RPU 22 (FIG. 1A) sets the third binary bit of the index vector to "1" (828, FIG. 18C) .
During block 587, the resultant vector Z is stored in memory 18 for future processing. In block 589, the binary bit of the row select vector which matched the binary bit of the row use vector is set to "0" (830, FIG. 18C) . Then during block 591, the row select vector is evaluated to determine if all the binary bits are set to "0". The row select vector still has one binary bit set to "1" (834, FIG. 18C) , thus it continues at block 593. During 593, the RPU 22 obtains the next row use vector of the row use set (832, FIG. 18C) . The row use vector (832, FIG. 18C) and the row select_vector (834, FIG. 18C) are ANDed together during block 577. The resultant vector Z is shown at block (836, FIG. 18C) . During block 579, it is determined that not all of the binary bits of the resultant vector are "0" (the resultant vector is "00010" (836, FIG. 18C) . Thus, during block 585, the fourth binary bit of the index vector is set to "1" (838, FIG. 18C) . The resultant vector Z is stored for future processing. Then during block 589, the binary bit of the row select vector which were set to "1" and match the binary bit set to "1" in the resultant vector Z are set to "0" (840, FIG. 18D) . Then during block 591, the row select vector is evaluated to determine if all the binary bits had been set to "0". All the binary bits of the row select vector are set to "0" (830, FIG. 18D) , and thus processing continues to block 597. During block 597, the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Specifically, the first, second, third and fourth binary bits of the index vector are set to "1". Thus the first, second, third and fourth row use vectors of the row use sets are associated with unique values which exist in the Suppliers ID column. During block 601, RPU 22 (FIG. 1A) determines the ordinal positions of the entity select vector which correspond with the row use vectors having corresponding binary bits set to "1" in the index vector. ■ Then during block 603, the values associated with each ordinal position of the entity select vector are obtained from the value set of Supplier IDs. Specifically, the values SI, S2, S3 and S4 are obtained. Then, during block 605 and 607, the first resultant binary vector Z is obtained from temporary storage, and the value SI is placed in the proper ordinal position of the column for Suppliers IDs. And during block 609, the RPU 22 (FIG. 1A) determines whether there are any more values left for processing. There are three more values left for processing, and thus block 605 and 607 are performed. During block 605 and 607, the second resultant vector Z is obtained and the value S2 is placed in the second ordinal position in the column (844, FIG. 18D) . In block 609, it is determined that there are more values left for processing, and thus block 605 and 607 are performed. During block 605 and 607, the third resultant vector associated with the third resultant vector Z which is associated with the value S3 is obtained from memory. The value S3 is placed into the third ordinal position of the relational column for Suppliers IDs (846, FIG. 18D) . In block 609, it is determined that there is still one more value to process, and thus block 605 and 607 are performed. During block 605 and 607, the fourth resultant vector Z stored in temporary memory is obtained and the value S4 is placed into the fourth ordinal position of the column for Suppliers IDs (848, FIG. 18D) . During block 609, the RPU 22 (FIG. 1A) determines that there are no more values for processing and thus returns to the calling RECONSTRUCT routine (FIG. 17A) at block 567.
During block 567, the RPU 22 (FIG. 1A) determines whether there are any more columns to be displayed to the user. For this example, it is assumed that only the Suppliers ID column is to be reconstructed and displayed. However, if more columns of a particular relation were to be displayed, then processing would continue at blocks 566 and 567 until all of the columns were displayed. Also, the same row select vector would be used each time the DISPLAY/RECONSTRUCT routine (FIG. 17B) was performed. Assuming that there are no more columns to be displayed, then processing returns to the calling program during block 568. E. JOIN
The ability to "JOIN" two or more relations is considered to be the most powerful feature of a relational system, An Introduction to Data Base Systems, Vol. 1, 4th Ed. (1986) . Essentially, a JOIN is a SELECT over the Cartesian product of more than one relation of the relational database.
To understand the purpose of the JOIN operation, an overall view of how an operation might be implemented for a conventional system is shown. Suppose that a user needs to get all combinations of supplier and part information for the SUPPLIERS relation (63, FIG. 2) and PARTS relation (65, FIG. 2) such that the supplier and part in question were located in the same city. The user might use the following query:
SELECT S.ID#,S.NAME,STATUS,S.CITY,P.ID#,P.NAME, COLOR,P.WEIGHT,P.CITY FROM S,P
WHERE S.CITY=P.CITY; The result of this query produces the following table:
Figure imgf000084_0001
The data shown in Table A above, comes from the two relations; suppliers 63 and parts 65 (FIG. 2) . In the query above, the names of the relations are listed in the FROM clause, and connection between the two relations (63, 65 FIG. 2) is listed in the WHERE clause (i.e., the fact that the city values must be equal) which is called the JOIN predicate. The JOIN is used to combine relations based on equivalent values in the column if specified by the JOIN predicate. In this case, the specified columns are the "city" columns of each relation. The JOIN pairs each of the N rows of a first relation; e.g., the SUPPLIERS relation (63, FIG. 2), with each of the M rows of a second relation, e.g., the Parts relation (65, FIG. 2) , to form an N * M resultant relation. Then, the JOIN operation discards all resultant rows of the JOIN relation which do not satisfy the JOIN specification. This type of JOIN relation is generally referred to as the "EQUIJOIN" operation.
As an example of EQUIJOIN, consider any two rows from the two relations (i.e., the suppliers relation (63, FIG. 2) and the Parts relation (65, FIG. 2). For example, the rows shown below:
S.ID# SNAME STATUS CITY // P.ID# PNAME COLOR WEIGHT CITY SI Smith 20 London PI Nut Red 12 London
These rows show supplier SI and part PI, are located in the same city, namely, London. Therefore, a result row is generated, since both rows satisfy the predicate in the WHERE clause, namely, S.CITY=P.CITY.
Similarly, for all other pairs of rows in the SUPPLIERS relation (63, FIG. 2) and the PARTS relation (65, FIG.2) which satisfy the predicate clause, a resultant row is generated (see Table A) . Referring to Table A and to FIG. 2, notice that the supplier, S5 at 75, located in Athens, does not appear in the JOIN relation (Table A) because there are no parts associated with the city. Likewise, part P3 at 81, associated with Rome, does not appear in the resultant relation, because there are no suppliers associated with Rome.
There is no requirement that the comparison operator, in a JOIN predicate be equality. The EQUIJOIN by definition produces a result containing two identical columns as shown in Table A. If one of these two columns is eliminated, the result is called NATURAL JOIN. In conclusion, the JOIN operation is the restriction of the Cartesian product of two or more relations. The Cartesian product of a set of N relations is a new relation consisting of all possible rows "r", such that ιrr" is the concatenation of all rows from the participation of relations. Once the Cartesian product is generated, all rows that do not satisfy the "JOIN predicate" are eliminated from the Cartesian product. What is left is the EQUIJOIN result relation.
In this example, the complete table contains thirty rows. Now, all the rows' Cartesian product in which S.CITY is not equal to P.CITY are eliminated and what is left is the EQUIJOIN result as shown earlier.
In the following sections, two aspects of the present inventions are discussed. First, the JOIN relation is efficiently represented by binary bit vectors and second, the JOIN relation is constructed without having to create a cross product relation.
1. Binary Representation of a JOIN Relation Referring to FIGS. 2, 19, 20 and 21, a binary representation of a JOIN relation is now discussed. Specifically, FIG. 19 represents the depiction of a JOIN relation from the following query: SELECT S.ID#,S.STATUS,S.CITY,P.ID#
FROM S,P
WHERE S.CITY=P.CITY;
This query is a projection of the JOIN because the P.CITY is not mentioned in the SELECT clause of the query. Like the EQUIJOIN example discussed above, this query requires that data come from two relations, namely the suppliers relation (63, FIG. 19) and the Parts relation (65, FIG. 19) . Both relations are named in the FROM clause and the connection between the tables is through the CITY columns in the WHERE clause. The result of the JOIN for displaying the columns CITY, SUPPLIERS ID#s, STATUS, and PART ID #«s is shown at 628 of FIG. 19. To construct the JOIN relation of the SUPPLIERS relation and the PARTS relation, a new set of entity select vectors 600 and 602 for depicting the values in the JOIN relation columns, are created. The entity select vectors at 600 and 602 are binary bit vectors that indicate which rows of the particular relation associated with the entity select vector, participate in the JOIN relation. More particularly, each binary bit has an ordinal position, which corresponds to a row of the relation. Binary bits 601, 603, 605, 607 and 609 correspond to the five rows of the suppliers relation 63. When the binary bit is set to "1" the particular row associated with the binary bit participates in the JOIN relation. Specifically, binary bit 601 indicates that the first row of the SUPPLIERS relation 63 participates in the JOIN relation. Likewise, the entity select vector 602, having binary bits 611, 613, 617, 619 and 621, indicates that the first, second, fourth, fifth and sixth rows of the PARTS relation 65 participate in the JOIN relation.
Vectors 600 and 602 act just like the entity select vectors discussed in FIGS. 4, 5 and 6. . The only difference is that the ordinal positions of the bits in the entity select vectors 600 and 602 do not correspond toβunique values in a value set. Instead, the binary bits of entity select vector 600 and 602 refer to row locations in a particular relation. Like the entity select vectors in FIGS. 4, 5 and 6, an implied mapping correspondence exists between each binary bit in the entity select vector to and a particular row use vector in an associated row use set. The implied mapping scheme is illustrated by the dotted lines 608, 610, 612 and 614, which show that the binary bits of the entity select vector 600 indicate the correspondence of the rows of the suppliers relation to the row use vectors 615, 617, 619 and 621, respectively. Likewise, the binary bits of the entity select vector 602, which correspond to the row use vectors of the Parts relation 65, are mapped in an implied manner to the row use vectors 623, 625, 627, 629 and 631, as shown by the dotted lines 616, 618, 620, 622 and 624. The row use sets of the JOIN relation perform a dual task of representing the values in the rows of the JOIN relation and for depicting more than one column, of the JOIN relation. Specifically, the row use set 604 represents the columns S.CITY 638, S. ID# 636 and S.STATUS 634. Likewise, the row use set 606 represents the P.ID#, 630, the column of the JOIN relation 628. The columns 638, 636, 634 and 630 depict the result of the JOIN.
To summarize, the entity select vector 600 and the row use set 604, represent all of the binary information necessary to construct the suppliers relation portion 626 of the JOIN relation, 628; namely, columns 638, 636 and 634. The following is a detailed discussion on how this representation is achieved.
Referring to row use vector 615 at FIG. 19 of the row use set 604, a dotted line 608 maps the row use vector 615 to the binary bit 601 of the entity select vector 600. The binary bits of the row use vector 615 indicate the row positions of the columns 638, 636 and 634, which contain a particular value in the first row of the suppliers relation 63. Binary bit 601 cor¬ responds to the values London, 20, Smith and SI. To build the S.CITY column of the JOIN relation, only the value London is referenced. Thus, the three binary bits set to "1" in the row use vector 615 represent three occurrences of the value London in the S.CITY column 638 of the suppliers portion 626 of the JOIN relation. For the S.ID# column 636, the first three bits set to "1" in the row use vector 615 indicate three occurrences of the value SI. Likewise, in the STATUS column 634, the first three binary bits set to "1" in the row use vector 615 represent occurrences of the value 20. Thus, the first three rows of the suppliers portion 626 of the JOIN relation 628, are characterized by the first three bits of the row use vector 615. Likewise, the next three rows of the suppliers relation 626, are characterized by the row use vector 621 of the- row use set 604. The replication of the three bits in row use vector 615 indicates that the three values [LONDON, SI, 20] of the first table of the Supplier relation occur in rows 1, 2, 3 of the JOIN relation. The remaining values and the columns of the Suppliers portion 626, are indicated by the row use vectors 617 and 619, which contain binary bits set to "1" in the seventh, eighth, ninth and tenth rows of both row use vectors. It should be noted that although the columns S.CITY, S.ID# and S.STATUS of the Suppliers relation are indicated by the query, the column SNAME in the SUPPLIERS relation, could just as easily have been represented by the row use set 604.
The values of the Parts relation are mapped into the JOIN relation 628 by the row use set 606 and the entity select vector 602 in exactly the same fashion as for Suppliers.
In sum, the row use sets 606 and 604, together give a binary representation of the JOIN relation 628. Only the binary representation of the JOIN relation is stored in the RDMS 10 (FIG. 1A) . The actual values depicted by the binary representation, are retrieved when the system performs a reconstruct and display for the user. FIGS . 20 and 21 represent a more detailed view of the JOIN relation 628 (FIG. 19) . More particularly, the SUPPLIERS relation 63 (FIG. 19) is depicted by its row use sets at 63 (FIG. 21) and the Parts relation 65 (FIG. 20) , as depicted by its row use sets 65 (FIG. 20) . In addition, the entity select vectors 600 and 602 are shown corresponding to the row use sets 630 and
(FIG. 20). Also, FIGS. 20 and 21 depict the value sets referred to in the SUPPLIERS and PARTS relations with their associated entity select vectors.
Referring to FIG. 20 a detailed discussion of the SUPPLIERS relation portion 626 (FIG. 19) is now presented. As discussed above, the row use set 604 for the JOIN relation (628, FIG. 19) is mapped back to the Suppliers relation by the entity select vector 600. As shown is FIG. 19,' the entity select vector 600 is repeated for each column of the Suppliers relation which has one or more values in the suppliers portion of the JOIN relation. The implied mapping from each row use vector of the row use set 604 is shown by the dotted lines 608, 610, 612 and 614. The row use sets for representing the columns of the suppliers relation are shown in FIG. 4. The first three binary bits set to "1" in the row use vector 615 of the row use set 604 are mapped to the binary bit 601 of the entity select vector 600. The entity select vector 600 corresponds to the row use set 260. The first binary bit 601 is set to "1" indicating that the value in the first row of the S.ID# column of the suppliers relation is present in the JOIN relation. To determine which suppliers ID is referenced by the binary bit 601, a Boolean AND operation is performed on each row use vector 184, 186, 188, 190 and 192 to determine which row use vector contains a corresponding "1" bit in the first position. Row use vector 184 contains a binary bit set to "1" in the first position. Row use vector 184 maps back to the first binary bit of the entity select vector 176 for the suppliers relation. The first binary bit of entity select vector 176 corresponds to the value SI in the value set for suppliers identifiers 160. Thus, the first three binary bits set to "1" in the row use vector 615 indicate that the value SI is in the first three rows of the S column in the JOIN relation. In the same way, the first three binary bits set to "1" in the row use vector 615 are mapped to the entity select vector 600 associated with the row use sets 264 and 266, corresponding to the S.STATUS and S.CITY columns of the suppliers relation. The row use vector 615 represents the first three values of the S.CITY column 638 and the S.STATUS column 634 of the prior portion of the JOIN relation. Referring to FIG. 21, a more detailed view of the
Parts relation portion of the JOIN relation is shown. Specifically, the mapping of the part ID #'s to the P.ID# column 630 of the JOIN relation is shown. Row use vector 623 of the JOIN relation contains binary bits set to "1" in the first and fourth positions. These binary bits correspond to the first and fourth positions of the P.ID# column 627 of the JOIN relation. Row use vector 623 is impliedly mapped to the first binary bit 611 of the entity select vector 602. The entity select vector 602 corresponds to the row use set 304. A Boolean AND operation is performed on the entity select vector in each row use vector of the row use set 304 to determine which row use vector contains the binary bit set to "1" in the first position. The left most row use vector contains a binary bit set to "1" and this row use vector maps back to the first binary bit position of entity select vector 284. The first binary bit of the entity select vector 284 is set to "1", indicating that the first row of the value set 282 contains the unique value mapped into the Parts relation 65 and into the JOIN relation. Thus, binary bit 611 indicates that the value PI is mapped into the row use vector 623 of the JOIN relation, specifically, that of the first and fourth rows. The remaining rows of the P.ID# column 627 of the JOIN relation are depicted -by the row use vectors 625, 627, 629 and 631 of the row use set 666.
2. Constructing a Binary Representation of a JOIN Relation Referring to FIGS. 22A, 22B, 22C, 22D, 22E, 22F and
22G, a detailed discussion on the operations performed by the RPU 22 (FIG. 1A) for performing a JOIN operation and constructing the JOIN relation is now discussed. Specifically, referring to FIG. 22A, the routine EQUI- JOIN, which builds a binary representation of the JOIN relation, is shown. FIG. 22B is a flow diagram of the routine BUILD ROW USE SETS for constructing the particular row use sets associated with each column of the JOIN relation. FIG. 22C is a flow diagram for the routine CONSTRUCT JOIN ROW USE VECTORS for controlling the overall construction of each row use vector of a row use set in the JOIN relation. FIG. 22D is a routine EVALUATE ROW USE SETS for determining the number of occurrences of the unique values participating in the JOIN operation. FIG. 22E is a routine called PRODUCTS for calculating a series of product terms which characterize the formation of bit patterns in each of the row use vectors for a particular row use set of the JOIN relation. FIG. 22F is a routine called NUMS for determining the number of times a particular bit pattern repeats itself in a row use vector in the JOIN relation. FIG. 22G is a routine called GENERATE BIT STRING for building the row use vector in the JOIN relation.
When the JOIN operation is performed, the binary representation of the one or more relations to be JOINed are found in the memory 18 of the RDMS 10. Here the binary represented relations are stored until the RPU 22 is ready for processing. When the RPU 22 is ready to perform the JOIN operation, the relations are sent via bus 48 to the BBVP 14. Using the relations in the BBVP 14, the RPU 22 performs the EQUI/NATURAL JOIN operation (FIG. 22A) to create a binary representation of the resultant JOIN relation.
Referring to FIG. 22A, a detailed description of the overall process for performing the EQUIJOIN operation is now discussed. Specifically, during block 652, the RPU 22 obtains all the entity select vectors, for the participating relations, of the columns which contain values from the same particular value set over which the JOIN operation is performed (e.g., CITY VALUE SET for S.CITY = P.CITY).
The JOIN operation is performed over the values of the relations which fulfill the WHERE clause of the JOIN query (e.g., S.CITY = P.CITY) . For example, referring to FIG. 19, where the JOIN relation was characterized by the WHERE clause (e.g., S.CITY = -P.CITY), the values London and Paris in the CITY columns of relation 63 and 65 were common to both relations, and the JOIN operation was performed with respect to these values. During block 654, the RPU 22 performs a Boolean AND operation on the entity select vectors obtained to determine a resultant binary bit vector x. The resultant binary bit vector indicates which values of the particular value set are common to all of the entity select vectors involved in the JOIN operation. Specifically, binary bits set to "1" in the resultant bit vector "x" indicate the values of the value set which are common to all the columns represented by the obtained entity select vectors.
During block 658, the RPU 22 determines which binary bits of each entity select vector correspond to the binary bit set to "1" in the resultant bit vector x. For each binary bit set to "1" in the entity select vector that corresponds to a binary bit set to "1" in the resultant bit vector x, the RPU 22 obtains the row use vector in the associated row use set during block 658. Then, during block 660, for each row use set associated with each entity select vector, the Boolean operation OR is performed on the selected row use vectors of the associated row use set. The resultant vectors are referred to as JOIN entity select vectors, which characterize values belonging to one or more JOIN columns in the JOIN relation. In block 664 , a row use set corresponding to each JOIN entity select vector is constructed. Specifically, during block 664, the BUILD ROW USE SET routine (FIG. 22B) is called to construct each row use set corresponding to each JOIN entity select vector of the JOIN relation. When all of the row use sets for the JOIN relation have been constructed, processing returns to the calling routine in block 668.
Referring to FIG. 22B, a detailed discussion for the routine BUILD ROW USE SETS is now discussed. The purpose of this routine is to construct the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation.
Specifically, during block 672, the RPU 22 selects the first unique value i in the resultant bit vector x. In other words, the RPU 22 selects the first value represented by the occurrence of a "1" bit in the resultant bit vector x. Then, during block 673, a variable "START ROW" is set equal to zero. This variable indicates the start position of the first row in the JOIN row use vector in the JOIN row use set being generated and will be discussed in more detail along with the description of FIG. 22G. Then, during block 674, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C is called. This routine is performed by the RPU 22 to determine the characteristics of a particular row use vector of the JOIN relation corresponding to the value i and to build the row use vector or vectors associated with the value in the JOIN relation. During block 676, the RPU 22 determines if there are any more unique values which are indicated in the resultant vector x. If there are, then the next unique value, i, is obtained at block 678. Processing continues at the CONSTRUCT JOIN ROW USE VECTORS routine to build the JOIN row use vector(s) associated with the next unique value. Assuming . that there are no more unique values represented in the resultant bit vector x, then processing returns to the calling program (EQUI/NATURAL JOIN, FIG. 22A) at block 680.
Referring now to FIG. 22C, a detailed description of the CONSTRUCT JOIN ROW USE VECTORS routine is now discussed. As stated above, the purpose of this routine is to determine the characteristics of the row use vectors of the JOIN relation and to build the row use vectors for a particular - row use set in the JOIN relation. During block 684, the routine EVALUATE ROW USE VECTORS is called. The purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation. A more detailed discussion of this routine will be presented shortly with reference to FIG. 22D. Then, during block 686 the PRODUCTS routine (FIG. 22E) is called. The PRODUCTS routine calculates a series of product terms which characterize the formation of bit patterns in each of the row use vectors of a particular row use set of the JOIN relation. A more detailed discussion of this routine will be presented with reference to FIG. 22E. Then, during block 688 the NUMS routine (FIG. 27F) is called. The NUMS routine determines the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. A more detailed discussion will be shortly presented with reference to FIG. 22F. Assuming that all the calculations for determining the characteristic of a row use set have occurred, processing continues at block 690 during which the first input column "j" (where "j" is set equal to 1) is obtained. Then, in block 692, the row use set associated with the first input column j is obtained.
Then, during block 694, the GENERATE BIT STRING routine (FIG. 22G) is called for constructing the row use vectors associated with a particular value in the row use set of the JOIN relation. The GENERATE BIT STRING routine (FIG. 22G) evaluates the calculations of the PRODUCTS (FIG. 22E) and NUMS (FIG. 22F) routines to determine the characteristics of the bit patterns in the row use vectors and constructs these bit patterns in the row use vectors of the JOIN relation. Then, during block 696, the RPU 22 determines if there are any more input columns which need to be processed. Assuming that there are still other input columns to be processed, block 698 increments the variable "j" by 1. Processing -95- continues at blocks 692, 694 and 696 until all of the input columns have been processed. Assuming that there are no more input columns for processing, block 700 is entered to calculate a new value for the variable "START ROW." More particularly, the equation START ROW = START ROW + PRODS(1) is determined. The new value for START ROW will indicate the starting position of the bit pattern in the row use vector for the next value i of the resultant bit vector x. Processing returns at block 702 to the calling program, the BUILD ROW USE SETS routine, FIG. 22B.
FIG. 22D is a flow diagram of the EVALUATE ROW USE VECTORS routine, discussed below in more detail. As stated above, the purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation. During block 704, the RPU 22 selects the first column and obtains the row use set of the column. Then, during block 705, the RPU 22 obtains the row use vector of the RUS (j) which corresponds to the unique value i of the bit vector x. The row use vector is referred to as Vj . During block 706, the number of binary bits set to "1" in the row use vector Vj is determined. The number of binary bits set to "1" in Vj corresponds to the number of occurrences of the particular value i in the current column over which the JOIN is performed. The number of occurrences calculated for this particular row use vector Vj is placed in the variable Cj . Cj is used by the PRODUCTS routine (FIG. 22E) , to be discussed. Then, during block 707, the RPU 22 determines whether there are any more input columns. Assuming that there are still more input row use vectors, processing continues at block 709, during which the variable j is incremented by l. Processing continues at block 705, 706 and 707 until all of the input columns are processed. Assuming that all of the input columns have been processed, then during block 711 processing returns to the calling program, the CONSTRUCT JOIN ROW USE VECTORS, FIG. 22C (AT BLOCK 686) . FIG. 22E, is a flow diagram for the routine
PRODUCTS. As stated earlier, the purpose of this routine is for calculating a series of product terms which characterize the formation of bit patterns in each of the row use vectors of the JOIN relation. During block 710, an array called PRODS is set equal to the series
PRODS = { (l,Cχ * C2 * C3 * . . . * Cn) , (2,C2 * C3 * . . . * Cn), (3,C3 * . . . * Cn), . . . (N - 1, Cn - 1 * Cn) ,
(N, Cn) , (N + 1, 1) } where Cj is equal to the number of occurrences of value i in column j participating in the JOIN operation. For example, suppose the JOIN operation is performed over the CITY columns for two relations and the value of London is found to be present in both columns. Assume that the CITY column in the first relation contains two occurrences of the value London and the CITY column in the second relation contains three occurrences of the value London. Then C]_ is set equal to 2 and C2 is set equal to 3. Therefore, PRODS(1) is equal to C-^ * C2 , which is equal to 6. This number is used by the GENERATE BIT STRING routine (FIG. 22G) to determine the characteristics of the bit patterns in the row use vectors of the JOIN relation. When processing block 710 is completed, processing continues at block 712, to return processing to the calling program or the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) .
Referring to FIG. 22F, a detailed description of the NUMS routine is now discussed. The NUMS routine is for determining the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. Specifically, during block 716 the following series is calculated
NUMS = (1, PRODS(1)/PRODS(1) )
(2, PRODS(1)/PRODS(2) )
(3, PRODS(1)/PRODS(3) )
(4, PRODS(1)/PRODS(4) )
(N, PRODS(1)/PRODS(N) ) .
Then, during block 718, processing returns to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) .
Referring now to FIG. 22G, a detailed description of the GENERATE BIT STRING routine is now discussed. As stated above, this routine determines' the characteristics of a bit' pattern in a JOIN row use vector associated with a particular value i. An OFFSET value is determined for specifying the number of binary "0" bits in the row use vector ahead of the first binary "1" bit in addition to the zero-bits specified by START- ROW. During block 722 the offset value is equal to an initial value of zero.
During block 724, RPU 22 obtains the input bit vector associated with the variable Vj . During block 726, RPU 22 obtains the first binary "1" bit in the Vj bit vector. At block 728, a variable K is set equal to the ordinal position of the selected (726, FIG. 22G) bit of Vj . Processing continues at block 730, during which the Create Output vector "W" is generated. The output vector "W" at a position in the row use set corresponds to the bit position k, in the entity select vector (the destination entity select vector corresponding to column j) . The characteristics of the output bit vector W are determined by calculating NUMS (j), PRODS (j + 1) and PRODS (j). NUMS (j) indicates the number of repetitions of a bit pattern having PRODS (j + l) "l" bits. PRODS (j) indicates the total number of bits in the bit pattern associated with the output bit vector W. The position of the first bit set to 1 in the output bit vector is determined by the equation: POSITION = START ROW + OFFSET; where START ROW is the first bit position of the bit string indicating a particular value and OFFSET specifies the number of binary bits set to "0" in the row use vector before the series of bits set to "1" occurs.
Processing continues in block 732, during which the RPU 22 determines if there are any more bits set to "1" in the bit vector V . Assuming that there are more binary bits set to "1" in the bit vector Vj, processing continues at block 736 during which OFFSET is set to OFFSET + PRODS (j + 1) . Then, during block 734, the next bit of the bit vector Vj is obtained. Processing continues at blocks 728, 730, 732, 734, 736, until all of the binary "1" bits in the bit vector Vj have been evaluated and output bit vectors W have been built. When all the binary bits in the bit vector Vj have been evaluated, processing returns to the calling program during block 738.
3. Detailed Example For Constructing a
Binary Representation of the JOIN Relation Referring to FIGS. 19, 22A, 22B, 22C, 22D, 22E, 22F, 22G, 23A, 23B, and 23C, a detailed example for constructing a binary representation of the JOIN relation as shown in FIG. 19 is now described. As discussed earlier, FIG. 19 depicts a JOIN operation for the following query:
SELECT S.ID#,S.STATUS,.S.CIT ,P.ID# FROM S,P WHERE S . CITY=P . CITY ;
The result of this JOIN operation for columns CITY, SUPPLIERS ID#s, STATUS and PART ID#s is shown at 628 of FIG. 19. The binary representation for the new JOIN relation is shown at 604 and 606. More particularly, the SUPPLIERS relation portion of the JOIN relation 628 is shown at 604 and the PARTS relation portion of the JOIN relation is shown at 606. The purpose of this discussion is to construct the binary representation of the JOIN relation for the query above.
Fig. 23 is a Results Table depicting the JOIN operation performed on the SUPPLIERS relation and the PARTS relation for the specific query above. Each row of the Results Table, indicated at 850-878, depicts a different step of the creation of the binary representation of the JOIN relation, as shown by the flow diagram of Fig. 22. There are ten columns in the Results Table of FIG. 23. From left to right, the first column is the value set over which the JOIN operation is performed. The second column is the entity select vector associated with the column of the first relation (or SUPPLIER relation) over which the JOIN operation is performed. The third column is the entity select vector for the column in the second' relation (or PARTS relation) over which the JOIN operation is performed. The fourth column is the resultant bit vector x generated by Boolean ANDing the entity select vectors for the two columns over which the JOIN operation is being performed. The fifth column is the selected input row use vectors of the SUPPLIERS relation. The sixth column is the selected input row use vectors for the PARTS relation. The seventh column is the entity select vector 1, corresponding to the first column of the JOIN relation which also refers to the SUPPLIERS relation portion of the JOIN relation. The eighth column is the entity select vector 2 corresponding to the second column of the JOIN relation which also refers to the PARTS relation portion of the JOIN relation. The ninth column is the first JOIN row use set corresponding to entity select vector 1, and the tenth column is the second JOIN row use set corresponding to entity select vector 2.
Referring to FIG. 22A, a more detailed discussion for performing the EQUIJOIN operation is now discussed. During block 652, the RPU 22 obtains all entity select vectors for the SUPPLIERS relation and the PARTS relation, for the columns which contain values from the values set over which the JOIN operation is performed (e.g., the CITY value set corresponding to the Where clause, S.city = P.city) . Thus the entity select vectors for the CITY columns of the SUPPLIERS relation
(63, FIG. 19) and the PARTS relation (65, FIG. 19) are obtained (850, FIG. 23A) . The second column of the Results Table of FIG. 23 depicts the entity select vector for the CITY column of the SUPPLIERS relation, and the third column of the Results Table of FIG. 23 depicts the entity select vector associated with the CITY column of the PARTS relation. Then during block 654, the RPU 22 performs a Boolean AND operation on the entity select vectors for the SUPPLIERS and PARTS relations to obtain a resultant binary bit vector x (852, FIG. 23A) . The resultant binary bit vector, shown in the third column of Fig. 23A, indicates which values of the CITY value set are common to the entity select vectors for the CITY columns of the SUPPLIERS and PARTS relations. During block 658, the RPU 22 determines which binary bits of each entity select vector correspond to the binary bits set to "1" in the resultant vector x. For the entity select vector corresponding to the CITY column in the SUPPLIERS relation, the row use vectors corresponding to the values London (L) and Paris (P) are obtained (854, FIG. 23A) . Additionally, the row use vectors corresponding to the values London and Paris in the CITY column for the PARTS relation are obtained (856, FIG. 23) . Then, during block 660, for each row use set associated with* an entity select vector (i.e., 854 and 856 of FIG. 23A) , the Boolean OR operation is performed. The resultant vectors of the Boolean OR operations are entity select vectors for the output or JOIN row use sets of the JOIN relation being formed. Specifically, the entity select vector 1, for the Boolean OR of the bit vectors for London and Paris is shown at 858 of FIG. 23A. The entity select vector 2 for the Boolean OR of the bit vectors for London and Paris is shown at 860 of FIG. 23A. In block 664, the routine BUILD ROW USE SETS is called for constructing the row use sets associated with the JOIN entity select vectors. Referring to FIG. 22B, the BUILD ROW USE SETS routine for constructing the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation is shown.
During block 672 (FIG. 22B) , the RPU 22 selects the first value represented to be present by the first occurrence of a bit set to "1" in the resultant bit vector x. The first unique value of the resultant relation which corresponds to a bit set to "1" is
London. During block 673, the variable START-ROW is set equal to zero. In block 674, the CONSTRUCT JOIN-ROW USE VECTORS routine is called. Processing continues at block 684 of the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) .
During block 684, the EVALUATE ROW USE VECTORS routine (FIG. 22D) is called. Referring to FIG. 22D at block 704, the RPU 22 obtains the first row use set associated with the S.CITY column over which the JOIN operation is performed. Particularly, the RPU 22 obtains the row use set corresponding to the S.CITY column of the Supplier relation. The variable j corresponding to this particular column is set equal to 1. Processing continues in block 705, during which the row use vector of the row use set for S.CITY column is obtained for the unique value London which corresponds to the first binary bit set to "1" in the resultant binary bit vector x. The variable V^_ is set equal to the row use vector "10010" (854, FIG. 23A) . Then, during block 706, the RPU 22 sets the variable C~_ equal to 2, the -number of "1" bits in the bit vector V^. During block 707, the RPU 22 determines if there are any more input columns to be processed. There is a second input column corresponding to the PARTS relation. Thus, processing continues at block 709, during which the variable j is incremented by 1 (j = 2) . Processing continues at block 705, during which the row use vector "100101" (856, FIG. 23B) , corresponding to London in the row use set for the P.City column, is obtained. V2 is set equal to this vector. Then, during block 706, the RPU 22 sets the variable C2 equal to 3, the number of "1" bits in the bit vector V2. During block 707, the RPU 22 determines that there are no more input columns over which the JOIN operation is to be performed. Thus, processing returns during block 711 of FIG. 22D to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 686.
During block 686, the PRODUCTS routine (FIG. 22E) is called. Referring to FIG. 22E during block 710, a series of numerical products which characterize the formation of the bit patterns in each of the row use 1 vectors of the JOIN row use set is constructed. Specifically, the function called PRODS is: PRODS = [(1, 6) (2, 3) (3, 1)]. ^ Processing returns during block 712 to the
5 CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at ^ block 688. During block 688, the NUMS routine (FIG.
22F) is called to determine the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. The NUMS determination is 10 a. series of divisions for calculating the number of repetitions of a particular pattern of "1" bits in a row use vector of the JOIN relation. Specifically, NUMS is equal to the following series:
NUMS = (1, 1) (2, 2) . 15 Processing returns during block 718 to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 690.
During block 690, the RPU 22 sets j = 1 and selects the first input column, S.CITY, over which the JOIN 20 operation is being performed. Processing continues in block 692, during which the row use set associated with the S.CITY column is obtained. Then, in block 694, the GENERATE BIT STRING routine (FIG. 22G) is called.
Referring to FIG. 22G at block 722, the variable
25 OFFSET is set equal to an initial value of zero. During block 724, the RPU 22 obtains the bit vector associated with the variable V where j = 1. Then, during block
726, the RPU 22 performs a count function on the first bit set to "1" in the binary bit vector Vn . Then, at
30 block 728, the variable K is set equal to the ordinal position of the first binary bit set to "1" in V]_; the ordinal position is 1. Processing continues at block
730, during which the create output vector W is generated. The output vector W corresponds to the bit
35 position 1 in the entity select, vector 1 (the Destination or JOIN entity select vector corresponding to S.CITY column). The characteristics of the output bit vector W are determined by calculating NUMS (1) , PRODS (2) and PRODS (1) . NUMS (1) indicates the number of repetitions of the bit string of the output vector W corresponding to the first occurrence of London in the bit vector V^_. PRODS (2) indicates that there are three "1" bits in the bit pattern of the output vector W. PRODS (1) indicates the total number of bits, six, in the bit pattern associated with the output bit vector W. The position of the first bit set to "1" in the output bit vector W is determined by the equation: POSITION = START ROW + OFFSET; where START ROW = zero and OFFSET = zero. Thus, the position of the first binary bit set to "1" in the output bit vector W is zero. The output vector W is "111000" (862, FIG. 23). Processing continues in block 732, during which the RPU 22 determines that there is a second "1" bit in the bit vector V^. Thus, processing continues at block 736, during which the VARIABLE OFFSET is determined to be 3 (i.e., OFFSET = OFFSET + PRODS (2)). Then, in block 734, the next bit of V]_ is obtained. The position of the selected bit of V! is 4. Thus, during block 728, K is set to the 4. . During block 730, the characteristics of the output vector W associated with the bit position 4 in V-L is determined by calculating NUMS (1) , PRODS (2) and PRODS (1)). The calculation for these formulas was determined above. The position of the first bit set to "1" in the output vector W is determined to be 3 (position = 0 + 3) . Thus, the resultant output vector W is "000111" (864, FIG. 23). Processing continues in block 732, during which the RPU 22 determines that there are no more "l" bits in -. to evaluate. Thus, processing returns during block 738 to the JOIN ROW USE VECTOR routine (FIG. 22C) at block 696. During block 696 (FIG. 22C) the RPU 22 determines that there is another column, P.CITY, over which the JOIN operation is performed. Processing continues at block 698, during which the variable j is incremented by 1 (j = 2) . Then in block 692, the row use set associated with the column, P.CITY, is obtained. Processing continues in block 694, during which the GENERATE BIT STRING routine (FIG. 22G) is called.
During block 722 (FIG. 22G) the variable OFFSET is set equal to the initial value of zero. During block 724, the bit vector V2 is obtained. This is the row use vector corresponding to London in the P.CITY row use set. In block 726, the first bit set to "1" of the bit vector V2 is selected. During block 728, the position of this selected bit of V2 is determined to be 1. During block 730, the output vector W corresponding to the bit position 1 of the entity select vector 2 (Destination entity select vector corresponding to P.CITY column) is determined. The characteristics of the output vector W are determined by calculating NUMS (2), PRODS (3) and PRODS (2). NUMS (2) is the number of repetitions of the generated bit string which is equal to 2. The bit string consists of PRODS (2) bits which is equal to three bits. The number of "1" bits in the bit pattern is given by PRODS (3) which is equal to one "1" bit. This series commencing with the first bit set to "1" begins at position zero as indicated by the START ROW + OFFSET, where START ROW and OFFSET are both zero. Thus, the bit pattern of output vector W is "100100" (866, FIG. 23). Processing continues at block 732, during which the RPU 22 determines that there are still "1" bits in the bit vector V2 to be evaluated. Specifically, bits at positions 4 and 6 of the bit vector V2 need to be evaluated. Thus, processing continues at block 736, during which the variable OFFSET is .determined to be equal to 1. Processing continues at block 734, during which the next bit of the bit vector V2 is obtained and the position of the newly selected bit of bit vector V2 is determined to be 4. This value is set equal to variable K during block 728. During block 730, the output bit vector W corresponding to the bit position 4 of the entity select vector 2 (destination entity select vector corresponding to column P.CITY) is determined. Specifically, the output bit vector W is determined by calculating NUMS (2) , PRODS (3.) , and PRODS (2) _ The calculations for these formulas have been determined above and the series of bits corresponding to PRODS (2) begins at the first bit position corresponding to an offset of 1 (POSITION = START ROW + OFFSET; -where START ROW = zero and OFFSET = 1) . Thus, the output bit vector W is "010010" (868, FIG. 23) . Processing continues at block 732, during which the RPU 22 determines that there is still one "1" bit in the bit vector V2 to be evaluated. Thus, processing continues at block 736, during which offset is incremented by the value PRODS (3) . PRODS (3) is equal to 1 and thus the value of offset is equal to 2. During block 734 the next bit of the bit vector V is obtained and processing continues at block 728. The position of the newly selected bit of the bit vector V2 is 6 and the variable K is set to 6. During block 730, the output vector W corresponding to bit position 6 in the entity select vector 2 is determined. The characteristics of the output vector W have been determined above and the series of bits PRODS (2) begins at the second position which corresponds to an offset equal to 2. Thus, the output vector W is "001001" (870, FIG. 23) . Processing continues at block 732, during which the RPU 22 determines that there are no more "1" bits to evaluate in the bit vector V2. Thus, processing continues at block 738, during which processing returns to the CONSTRUCT JOIN ROW USE VECTOR routine (FIG. 22C) at block 696.
During block 696, the RPU 22 determines that there are no more input columns for processing. Thus, processing continues at block 700, during which the START-ROW variable is calculated to be 6; START-ROW plus PRODS (1) = 6. Processing returns at block 702 to block 676 of the BUILD ROW USE SETS routine, FIG. 22B. During block 676, the RPU 22 determines that there is one more unique value indicated to be present in the resultant bit vector x. The bit set to "1" in the resultant bit vector corresponds to the unique value Paris. During block 678, the unique value i = 2 for Paris is obtained. Then, during block 674, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) is called. In a similar fashion, the row use vectors corresponding to the value PARIS are formed.
4. Constructing a BINARY REPRESENTATION of a
GREATER THAN JOIN
Referring to FIGS. 2 and 24, a detailed discussion for performing the GREATER THAN JOIN operation is now discussed. Specifically, in the JOIN operation, the "where" clause contains the predicate "greater than"
( > ) . For example, referring to FIG. 2, suppose that a user wishes to combine the SUPPLIERS relation 63 and the
PARTS relation 65 such that the SUPPLIER CITY follows the PARTS CITY in alphabetical order. The command for this query is:
SELECT S.*, P.*
FROM S, P
WHERE S.CITY > P.CITY The result of this query on the Suppliers relation 63 and the Parts relation 65 (FIG. 2) is the following relation:
TABLE A
S# SNAME STATUS S.CITY P# H©ME COIOR
Figure imgf000110_0001
Figure imgf000110_0002
As in the case of the EQUIJOIN examples discussed above, the command for the GREATER THAN JOIN operation requires that the data come from two relations, namely, the Suppliers relation (63, FIG. 2), and the Parts relation (65, FIG. 2) . As shown above, both relations are named in the FROM clause, and the express connection between the tables is the City column in the WHERE clause. The result of the GREATER THAN JOIN is for displaying all the columns of the Suppliers relation with all the columns of the Parts relation. The process for creating the binary representation of the GREATER THAN JOIN relation is the same as for the EQUIJOIN operation. Namely, the BUILD ROW USE set (FIG. 223) is called to physically construct the columns of the JOIN relation. The major difference in performing a GREATER THAN JOIN rather than an EQUIJOIN is inserting and preparing the data prior to the construction of the row use sets for the JOIN relation. Specifically, FIG. 24 is a flow diagram which depicts the steps for evaluating and preparing the data prior to performing the BUILD ROW USE SETS routine (FIG. 22B) , which constructs the binary representation of the JOIN relation.
Note in Table A for the GREATER THAN JOIN, that the City column of the Suppliers portion of the relation contains the value Paris, which is in all instances lexically greater than the value London in the Parts relation. Whereas, the. value for London is not in the City column for the Suppliers portion of the JOIN relation because London is not greater than the lexical values for Rome, Paris or London. Also note that the GREATER THAN JOIN operation can be performed for any particular characteristic. The results table above relied on the lexical ordering of the cities. However, any other kind of ordering could have been used to perform a comparison of greater than, for example, numerical ordering of values, etc.
Referring to FIG. 24, a generic routine for performing the GREATER THAN JOIN operation is shown. Specifically, the JOIN operation is constructed for all the values of column A (i.e. all the values of the City column of the Suppliers relation) which are greater than all the values of column B (i.e. all the values of the City column for the Parts relation) . It should be noted that this routine could be modified for any other of the predicate JOINs such as LESS THAN, LESS THAN OR EQUAL TO, and GREATER THAN OR EQUAL TO.
During block 1004, a sort operation is performed on all the values of column A to order the values from the greatest value to the least greatest value. Then, in block 1006, a binary bit vector D is generated. The binary bit vector D contains a number of binary bits equal to the number_ of values in the value set over which the GREATER THAN JOIN operation is performed. The binary bit vector D has its binary bits, set to "1" at all the ordinal positions corresponding to the values of the value set, which are less than the value y of column- A.
During block 1008, the Boolean AND operation is performed on the binary bit vector D with the entity select vector corresponding to column B. The resultant vector contains binary bits set to "1" at the ordinal positions corresponding to the values of column B, which are less than the value y of column A. In block 1010, the resultant vector is checked to see if it is "0". If the resultant vector is "0", then processing returns to the calling routine during block 1012 because none of the values of B is less than the value of A. Therefore, the GREATER THAN JOIN operation cannot be satisfied, and thus, the user is notified. Assuming that the resultant vector is not "0", then processing continues at block 1014. During block 1014, the ordinal positions of the binary bit set to "1" in the result vector calculated in block 1008 are determined. The ordinal positions correspond to the ordinal positions of the entity select vector, which correspond to the values of column B, which, are less than the values of column A. Then during block 1016, all of the row use vectors associated with the ordinal positions of column B, which are associated with the binary bits set to "1" in the result vector, are obtained. Additionally, the row use vector for value y of column A is also obtained. During block 1018, the row use vectors associated with column B are placed into a temporary area, along with the row use vector for column A. During block 1020, the Boolean OR operation is performed between the row use vectors associated with column B to generate the entity select vector for the column B portion of the JOIN relation. With the entity select vector for the column A portion of the JOIN relation is a row use vector associated with the value y of column A. Stated differently, the entity select vector for the input column B depicts all of the values of B, which are less than the value y of column A. Thus, the BUILD ROW USE SETS routine (FIG. 22B) will construct the first several rows of the JOIN relation, which will consist of the value y of column A and all of the values of column B, which are less value y. Then, during block 1024, a determination is made of whether any more values of column A need to be evaluated. If there are no more values to be evaluated by the GREATER THAN JOIN operation (FIG. 24) , then processing returns to the calling program at 1026. However, if there are more values to be evaluated by the routine, then processing continues at block 1028. During block 1028, the next value y of column A, which is less than the last value of column A evaluated, is obtained.
F. DISPLAY/RECONSTRUCT For JOIN Operation
Typically, the JOIN operation is performed over two relations and a resultant JOIN relation is generated. Recall that the resultant JOIN relation (FIGS. 19, 20 and 21) was a binary representation. The system can automatically perform the DISPLAY/RECONSTRUCT FOR A JOIN operation routine (FIG. 25A) to generate the particular columns of the JOIN relation for the user to ascertain. The DISPLAY/RECONSTRUCT FOR JOIN operation (FIG.
25A) will be referenced by the PROJECT operation at block 566 (FIG. 17A) , instead of the DISPLAY/RECONSTRUCT routine (FIG. 17B) . Referring now to FIG. 25A, a more detailed discussion for reconstructing and displaying the columns of a JOIN relation is now discussed. More particularly, during block 1102, a row use vector corresponding to the rows of the JOIN relation to be constructed is created. Then, during block 1104, an index vector associated with the JOIN relation is created. The index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the JOIN row use set. Each bit indicates whether the unique value associated with the row use vector exists in- the JOIN column, which is to be displayed. If a binary bit in the index vector is set to "0", the unique value associated with the row use vector does not exist in the JOIN column to be displayed. Whereas, if the binary bit is set to "1" in the index vector, then the unique value associated with the row use vector exists one or more times in a column. At this time, the index vector contains a quantity of binary bits equal to the number of row use vectors of the JOIN row use set. During block 1106, the. row use set for the JOIN column to be displayed is obtained. During block 1108, the first row use vector associated with the JOIN column row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18 (FIG. 1A) , is transferred via bus 48 to the BBVP 14 (FIG. 1A) . Then, during block 1108, the RPU 22 via BBVP 14 performs a Boolean AND operation on a row use vector with the row select vector. As discussed earlier in Part V. (D.), the row select vector is a binary bit vector, and each binary bit corresponds to a row of the column or columns to be displayed by the system. A binary bit set to "1" in the row select vector indicates that the corresponding row needs to be displayed by the system. The result of the AND operation is a binary bit vector Z, which depicts the rows of the column which contain a particular value associated with the current row use vector. The results of the AND operation are sent via bus 48 back to memory 18 (FIG. 1A) for future processing. Then, in block .1110, the RPU 22 determines whether the resultant vector is "0". More particularly, the resultant vector Z contains all binary bits set to "0". If the binary bit vector is "0", then during block 1112, a "0" is placed in the first binary bit position of the index vector to indicate that the unique value associated with the row use vector does not exist in the JOIN column to be displayed. However, if the resultant bit vector Z is not equal to "0", then processing continues to block 1114. The RPU 22 via BBVP 14 sets the binary bit of the index vector, which is associated with the current row use vector, to "1". In block 1116, the resultant bit vector Z is stored in memory 18. The resultant bit vector Z is identified for this particular row use vector presently being processed, and it will later be used in the reconstruction process.
During block 1118, the RPU 22 via BBVP 14 (FIG. 1A) clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. The purpose of this step is. to "shortcut" the processing of the row use vectors in the row use set. Stated differently, when the JOIN row select vector is cleared, all of the values in the rows to be displayed for the column can be determined. During block 1120, the RPU 22 (FIG. 1A) determines whether the JOIN row select vector has had all its binary bits set to "0". If the JOIN row select vector contains only binary bits set to "0", then processing continues at block 1126. However, if all of the binary bits in the row use vector are not set to "0", then processing continues at block 1122. During block 1122, the RPU 22 (FIG. 1A) gets the next JOIN row use vector in the JOIN row use set currently being processed. During block 1124, the RPU 22 (FIG. 1A) determines whether the end of the JOIN row use vector has been reached. If the end of the JOIN row use vector has been reached, then processing continues at block 1126. However, assuming that the end of the row use vector has not been reached, then processing continues at blocks 1108, 1110, 1114, 1116, 1118, 1120, 1122 and 1124 until all of the JOIN row use vectors of the JOIN row use set have been completely processed.
Assuming that all of the JOIN row use vectors in the JOIN row use set have been processed or that the JOIN row select vector contains only binary bits set to "0", then processing continues at block 1126. During block 1126, the RPU 22 (FIG. 1A) determines whether the present column for display references a value set or a relation. The DISPLAY/RECONSTRUCT routine for the JOIN operation (FIG. 25A) starts with the row use set of the JOIN column and then references the relation corresponding to the row use set in the JOIN relation. Thus, the column to be displayed is a JOIN column, and it does not reference a value set. Therefore, processing continues at block 1112, during which the REFERENCE RELATION routine (FIG. 25B) is called.
During block 1136 of. the REFERENCE RELATION routine (FIG. 25B) , the RPU 22 obtains the entity select vector associated with the JOIN column. The entity select vector for the JOIN column is used as a row select vector for obtaining values from the referenced relation. Processing returns to the DISPLAY/RECONSTRUCT FOR JOIN operation routine (FIG. 25A) at block 1104.
During block 1104, an index vector for the referenced relation is created. More particularly, the index vector contains a number of binary bits is equal to the number of row use vectors of the reference relation. Then, during 1106, the RPU 22 (FIG. 1A) obtains a row use set for a column of the relation to be displayed. The first row use vector associated with the row use set is obtained. During block 1108, the RPU 22 instructs the BBVP 14 to perform a Boolean AND operation on the row use vector obtained in block 1106 with the row select vector obtained during the REFERENCE RELATION routine (FIG. 6B) . The result of the AND operation is a vector Z which displays the rows of the column which contain a particular value associated with the row use vector. Then, during block 1110, the RPU 22 (FIG. 1A) determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 1112, a "0" is set in the first binary bit position of the index vector. Processing continues at block 1122, during which the next row use vector associated with the column in the reference relation is obtained. Returning to block 1110, if the result of the Boolean operation performed in block 1108 is "0", then processing continues at block 1114. During block 1114, the RPU 22 via BBVP 14 (FIG. 1A) sets the binary bit in the index vector, which is associated with the current row use vector to "1". During block 1116, the resultant bit vector Z is stored in memory 18 (FIG. 1A) .
During block 1118, the RPU 22 instructs the BBVP 14
(FIG. 1A) to clear the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z. Again the purpose of this step is to shortcut the processing of the row use vectors of the row use set associated with the referenced relation. During block 1120, the RPU 22 (FIG. 1A) instructs the BBVP 14 to determine whether the row select vector has been completely cleared. Stated differently, whether all the binary bits have been set to "0". The row select vector contains only binary bits set to "0", and processing continues at decision block 1126. Otherwise, assuming that all of the binary bits in the row use vector are not set to "0", then processing continues at block- 1122. During block 1122, the RPU 22 (FIG. 1A) acquires the next row use vector in the row use set of the referenced relation currently being processed. Then, during block 1124, the RPU 22 instructs the BBVP 14 to determine whether the end of the row use set has been reached. If the end of the row use set has been reached, then processing continues at block 1126. Otherwise, assuming that the end of the row. use set has not been reached, then processing continues at block 1108, 1110, 1114, 1116, 1118, 1120, 1122, and 1124 until all the row use vectors of the row use set of the referenced relation have been processed.
Assuming that all of the row use vectors of the row use set have been processed, or in the event that the row select vector corresponding to the referenced relation contains only binary bits set to "0", then processing continues at block 1126. During block 1126, the RPU 22 (FIG. 1A) determines whether the column currently being evaluated references a relation or a value set. The currently column presently being evaluated references a value set, and thus, processing continues at block 1130. During block 1130, the REFERENCE VALUE SET routine (FIG. 25C) is called.
The REFERENCE VALUE SET routine (FIG. 25C) obtains the values from the value set and places the values in the proper rows of the JOIN relation column. More particularly, referring to block 1142, the RPU 22 instructs BBVP 14 (FIG. 1A) to determine the ordinal positions of the binary bits set to "1" in the index vector for the referenced column. Each binary bit set to "1" indicates which row use vectors reference unique values to be displayed in the referenced column. In addition, for each row use vector which has a corresponding binary bit set to "1" in the index vector, the ordinal position of the corresponding binary bit set to "1" in the entity select vector associated with the value set is determined. This process is performed for each of the row use vectors which has a corresponding binary bit set to "l" in the index vector for the referenced column. Then during block 1144, the values associated with the ordinal positions obtained in the block 1142 are retrieved from the referenced value set. During block 1146, the RPU 22 (FIG. 1A) finds the appropriate location in the index vector associated with each of the values retrieved in block 1144. Then, during block 1148, the appropriate resultant Z vector associated with each of the row use vectors is retrieved. The resultant Z vector indicates which rows of the referenced column contain the unique value associated with the row use vector. Then, during block 1150, the RPU 22 (FIG. 1A) determines whether any more values are left for processing. Assuming that there are more values, then processing continues at blocks 1146 and 1148 until all the values from the value set have been placed in the proper rows of the referenced column. Assuming that all of the values have been placed in the proper rows of the column, then processing continues at block 1152. During block 1152, for each row of the referenced column, the corresponding JOIN row use set is determined. More particularly, the corresponding binary bit set to "1" in the index vector for the JOIN column is determined. Then, during block 1153, the appropriate vector Z temporarily stored at an earlier step is obtained. The resultant vector Z indicates the row positions of the JOIN column where the value from the referenced column should be placed. During block 1155, the RPU 22 determines whether any more values from the referenced column need to be processed. If more values need to be processed, then processing continues at blocks 1152, 1153 and 1155 until all the values of the referenced column have been processed. Assuming that all the values have been processed, then processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) . During block.1132, processing returns to the calling program. 1. Example of the DISPLAY/RECONSTRUCT Operation For A JOIN Relation
Referring to FIGS. 20, 25A, 25B, 25C, 26A, 26B, 26C, 26D and 26E, a detailed example for performing the DISPLAY/RECONSTRUCT program on the JOIN column for Suppliers IDs in the JOIN relation (604, FIG. 29) is now discussed. More particularly, it is assumed that only a binary representation of the JOIN column for Suppliers IDs exists in the RDMS 10. The binary representation of the JOIN relation is a result of the JOIN operation, or it may have been previously stored after processing of the BBVP 14 (FIG. 1A) . In either case, the binary representation of the JOIN column exists in memory 18 and now the user or applications program needs to display the actual values of the JOIN column. Although, this example is for reconstructing and displaying only one column of the JOIN relation (604, FIG. 20), the remaining columns of the JOIN relation could be reconstructed by sequentially performing the PROJECT operation. Recall, that the PROJECT routine (FIG. 17A) calls the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 20A, 20B and 20C) . Referring to FIGS. 21A, 21B, 21C, 21D and 21E, a results table for depicting the results of the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 25A, 25B and 25C) for reconstructing and supplying the Suppliers ID column of the JOIN relation (604, 626 of FIG. 20) is shown. Each row of the results table depicts a result of the routines shown in FIGS. 25A, 25B and 25C. The results table is separated into 12 columns from left to right. The first column of the results table shows the row select vector associated with the JOIN column to be displayed. The second column shows the row use set associated with the JOIN column to be displayed. The third column is the row use vector of the row use set which is currently being processed, and the fourth column is the index vector associated with the row use set corresponding to the JOIN column. The fifth column is a resultant binary bit vector Z. The resultant binary bit vector Z is determined by ending a JOIN relation row use vector and the row select vector. The sixth column depicts the entity select vector associated with the referenced relation. The seventh column is an index vector associated with the row use set corresponding to the referenced relational column. The eighth column depicts the row use set corresponding to the referenced relational column. The ninth column is for the index vector associated with the row use set for the referenced relational column. The tenth column is for the resultant bit vector Z. The resultant bit vector Z is determined by ending a row use vector of the row use set with the entity select vector. The eleventh column is a depiction of the creation of the referenced relational column. The last column depicts the JOIN relational column, which is to be displayed.
Referring to FIG. 25B, the referenced relation obtains the entity select vector (600, FIG. 20) associated with the JOIN column. The entity select vector is used as a row select vector for specifying the rows of the referenced relation which have values in the JOIN column (1184, FIG. 26C) . Then, during block 1138, processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) at block 1104.
Block 1104 creates an index vector for the row use sets corresponding to the Supplier ID column (1181, FIG. 20) . During block 1106, the row use set associated with the Supplier ID column is retrieved (1186, FIG. 26C) . Then, during block 1108, the first binary bit vector of the row use set (1186, FIG. 26C) is ANDed with the entity select vector to form a resultant vector Z (1192, FIG. 26C) . During block 1110, the resultant vector is evaluated to determine if all the binary bits are set to "0". One bit in the resultant vector Z is set to "1", and thus, processing continues at block 1114. During block 1114, the binary bit in the index vector associated with the current row use vector is set to "1" (1194, FIG. 26C) . Then, during block 1116, the resultant vector Z associated with the current row use vector is stored in memory 18 (FIG. 1A) . During block 1118, the binary bit set to "1" in the row select vector that matched the binary bit set to "1" in the resultant vector Z are cleared (1196, FIG. 26D) . Processing continues at block 1120, during which the row select vector is evaluated to determine if all the binary bits are set to "0". All the binary bits of the row select vector are not "0" (1196, FIG. 26D) , and thus, processing continues at block 1122. During block 1122, the next row use vector of the row set for the Supplier ID column is obtained (1198, FIG. 26D) . During block 1124, the RPU 22 (FIG. 1A) determines if the end of the row use set has been reached. The end of the row use set has not been reached, and thus, processing continues at block 1108.
VT. ENTITY USE Vectors
To this point, the use of binary bit vectors in the setting of a relational database has been extensively discussed. The purpose of the binary bit vectors as described in the Binary Bit Vector Technology (Part III) section of this specification is for characterizing a subset of an ordered set. In contrast, the purpose of an entity use vector is for defining a relationship between the elements of two sets.
More particularly, referring to FIG. 27, sets S 1234 and T 1236 define the set of all ordered pairs (S, T) such that s is an element of S and t is an element of T. A "mapping" exists from S 1234. to T 1236. More than one element of set S 1234 may map onto unique elements of T 1236; however, there will never exist more than one element of T into which maps the elements of S. In mathematical terminology, the set S 1234 is the "domain", and the set T 1236 is the "range". FIG. 27 shows the many-to-one mapping. Specifically, si and s2 map to the value tl. The purpose of the entity use vector is to facilitate this relationship between ordinal positions of elements in one set to another set. Stated differently, the entity use vector is a vector whose elements are values expressed in ordinal positions of elements within another set. In the context of the current invention, the entity use vector constitutes a function (or a mapping) between either a value in a value set or a relational row in another or the same relation. The mapping may apply specifically to a relational column, whether it be a single physical column or a physical column consisting of many conceptual columns. The implied ordinal position within the entity use vector corresponds to a row number in the "domain". The associated value of the element is the entity location (ordinal position of the value in the value set or row position of the relational row being referenced) of the "range".
Referring to FIGS. 4, 20, 28, and 29, a more detailed discussion regarding the entity use vector and its use in the relational database is now discussed. Referring to FIG. 4, a binary representation of the
Supplier relation of the Supplier and Parts relational database is shown. In this database, the mapping between the domains and the columns was achieved by entity select vectors and row use vector sets. The added feature, entity use vectors, have been added to the Supplier relation as depicted in FIG. 4, as shown in FIG. 28. Note that the entity use vectors 1238, 1240, 1242 and 1244 correspond to the columns 168, 170, 172 and 174. Recall that the columns do not actually exist in the RDMS 10; instead, the binary representations of the columns 260, 262, 264 and 266, respectively, are stored in memory 18 area of the RDMS. If the entity use vectors are employed into the relational database setting, an efficiency for determining the values associated with each row of a column in the relation is created. For example, referring to the entity use vector 1244, which corresponds to the city column 174 of the Supplier relation, note that in the first ordinal position 1237 of the entity use vector, the value 5 resides. Essentially, the value 5 maps the value in the first row of the city column 174 to the fifth row of the domain 166. Referring to the dotted line 1249, the mapping of the first row of the city column 174 to the fifth row of the value set is shown. Likewise, the fourth row of the city column 174 is also mapped -to the fifth row position of the value set 166. Thus, the first row and the fourth row of the city column 174, both map to the same row of the value set 166. In this way, the many-to-one mapping can be achieved via the entity use vector. Likewise, the second row and the third row of the city column 174 map to the same row of the value set 166 may be implied mapping at 1247. Lastly, the fifth row of the city column 174 maps back to the first row of the value set 166. With all the elements in the entity use vector, the city column 174 can be easily reconstructed with the values of the value set 166. More particularly, the first row of the city
___ column 174 corresponds to the value 5 in the entity use vector at 1237 which corresponds to the value London. The second row of the city column 174 corresponds to the entity use vector value 8 which corresponds to the value Paris. Likewise, the third row of the city column 174 corresponds to the value 8 of the entity use vector and to the value Paris. The fourth row corresponds to the value 5 in the entity use vector and to the value London. Lastly, the fifth row of the city column 174 corresponds to value 1 in the entity use vector which corresponds to the value Athens. The entity use vector will be used specifically during the DISPLAY/RECONSTRUCT operations for the RDMS 10. The entity use vector facilitates the steps for determining the values in the particular columns of a relation and hence facilitates the DISPLAY/RECONSTRUCT process.
Referring to FIG. 29, a depiction of how entity use vectors may be used in the setting of a JOIN relation is shown. More particularly, the entity use vectors 1238, 1240, 1242, 1244, and 1246 have been added to the binary representation of the JOIN relation of FIG. 20. The entity use vectors 1238, 1240, 1242 and 1244 are identical to the entity use vectors of the Supplier relation in FIG. 28. The new entity use vector of FIG. 29 is at 1246. This entity use vector maps the rows of the JOIN columns to the rows of the input relations. More particularly, the values of the entity use vector correspond to the row numbers of the referenced relation. In particular, the first three rows of the entity use vector, 1250, 1251, 1252, respectively, correspond to row 1 of the Supplier relation. Thus, the first three rows of the JOIN relation will contain values from the first row of the city column. The city column has associated with it an entity use vector 1244 which has a value 5 in the first row corresponding to the first row of the city column. The value 5 maps back to the fifth row of the value set 166. The fifth row of the value set 166 contains the value London. Therefore, the first three rows of the entity use vector, 1250, 1251, and 1252, correspond to the value London as shown by the physical representation of the S.CITY column of the JOIN relation. The entity use vectors associated with the JOIN relation provide a mapping between the columns of the Supplier relation and the columns of the JOIN relation. As stated earlier, the columns are mapped via the entity use vectors to the value sets of the relational database. Once again, the entity use vector will be used during the DISPLAY/RECONSTRUCT process of the JOIN relation. The entity use vector will substantially improve the processing time for generating the actual representation of the JOIN relation by not having to perform the multitude of steps associated with the DISPLAY/RECONSTRUCT operations as discussed earlier. Again, a more detailed discussion on the DISPLAY/RECONSTRUCT will be presented shortly.
Referring to FIG. 1A, typically, the entity use vectors will be stored in byte or multi-byte form in memory 18. When a binary representation of a JOIN relation, or any other type of relation for that matter, needs to be displayed to the user, the RPU 22 causes the appropriate entity use vectors to move from the memory 18 via bus 46 to MVP 15. There, the RPU 22 evaluates the entity use vectors via the DISPLAY/RECONSTRUCT routines (FIGS. 30 and 31). When the relation has been reconstructed and it is ready for display to the external device 12 for storage or to display 3 (FIG. 1) for the user to ascertain. Referring now to FIG. 7, the entity use vectors are constructed during the load operation performed by the BINARY REPRESENTATION routine at block 356. More particularly, when the creation of the relational database has been completed, the system loads the file representation of the relations into external device 12, where they reside until summoned by the RDMS system 10. A particular column is retrieved by referring to its system identifier as discussed earlier and in more detail in part VII. The first value of the particular column is brought to the RPU 22; as discussed earlier, a row use vector associated with the value is built by the BBVP 14. Additionally, a bit is set to "1" in the first position of the row use vector to indicate that the value occupies the first row of the first column of the relation. At the same time the first value is brought to the RPU 22, it is also evaluated by the MVP 15. When the first value is evaluated by the MVP 15, an entity use vector associated with the column is built. Specifically, the first position of the entity use vector is assigned- a numerical value to indicate the row of the value set corresponding to the input value. As each value is input from the column, the MVP 15 generates a corresponding value associated with the row position in the value set which contains the particular value. Once the column has been completed and the next column is entered from the external device to the MVP 15, a new entity use vector is created. This process continues until all of the columns input from the external device have associated entity use vectors. As the entity use vectors are constructed, they are sent via bus 46 to memory 18, where they reside until they are called for processing via the DISPLAY/RECONSTRUCT routine at FIGS. 25A, 25B and 25C.
Referring to FIG. 30A, a flow diagram for the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS is shown. The purpose of this routine is to efficiently reconstruct and display the values of a specified column or columns. More particularly, referring to block 1270, the RPU 22 (FIG. 1A) causes the entity use vector associated with the particular column to be displayed to move from memory 18 via bus 46 to MVP 15. Then during block 1272, the RPU 22 determines whether the entity use vector associated with the column references a value set or another relation. Assuming that the entity use vector references another relation, then processing continues at block 1273, during which the REFERENCE RELATION (FIG. 3OB) is called. However, if the entity use vector associated with the column to be displayed references a value set, then the REFERENCE VALUE SET routine (FIG. 3OB) is called. The REFERENCE RELATION routine (FIG. 30C) is called when the column to be displayed is, for example, a column of the JOIN relation. In contrast, the REFERENCE VALUE SET routine (FIG. 30C) is called when the column to be displayed is a column of a base relation (one which references one or more value sets directly) . In either case, when the column has been reconstructed and is ready for display, processing returns to the calling program at block 1276.
Referring to _ FIG. 3OB, a flow diagram for the REFERENCE RELATION routine is shown. Specifically, during block 1280, the entity select vector associated with the column is obtained. This entity select vector is used as a row select vector for obtaining the values from a REFERENCE RELATION. For example, the entity select vector 600 (FIG. 29) which is associated with the JOIN column depicting the row use set of the JOIN relation 604 (FIG. 29) could be such a REFERENCE RELATION. Processing continues at block 1282, during which processing returns to the DISPLAY/RECONSTRUCT- WITH ENTITY USE VECTORS routine (FIG. 29A) .
Referring to FIG. 30C, a flow diagram for the REFERENCE VALUE SET routine is now discussed. More particularly, during block 1286, the values from the value set ' are obtained corresponding to the ordinal positions expressed by the entity use vector elements. Then, during block 1288, a value is placed in the appropriate row of the column. The ordinal position of the value in the column corresponds to the ordinal position of the corresponding element in the entity use vector. During block 1290, the next value of the value set is obtained and in block 1292, the RPU 22 (FIG. 1A) determines whether all the values have been processed. Assuming that not all of the values have been processed, then processing continues at block 1288, 1290, 1292, until all of the values are placed in the appropriate row positions of the column. Once all the values have been placed in the column, processing continues at block 1294. During block 1294, it is determined whether there is another column to go through. More particularly, whether there is a JOIN column which needs to be reconstructed, for example, the CITY column 638 (FIG. 29) of the JOIN relation. Assuming that a JOIN relation needs to be reconstructed, then processing continues at blocks 1288, 1290, 1292, 1294 until all the values associated with the JOIN column have been placed into the proper rows of the column. Assuming 'that there are no more levels of columns to reconstruct, then processing continues at block 1296-, during which processing returns to the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine (FIG. 30A) at block. 1276. During block 1276, processing returns to the calling program and the display of the column has been completed.
VII. Database Identification
Referring to FIGS. 31, 32, 33, 34, 35, 36, 37A,
37B, 37C and 37D, a detailed description of the structure for maintaining identification of the elements of the relational database is now discussed. Referring to FIG. 31, a representation of a relational database is shown. The top portion of FIG. 31 depicts seven domains of unique values referenced by two relations (Supplier and Parts) . The third relation, labeled JOIN relation, references both the Supplier and Parts relations as shown. All three relations depicted in FIG. 31 were thoroughly discussed in the previous sections of the specification. Specifically, the Supplier relation is detailed in FIG. 4, the Parts relation is detailed in FIG. 5, and the JOIN relation is detailed in FIG. 19. The three relations are assumed to be in their binary representations and stored in memory 18 (FIG. 1A) . The RDMS 10 maintains track of all of the columns of the relational database (FIG. 31) via an identification scheme, called the System relation. The System relation logically connects all of the necessary information for describing a column of any of the relations.
Referring to FIG. 32, the System relation for the Supplier/Parts relational database (FIG. 31) is shown. The System relation is broken up into four columns. From left to right, the column identifier (CID) , the relation identifier (RID) , the attribute identifier (AID) , and the domain identifier (DID) . The CID is a number which distinguishes a column from all of the other columns of the relational database. The RID identifies the relation in which a column logically resides. For example, column 80 (FIG. 2) resides in relation 63 (FIG. 2) . The AID defines the order or position in which the column resides in a particular relation. For example, column 80 (FIG. 2) is the first column of the relation 63 (FIG. 2) , and thus, has an AID number of 1. The DID identifies the particular domain associated with the column. All four identifiers together characterize each column of the relational database. Stated differently, all four identifiers characterize the relationship of a column in the relational database.
Referring to FIG. 31, each element of the Supplier/Parts relational database is identified by a CID number. Specifically, the Supplier/Identifier domain is characterized by a CID number "1" and an RID number "1". Domains do not have AID or DID numbers because the relation only has a single column, and it is itself a domain. Normally, a single column in the relational database would have associated with it an AID number of 1; however, in the preferred embodiment, the domain is always considered to have an AID of 0. Likewise, the Parts Identifier domain is characterized by a CID number "2" and an RID number "2". The domains are more fully depicted in FIG. 33 with their associated CID and RID numbers. The CID and RID numbers for the domains are shown in the first seven rows of the system relation (FIG. 32) . With the CID and the RID number, any of the domains for the Supplier/Parts relation (FIG. 31) can be referenced and obtained.
Now, referring to the Supplier relation of FIG. 31, the CID, RID, AID and DID numbers associated for each of the columns and the relation is now discussed. The Supplier relation is identified by a special row of the System relation, which generates a virtual column (as- seen by the user) for identifying the Supplier relation.
Each relation of the relational databases has a virtual column. The user never sees this column. This column
_. has a CID number of 8 and the Supplier relation has an RID number of 8. The virtual column contains row numbers associated with the rows of the relation. This column by convention does not have associated with it an AID or a DID number. Thus, in the eighth row of the system relation (FIG. 32) , the AID and DID numbers are set to 0. The first column (which the user sees) of the Supplier relation is the Supplier ID column and it has associated with it a CID number 9. The Supplier ID column is part of the Supplier relation, and thus, its RID number is 8. Likewise, the Person Name column of the Supplier relation has a CID number 10 and it, too, is part of the Supplier relation, and thus, has an RID number of 8. The Status column of the Supplier relation has a CID number 11 and it is part of the Supplier relation, and thus, has an RID number 8. The last column of the Supplier relation is the City column and it has a CID number of 12 and an RID number 8.
Note in FIG. 32 that the Supplier ID column has an AID number of 1, which identifies the Supplier ID column as the first column of the Supplier relation. The Supplier ID column is associated 'with the Supplier Identifier domain, and thus, the DID number associated with the Supplier column is 1. Recall that the CID number for a Supplier Identifier domain is equal to 1, and this corresponds to the DID number for the Supplier ID column. Likewise, the Person Name column of the Supplier relation has an AID number of 2 and a DID number of 3. More particularly, the AID number 2 means that the Supplier Name column is the second column of the Supplier Relation and the DID number 3 means that the Person Names column is associated with the Person Names domain. The Status column of the Supplier Relation has an AID number 3 and a DID number 7. The AID number 3 means that the Status column is the third column of the Supplier relation and the DID number 7 means that the Numbers domain is associated with the Status column. Lastly, the City column has AID number 4 and DID number 5 associated with it. The AID number 4 means that City column is the fourth column of the system relation and the DID number 5 means that the City domain is associated with the City column of the Supplier relation.
The Parts relation also has a virtual column which is identified by the CID number 13. Additionally, the JOIN relation is identified by a virtual column having a CID number 19. The columns of the Parts relation are identified by the CID, RID, AID and DID numbers in the same way the columns are identified for the Supplier relation. The CID number 20 corresponds to a JOIN column which references the Supplier relation having RID 8 (shown by the dotted lines) . More particularly, the DID number associated with the JOIN column (CID 20) is 8, which identifies the Supplier relation (RID 8) as the domain to the JOIN column. The AID number associated with the JOIN column having CID 20 is 1, which means that this ' column is the first column of the JOIN relation. The second column of the JOIN relation having CID number 21 references the Parts relation. More particularly, the DID number associated with the JOIN column having CID 21 is 14, which is the RID number for the Parts relation.
Referring to FIGS. 34, 35, 36, 37A, 37B, 37C and 37D, four constructs for developing each column referenced by the CID numbers in the system relation (FIG. 32) are shown. Specifically, a set containing domains and entity select vectors (FIG. 34) , an entity use set (FIG. 35) , a row select set (FIG. 36) , and a row use vector set (FIGS. 37A, 37B, 37C and ' 37D) are all used to build a particular column of the relational database. Essentially, each construct is a "set of sets".
Referring to FIG. 34, -the set contains 21 elements, and each number below each element corresponds to a different CID number or row of the System relation (FIG. 32) . The first element, 1, of the set (domain for supplier ID) corresponds to the first row of the System relation (FIG. 32) in which the CID number is 1, RID number is 1, AID number is 0, DID number is 0. The first element of this set characterizes all of the elements of the value set for Supplier IDs in a sorted fashion.
The domains of the relational database depicted in FIG. 34 are sorted by a lexical ordering scheme. However, any ordering scheme may be used, such as, order by entry, or by size (number of letter or number or both) etc. FIG. 35 shows the set of entity use vectors for mapping each value in each column of each relation
(FIG. 31) back to the ordinal position of the associated unique value in the domain. Elements 1 through 7 (FIG. 34) of the entity select set are lexically ordered which do not require entity use vectors for mapping because the domains are ordered to begin with. If the values in the domains (elements 1 through 7, FIG. 25) are in order, entity use vectors can be used for mapping the values in the column in the relations (elements 9-12, 14-18, 20 and 21) .
The column having CID 9, RID 8, AID 1 and DID 1 is the Supplier ID column of the Supplier relation. This column is characterized by the entity select set (FIG. 34) , entity use set' (FIG. 35) , row select set (FIG. 36) and row use set (FIGS. 37A, 37B, 37C and 37D) . More particularly, referring to FIG. 34, the entity select vector associated with the Supplier ID column is referenced by the vector element 9. The vector element 9 corresponds to the entity select vector "1010111000", which characterizes a subset of the Supplier identifier domain comprising those values which appear one or more times in the Supplier ID column.
Referring to FIG. 35, the entity use vector associated with the Supplier ID column is referenced by the ninth element of the set. The entity use vector for the ninth element characterizes a mapping between the values of the Supplier Identifier domain and the Supplier ID column. The values of the column are shown in the order they occur in the domain and thus, the ordering shown in the entity use vector represents a direct ordinal numbering scheme. The entity use vector represents a mapping between the row positions of the column and the values of the Supplier Identifier domain. Referring to FIG. 36, the row select vector associated with the Supplier ID column is referenced by the ninth element of the vector. The ninth element of the vector corresponds to the row select vector "11111" which means that all of the rows of the column are valid. Referring to FIGS. 37A and 37C, the row use set associated with the Supplier ID column is referenced by
•the ninth element of the set. The row use set corresponding to the ninth element also represents a mapping from a unique value from the Supplier's domain to a set of rows of the Supplier ID column. The set of rows for the Supplier ID column is characterized by the row use vector.
The row use set contains a series of row use vectors for providing a mapping between the sorted value set and the unsorted value set. The function performed by the row use vector associated with the first element of the vector (FIG. 37A) is the same as the entity use vector associated with the first element of the vector in the entity use set (FIG. 35) . However, the entity use vector is more efficient than the row use vector for mapping an unsorted value set to a sorted value set in the row use set. Additionally, the row use set is more efficient than the entity use vector for mapping the sorted value set to the unsorted value set. The relational database system could perform with either one, the row use set or the entity use set; however, for efficiency, both are maintained by the preferred embodiment of the invention. The ninth elements of the row use set and the entity use set are shown in (FIG. 37C) and (FIG. 35) , respectively.
The column at rows 8, 13 and 19 of the System relation (FIG. 32) is also characterized by the constructs depicted in FIGS. 34, 35, 36, 37A, 37B, 37C and 37D. More particularly, referring to FIG. 34, the eighth element of the set corresponds to the entity select vector for the column having CID number 8. The entity select vector referenced by the eighth element of the vector has five binary bits set to "1", which indicates that the relation contains five valid rows. Referring now to FIG. 35, the entity use vector associated with the column having CID 8 is referenced by the eighth element of the set. The entity use vector associated with the eighth element is empty because a mapping would not add any useful information for characterizing the column. Referring to FIG. 36, the eighth element of the row select set is depicted for representing the valid rows of the column. Note that there is the binary bit vector having five binary bits set to "1" corresponding to the five binary bits set to "1" in the entity select vector (FIG. 34) . The row use vector associated with the column is referenced by the eighth element of the set as shown in FIG. 37B. The row use vector is empty because a mapping would not add any further useful information for characterizing the column.
A. Performing the Database Identification Scheme
The row use set is more efficient for mapping a unique value from a value set to one or more rows of a column than the entity use vector and vice versa. Thus, the system maintains both constructs to create a maximum efficiency for mapping rows of a column to and from a value set.
In Part IV entitled, "Binary Representation of a Relational Database," a detailed description of the BINARY REPRESENTATION routine (FIG. 7) was discussed for generating the binary representation of the relational database and also for creating the necessary identifiers in each of the relations and their columns in the relational database. The process for creating the identification scheme is now more thoroughly discussed. Specifically, referring to FIG. 38A, the DATABASE IDENTIFICATION routine for generating a System relation for identifying and characterizing each column in a relation of the relational database is now discussed. Specifically, during block 1286, each domain necessary for specifying unique values in the relations of the relational database is identified. Commands are provided to be interpreted by the command interpreter 28 (FIG. 1A) , for creating domain identifiers in memory 18 (FIG. 1A) of the RDMS 10 (FIG. 1A) . For example, the command interpreter might read the following instructions into the system:
A) Create domain Supplier Identifiers;
B) Create domain Parts Identifier;
C) Create domain Person Names;
D) Create domain Part Names;
E) Create domain City; F) Create domain Colors;
G) Create domain Number;
As the command interpreter 28 (FIG. 1A) , reads the first instruction above, the RPU 22 creates an empty row in the system relation (FIG. 32) for the domain identifiers during block 1288. Then, during block 1290, .the RPU 22 (FIG. 1A) sets the CID number equal to the row number of the system relation. For example, the first domain identified by the RPU 22 (FIG. 1A) would have a CID number of 1 because it occupies the first row of the system relation (FIG. 32). During block 1292, the RPU 22 sets the RID number equal to the CID number, and during block 1294, the AID number and DID number are set equal to 0. In block 1296, the RPU 22 generates an element associated with the entity select set (FIG. 34) , the entity use set (FIG. 35) , the row select set (FIG. 36) , and the row use set (FIG. 37A, 37B, 37C and 37D) . During a later step in this routine, the necessary information for characterizing each of the columns is loaded during block 1332 (FIG. 38D) .
During block 1298, the RPU 22 determines whether any more domains need to be identified by the system. In the example above, seven different "create domain" commands have been specified for identifying each of the seven domains in the relational database. Thus, processing will return to blocks 1286, 1288, 1290, 1292, 1294, 1296 and 1298, during which the RPU 22 will identify each of the domains of the relational database. Assuming that all of the domains have been identified with their respective CID, RID, AID, DID and element locations in the vectors, processing will continue at block 1300.
Now that all of the domains to be referenced by the relational database have been specified, the application, during block 1300, identifies each table of the relational database. Specifically, the application provides the system with the following commands: CREATE TABLE - SUPPLIER
(Supplier ID; Person Name; Status; City);
CREATE TABLE - PARTS (Part ID; Part Name; Color; Weight; City) ;
CREATE TABLE - JOIN (S. City; S. ID#; Status; P. ID#) ;
The RDMS enters another row into the system relation (FIG. 32) for identifying the relation with a column. During block 1304, the CID number is set equal to the row number of the system relation, and during block 1306, the AID number and a DID number are set equal to 0. Then, during block 1308, a new element is entered to each of the sets corresponding to the entity select set (FIG. 34) , the entity use set (FIG. 35) , the row select set (FIG. 36) , and the row use set vector (FIG. 37) . Referring to block 1310 (FIG. 38C) , the columns of the database have been identified. During block 1312, the RPU 22 sets the variable CURRENT AID NUMBER equal to 1. Then during block 1314, the RPU 22 (FIG. 1A) generates a new row in the system relation (FIG. 32) . During block 131.6, the CID number is set equal to the row of the system relation, and during block 1318, the RID number is set equal to the RID number of the column for identifying the relation corresponding to the current column. The AID number is set equal to the "current AID number" and during block 1322, the "current AID number" is incremented by 1. In block 1324, the DID number for the current column is set equal to the CID number of the remaining reference by the current column. Processing continues at block 1326 (FIG. 39D) where new elements are added to the entity select set (FIG. 34) , the entity use set (FIG. 35) , the row select set (FIG. 36) , and the row use set vector (FIG. 37A, 37B, 37C and 37D) . Then, during block 1328, the RPU 22 (FIG. 1A) determines whether any more columns need to be created for the particular relation specified during block 1300. If more columns need to be identified by the RPU 22 (FIG. 1A) , then processing continues at blocks 1310, 1312, 1314, 1316, 1318, 1320, 1322, 1324, 1326 and 1328 until all the columns of the specified relation have been identified. Assuming that all the columns of the specified relation have been identified, then processing continues at block 1330.
During block 1330, the RPU 22 (FIG. 1A) determines whether any more relations need to be identified. Assuming that more relations need to be identified, then processing returns to block 1300, where the next relation to be created by the next CREATE TABLE instruction, listed above, is specified. Blocks 1300 through 1330 are performed until all of the relations in the relational database have been identified and the columns associated with each of the tables also have been identified. Assuming that all the tables and their related columns have been identified, then processing continues at block 1332. During block 1332, the RPU 22 instructs the file associated with each table of the relational database to be transferred from external device 12 to the RDMS 10 via 30 to the RPU 22. The RPU 22 instructs the BBVP 14 to generate the entity select vector (FIG. 34) , the row select vector (FIG. 36) and the row use set (FIG. 37A) associated with each column. Simultaneously, the RPU 22 instructs the MVP 15 to generate the entity use vector (FIG. 35) associated with each of the columns. The entity select vector, entity use vector, row select vector, and row use set are retrieved by the RPU 22 via the system identifiers (i.e. CID, DID, AID, RID identifiers). Then, during block 1334, the RPU 22 determines whether any more files associated with the relations need to be loaded into the RDMS 10. If more files need to be loaded for another table, then processing returns to block 1332; otherwise, processing continues at block 1336. Assuming that all the files have been loaded into the RDMS 10, then processing returns to the calling program at block 1336. At this time, all of the columns and the relations of the relational database have been identified and all of the characterizing vectors associated with each of the columns are also organized.
The invention has been described in an exemplary and preferred embodiment, but it is not limited thereto. Those skilled in the art will recognize a number of additional modifications and improvements which can be made to the invention without departure from its essential sphere and scope. For example, a number of different software and hardware embodiments and any number of different software languages and hardware configurations would be suitable for implementing the disclosed invention.

Claims

WHAT IS CLAIMED IS:
1. A method, utilizing a computer, for creating a relation in a relational database, said relation comprising one or more columns and rows, each said column having one or more values, each said value of each said column having a common characteristic, each said value of each said column corresponding to one of said rows, each said row comprising one or more values, each said value of each said row being from a different column, and one or more of said values of each said row having one or more characteristics, said method comprising the steps of: for each said characteristic, forming a set of a plurality of unique values; for each said set, forming one or more subsets of said set, each said subset comprising one or more said unique values of said set; and forming said relation, said step comprising the step, for each said subset, forming one of said columns, said column comprising one or more of each said unique value of said subset and each said unique value of said column corresponding to one or more of said rows of said relation.
2. The method of claim 1 wherein said step of forming said set of said unique values further comprises forming said unique values , in a desired order of occurrence.
3. The method of claim 2 wherein said step of forming said subset of said set further comprises the step of forming a vector of binary bits, each said binary bit of said binary bit vector having an order of occurrence corresponding to said order of occurrence of said unique values in said set, and each said binary bit representing the presence or absence of each said unique value in said subset.
4. The method of claim 1 wherein said step of forming said relation further comprises forming a binary representation of said relation, said step comprises, for each said subset, forming a binary bit vector corresponding to each said unique value of said subset, each said binary bit of said binary bit vector corresponding to one of said rows of one of said relation, and each said binary bit representing the presence or absence of one of said unique values in one of said rows of one of said columns of said relation.
5. The method of claim 4 wherein said step of forming said binary representation of said relation further comprises the step of forming said bits of said binary bit vector in an order of occurrence corresponding to the order of occurrence of said rows in said column.
6. An apparatus, utilizing a computer, for creating a relation in a relational database, said relation comprising one or more columns and rows, each said column having one or more values, each said value of. each said column having a common characteristic, each said value of each said column corresponding to one of said rows, each said row comprising one or more values, each said value of each said row being from a different column, and 'one or more of said values of each said row having one or more characteristics, said apparatus comprising: for each said characteristic, means for forming a set of a plurality of unique values; for each said set, means for forming one or more subsets of said set, each said subset comprising one or more of said unique values of said set; and means for forming said relation, said means comprising, for each said subset, means for forming one of said columns of said relation, said column comprising one or more of each said unique value of said subset and each said unique value of said column corresponding to one or more of said rows of said relation.
7 . A method, utilizing a computer, for creating a relational database, said relational database comprising at least one relation, said at least one relation comprising one or more columns and rows, each said column having one or more values, each said value of each said column having a common characteristic, each said value of each said column corresponding to one of said rows, each said row comprising one or more values, each said value of each said row being from a different column, and said one or more values of each said row having one or more characteristics, said method comprising the steps of: for each said characteristic of said relational database, forming a set of a plurality of unique values; for each said set, separately .forming one or more subset representations of each said set, each said subset comprising one or more said unique values of said set; and forming said at least one relation of said relational database, said step comprising the step of forming said column of said at least one relation, said column comprising one or more of each said unique value of one of said subsets and each said unique value of said column corresponding to one or more of said rows of said at least one relation.
8. The method of claim 7 wherein said step of forming said at least one relation further comprises forming a binary representation of said at least one relation, said step further comprising, for each said subset, the step of forming a vector of binary bits cor¬ responding to each said unique value of said subset, each said binary bit of said binary bit vector corresponding to one of said rows of one of said at least one relation, and each said binary bit representing the presence or absence of one of said unique values in one of said rows of one of said columns of said at least one relation.
9. A method, utilizing a computer, for efficiently performing an operation on one or more relations of a relational database to determine a binary representation of a resultant relation, each said relation of said relational database comprising one or more columns and rows, each said column having one or more values, each said value of each said column having a common characteristic, each said value of each said column corresponding to one of said rows, each said row comprising one or more values, each said value of each said row being from a different column, and said one or more values of each said row having one or more characteristics, said method comprising the steps of: forming a binary representation of each said relation said step comprising the step, ' for each said column of each said relation, forming a vector of binary bits corresponding to each unique value in said column, each ' said binary bit of said binary bit vector corresponding to one of said rows of said relation, and each said binary bit representing the presence or absence of said unique value in said row of said column; and performing a relational operation on one or more of said binary representations of said relations to determine a binary representation of said resultant relation. *
10. The method of claim 9 wherein said step for performing said relational operation comprises determining which one or more rows of said binary representation of said relation correspond to a selected unique value, said selected unique value in one of said columns of said relation, said step comprising the step of selecting said binary bit vector corresponding to said selected unique value, said binary bit vector being said binary representation of said resultant relation and indicating said one or more rows of said column which contain said selected unique value.
11. The method of claim 9 wherein said step for performing said relational operation comprises determining which one or more rows of said binary representation of said relation correspond to more than one of selected unique values, said selected unique values in more than one column of said relation, said step comprising the steps: for each said column, selecting said binary bit vectors corresponding to said selected unique values in said column, each said binary bit vector indicating said one or more rows of said binary represented relation which contain said selected unique values; operative when said column contains more than one selected unique values performing a Boolean OR operation on said binary bit vectors, corresponding to said column, to determine a SELECT binary bit vector indicating said one or more rows of said relation which contain said selected unique values; performing an operation on said SELECT binary bit vectors to determine said resultant relation, said resultant relation being a binary bit vector indicating said one or more rows of said relation which contain said selected unique values from different columns of said relation.
12. A method, utilizing a computer, for creating a binary representation of a relation in a relational database, said relation comprising one or more columns and rows, each said column having one or more values, each said value of each said column having a common characteristic, each said value of each said column corresponding to one of said rows, each said row comprising one or more values, each said value of each said row being from a different column, and said one or more values of each said row having one or more characteristics, said method comprising the steps of: for each said characteristic, forming, in a desired order of occurrence, a set of a plurality of unique values; for each said set, forming one or more representative subsets of said set, each said subset comprising one or more of said unique values of said set, said step comprising the step of forming a vector of binary bits corresponding to each said subset, said binary bits of said binary bit vector having an order of occurrence corresponding to said order of occurrence of each said unique value in said set, and each said binary bit representing the presence or absence of each said unique value in said subset; forming said binary representation of said relation, said step comprising the step, for each said subset representation, forming a vector of binary bits corresponding to each said subset value, each said binary bit of said binary bit vector corresponding to one of said rows of one of said columns, and each said binary bit representing the presence or absence of each said unique value in one of the rows of said column, said bits of said binary bit vector being formed in a predetermined order of occurrence corresponding to the order of occurrence of said rows in said column.
13. The method of claim 12 further comprising the step of adding a unique value to said subset representation of said set, said step comprising the step of setting said binary bit, corresponding to the occurrence of said additional unique value in said set, to indicate the presence of said additional unique value in said subset; and adding said binary bit to said binary bit vector to indicate the occurrence of said additional unique value in said subset.
14. The method of claim 12 further comprising the step of constructing said relation from said binary represented relation, said step comprising the steps of: for each said binary represented column of said relation, referencing'said corresponding binary bit vectors, said step including the steps of identifying each said corresponding binary bit of said subset, and identifying each said unique value of said set corresponding to each said binary bit vector, and displaying each occurrence of each said unique value referenced by each said binary bit vector in said binary represented relation.
15. The method of claim 12 further comprising the step of performing a relational operation on said one or more relations to determine said resultant relation, said step for performing said relational operation comprising determining which one or more rows of said binary representation of said relation correspond to a selected unique value, said selected unique value in one of said columns of said relation, said step comprising the step of selecting said binary bit vector corresponding to said selected unique value, said binary bit vector being said binary representation of said resultant relation and indicating said one or more rows of said column which contain said selected unique value.
16. A method, utilizing a computer, for creating a binary representation of a relational database, said relational database comprising at least one relation, said at least one relation of said relational database comprising one or more columns and rows, each said column having one or more values, each said value of each said column having a common characteristic, each said value of each said column corresponding to one of said rows, each said row comprising one or more values, each said value of each said row being from a different column, said one or more values of each said row having one or more characteristics', said method comprising the steps of: for each said characteristic of said relational database, forming in a desired order of occurrence, a set of unique values; for each said set, separately forming one or more subset representations of said set each said one or more subsets comprising one or more of said unique values of said set, said step comprising the step of forming a vector of binary bits for each said subset, said binary bits of said binary bit vector having an order of occurrence corresponding to said order of occurrence of said unique values in said set, and each said binary bit representing the presence or absence of each said unique value in said subset; forming a binary representation for said at least one relation, said step comprising the step, for each said subset representation of said at least one relation, forming a vector of binary bits in a predetermined order corresponding to each said subset value, each said binary bit of said binary bit vector corresponding to one of said rows of one of said columns of said at least one relation, and each said binary bit representing the presence or absence of said present value in one of the rows of said column.
17. The method of claim 16 further comprising the step of constructing said at least one relation of said relational database from said binary represented relation, said step comprising the steps of; for each said binary represented column 'of said relation, referencing said corresponding binary bit vectors, said step including the steps of identifying each said corresponding binary bit of said subset, and identifying each said unique value of each said set corresponding to each said binary bit vector, and displaying each occurrence of each said unique value referenced by each said binary bit vector in said binary represented relation.
18. The method of claim 16 further comprising the step of performing a relational operation on said one or more relations to determine a resultant relation.
PCT/US1988/003528 1987-10-09 1988-10-07 A relational database representation with relational database operation capability WO1989004013A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US107,447 1979-12-26
US10744787A 1987-10-09 1987-10-09
US23875488A 1988-08-29 1988-08-29
US238,754 1988-08-29

Publications (1)

Publication Number Publication Date
WO1989004013A1 true WO1989004013A1 (en) 1989-05-05

Family

ID=26804791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1988/003528 WO1989004013A1 (en) 1987-10-09 1988-10-07 A relational database representation with relational database operation capability

Country Status (4)

Country Link
EP (1) EP0398884A4 (en)
AU (1) AU632267B2 (en)
CA (1) CA1338601C (en)
WO (1) WO1989004013A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997004407A1 (en) * 1995-07-14 1997-02-06 Alcatel N.V. Emulator for an sql relational-database
WO1998047081A1 (en) * 1997-04-15 1998-10-22 Dci Datenbank Für Wirtschaftsinformation Gmbh Digital data selection method
US6035311A (en) * 1995-12-01 2000-03-07 Sand Technology Systems International, Inc. Method and system for performing a boolean operation on bit strings using a maximal bit slice
EP1323122A2 (en) * 1999-09-03 2003-07-02 Whamtech, L.P. Index relational processor
GB2395587A (en) * 2002-11-25 2004-05-26 Sun Microsystems Inc Method for restructuring data which may be in a relational database
US7908430B2 (en) 2000-08-18 2011-03-15 Bdgb Enterprise Software S.A.R.L. Associative memory
US8015198B2 (en) 2001-08-27 2011-09-06 Bdgb Enterprise Software S.A.R.L. Method for automatically indexing documents
US8276067B2 (en) 1999-04-28 2012-09-25 Bdgb Enterprise Software S.A.R.L. Classification method and apparatus
US8321357B2 (en) 2009-09-30 2012-11-27 Lapir Gennady Method and system for extraction
WO2014143791A1 (en) * 2013-03-15 2014-09-18 Informatica Corporation Efficiently performing operations on distinct data values
US9152883B2 (en) 2009-11-02 2015-10-06 Harry Urbschat System and method for increasing the accuracy of optical character recognition (OCR)
US9158833B2 (en) 2009-11-02 2015-10-13 Harry Urbschat System and method for obtaining document information
US9159584B2 (en) 2000-08-18 2015-10-13 Gannady Lapir Methods and systems of retrieving documents
US9213756B2 (en) 2009-11-02 2015-12-15 Harry Urbschat System and method of using dynamic variance networks

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0360387B1 (en) * 1988-09-23 1996-05-08 International Business Machines Corporation Data base management system
US7467155B2 (en) 2005-07-12 2008-12-16 Sand Technology Systems International, Inc. Method and apparatus for representation of unstructured data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4068298A (en) * 1975-12-03 1978-01-10 Systems Development Corporation Information storage and retrieval system
US4276597A (en) * 1974-01-17 1981-06-30 Volt Delta Resources, Inc. Method and apparatus for information storage and retrieval
US4318184A (en) * 1978-09-05 1982-03-02 Millett Ronald P Information storage and retrieval system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0079465A3 (en) * 1981-11-13 1985-01-23 International Business Machines Corporation Method for storing and accessing a relational data base

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4276597A (en) * 1974-01-17 1981-06-30 Volt Delta Resources, Inc. Method and apparatus for information storage and retrieval
US4068298A (en) * 1975-12-03 1978-01-10 Systems Development Corporation Information storage and retrieval system
US4318184A (en) * 1978-09-05 1982-03-02 Millett Ronald P Information storage and retrieval system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
E.H. BERTZ, "A Set-Theoretic View of Data-Base Representing", ACM Sigmod Workshop on Data Description, Access and Control, pages 477-494 (1-3 May, 1974). *
See also references of EP0398884A4 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997004407A1 (en) * 1995-07-14 1997-02-06 Alcatel N.V. Emulator for an sql relational-database
US6003026A (en) * 1995-07-14 1999-12-14 Alcatel N.V. Emulator for an SQL relational-database
CN1098501C (en) * 1995-07-14 2003-01-08 阿尔卡塔尔有限公司 simulator and method for SQL relational database
US6035311A (en) * 1995-12-01 2000-03-07 Sand Technology Systems International, Inc. Method and system for performing a boolean operation on bit strings using a maximal bit slice
WO1998047081A1 (en) * 1997-04-15 1998-10-22 Dci Datenbank Für Wirtschaftsinformation Gmbh Digital data selection method
US8276067B2 (en) 1999-04-28 2012-09-25 Bdgb Enterprise Software S.A.R.L. Classification method and apparatus
EP1323122A2 (en) * 1999-09-03 2003-07-02 Whamtech, L.P. Index relational processor
EP1323122A4 (en) * 1999-09-03 2007-01-17 Whamtech L P Index relational processor
US8209481B2 (en) 2000-08-18 2012-06-26 Bdgb Enterprise Software S.A.R.L Associative memory
US7908430B2 (en) 2000-08-18 2011-03-15 Bdgb Enterprise Software S.A.R.L. Associative memory
US9159584B2 (en) 2000-08-18 2015-10-13 Gannady Lapir Methods and systems of retrieving documents
US8015198B2 (en) 2001-08-27 2011-09-06 Bdgb Enterprise Software S.A.R.L. Method for automatically indexing documents
US9141691B2 (en) 2001-08-27 2015-09-22 Alexander GOERKE Method for automatically indexing documents
GB2395587A (en) * 2002-11-25 2004-05-26 Sun Microsystems Inc Method for restructuring data which may be in a relational database
US8321357B2 (en) 2009-09-30 2012-11-27 Lapir Gennady Method and system for extraction
US9152883B2 (en) 2009-11-02 2015-10-06 Harry Urbschat System and method for increasing the accuracy of optical character recognition (OCR)
US9158833B2 (en) 2009-11-02 2015-10-13 Harry Urbschat System and method for obtaining document information
US9213756B2 (en) 2009-11-02 2015-12-15 Harry Urbschat System and method of using dynamic variance networks
WO2014143791A1 (en) * 2013-03-15 2014-09-18 Informatica Corporation Efficiently performing operations on distinct data values
US9218379B2 (en) 2013-03-15 2015-12-22 Informatica Llc Method, apparatus, and computer-readable medium for efficiently performing operations on distinct data values
GB2528405A (en) * 2013-03-15 2016-01-20 Informatica Corp Efficiently performing operations on distinct data values
GB2528405B (en) * 2013-03-15 2021-06-23 Informatica Llc Efficiently performing operations on distinct data values

Also Published As

Publication number Publication date
AU2710088A (en) 1989-05-23
AU632267B2 (en) 1992-12-24
CA1338601C (en) 1996-09-17
EP0398884A4 (en) 1992-08-12
EP0398884A1 (en) 1990-11-28

Similar Documents

Publication Publication Date Title
US6633883B2 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
US6711563B1 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
AU632267B2 (en) A relational database representation with relational database operation capability
US7058621B1 (en) Method for extracting information from a database
EP1217541A1 (en) Method of processing queries in a database system, and database system and software product for implementing such method
US7246124B2 (en) Methods of encoding and combining integer lists in a computer system, and computer software product for implementing such methods
JPS61223941A (en) Method for storing and retrieving chemical structure
US5625812A (en) Method of data structure extraction for computer systems operating under the ANSI-92 SQL2 outer join protocol
JP3452531B2 (en) Method and system for data mining
Mitoma et al. Automatic data base schema design and optimization
CN110389953B (en) Data storage method, storage medium, storage device and server based on compression map
JP2004534981A (en) Methods for organizing data and processing queries in a database system, and database systems and software products for implementing such methods
Orchard-Hays History of mathematical programming systems
AU2002232035A1 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
CN115114293A (en) Database index creating method, related device, equipment and storage medium
Rishe A file structure for semantic databases
CN114138735A (en) Method for quickly loading Janus graph data in batches
Vaidya et al. Design and architectural implications of a spatial information system
Fry et al. Towards a formulation and definition of data reorganization
Xingjian A database design method for finite element analysis
CN113434540B (en) Dependency relationship generation method and system in application driving data generation supporting left external connection
Harris A new algorithm for tree modification in the primal transportation problem
Spiegler TOWARD A UNIFIED VIEW OF DATA Bridging Data Structure and Content
WO2018117901A1 (en) Information processing system
Lee Database design for decision support systems

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE

WWE Wipo information: entry into national phase

Ref document number: 1988910209

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1988910209

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1988910209

Country of ref document: EP