EP0398884A1 - Representation de base de donnees relationnelles capable d'operations sur une base de donnees relationnelles - Google Patents

Representation de base de donnees relationnelles capable d'operations sur une base de donnees relationnelles

Info

Publication number
EP0398884A1
EP0398884A1 EP88910209A EP88910209A EP0398884A1 EP 0398884 A1 EP0398884 A1 EP 0398884A1 EP 88910209 A EP88910209 A EP 88910209A EP 88910209 A EP88910209 A EP 88910209A EP 0398884 A1 EP0398884 A1 EP 0398884A1
Authority
EP
European Patent Office
Prior art keywords
relation
column
vector
value
binary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP88910209A
Other languages
German (de)
English (en)
Other versions
EP0398884A4 (en
Inventor
Edward Lewis Glaser
Douglas Wyche Caldwell
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nucleus International Corp
Original Assignee
Nucleus International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nucleus International Corp filed Critical Nucleus International Corp
Publication of EP0398884A1 publication Critical patent/EP0398884A1/fr
Publication of EP0398884A4 publication Critical patent/EP0398884A4/en
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Definitions

  • This invention relates to a computer apparatus and/or method for creating a relational database. This invention also relates to a computer apparatus and/or method for efficiently processing relational operations on one or more relations.
  • a database system is basically a computerized record-keeping machine — that is, a system whose overall purpose is to maintain data and to make that data available.
  • the data maintained by the system can be any data deemed to be of significance.
  • the basic purpose of a database system is to assist in the process of accessing data.
  • relational database is a database that is perceived by users as a collection of tables and nothing but tables.” Id. at 96.
  • the relational database system enables the user to generate new tables from old tables and to extract and combine subsets of information from one or more tables.
  • the present invention introduces a computer apparatus and/or method for efficiently accessing, representing and manipulating data in a relational database. More particularly, the present invention introduces a computer apparatus and/or method for creating and representing a relational database and an apparatus and/or method for efficiently processing relational operations on one or more relations.
  • one aspect of the preferred embodiment of the present invention includes an apparatus and/or method, utilizing a computer, for creating a relational database.
  • the relational database contains a plurality of relations and each of the relations contains one or more columns and rows.
  • a column has one or more values, which all have a common characteristic.
  • Each value in the column corresponds to one of the rows of the relation.
  • Each row contains one or more values, and each value is from a different column.
  • the values in each row thus have one or more characteristics.
  • the creation of a relational database occurs in three steps. First, for each characteristic of the relational database, a set, called a "value set" or "domain” is formed containing all the values entered in the data base having the same characteristic.
  • each relation (table) of the relational database one or more subsets of each value set containing values unique to the particular relation is formed. Each of the subsets contains one or more of the unique values of one of the value sets.
  • the relations (tables) of the relational database are formed. More particularly, for each subset associated with a particular relation, one of the columns of the relation is formed. Each column contains one or more of each unique value in the subset, and each unique value of the column occurs in one or more of the rows of the relation.
  • the first step involves an additional step of forming the set of unique values in some desired order of occurrence (e.g., numerical, lexical, or order of entry into the system) .
  • the second step may involve a step of representing a subset in the form of an entity select vector of binary bits, where each of the binary bits of the binary bit vector has an order of occurrence which corresponds to the order of occurrence of the unique values in the set.
  • the binary bits of the bit vector represent the presence or absence of each of the unique values in the subset, i.e., the vector selects which entities or values in the value set are in the associated subset forming the column of the relation.
  • the third step may also involve the additional step of forming a binary representation of each relation in the relational database.
  • a vector of binary bits is created for each unique value in the subset.
  • Each binary bit of this binary bit vector corresponds to one of the rows of the column, and the binary bits can be formed in an order of occurrence corresponding to the order of occurrence of the rows in the column or in some other appropriate order.
  • each binary bit represents the presence or absence of each unique value in one of the rows of the column.
  • the present invention also has the capability of efficiently performing database operations on one or more relations.
  • Binary bit vector represented relations need not be decompressed in order to perform database operations.
  • Computer time is saved by not having to decompress the data, and processing is made more efficient, because only a relatively small amount of compressed data needs to be considered while performing the relational operations.
  • the relational operations are reduced to operations on bit strings.
  • This aspect of the present invention involves a method and/or apparatus using a computer, for efficiently performing a relational operation on one or more relations of a relational database in order to produce a binary representation of a resulting relation.
  • the relational operations performed are in the form of "SELECT,” "JOIN” (“EQUIJOIN” or "NATURAL JOIN”) and "PROJECT.”
  • the SELECT operation determines which one or more rows of a relation corresponds to one' or more selected unique values.
  • the selection process occurs in two steps. First, the binary bit vector(s) associated with the one or more selected unique values are retrieved. (Each binary bit of the binary bit vector indicates the one or more rows of the binary represented relation which contain the selected unique value) . Second, when there is more than one selected unique value, a Boolean OR operation is performed on the retrieved binary bit vectors to determine the binary representation of a resultant relation. More particularly, the resultant relation is represented by a binary bit vector which indicates the one or more rows of the relation which contain the selected unique values from one column of the relation.
  • the quantity of the rows of the relation which contain the one or more selected unique values can be quickly determined by counting the binary bits of the resultant relation which indicate the presence of the selected values.
  • the preferred embodiment also has the capability of performing SELECT on more than one selected unique value which occur in more than one column of the relation.
  • the selection process occurs in three steps. First, for each column, the binary bit vectors associated with the selected unique values are retrieved. Each binary bit of the binary bit vector indicates a row of the binary represented column which contains the selected unique values. Second, for each column, when there is more than one selected unique value from the column, a Boolean OR operation is performed on the retrieved binary bit vectors to determine a SELECT binary bit vector indicating the one or more rows of the relation which contain the selected unique values. This bit vector characterizes a subset of the rows of the original relation. Third, a Boolean operation (i.e., AND, OR, etc.) is performed on the SELECT binary bit vectors to determine a resultant relation.
  • the step for performing a relational operation also may include the step for performing a JOIN on one or more rows of a first binary represented relation with one or more rows of a second binary represented relation. More particularly, the JOIN operation may be specified to occur only when a column of the first relation contains values having a particular characteristic (i.e., belonging to particular domain) and a column of the second relation contains values having the same characteristic, which bear a desired relationship with values from the column of the first relation. That is, they are greater than, less than, equal to, equal to or greater than, equal to or less than the corresponding values of the first relation. (The second relation need not be distinct from the first.)
  • performing a PROJECT operation is also included.
  • the PROJECT operation comprises generating the binary bit vectors associated with any subset of columns of a particular relation. The columns selected can be reconstructed for a user to interpret.
  • Any relational operation which creates a second resultant relation can be implemented with the present invention.
  • SELECT, PROJECT and JOIN are thoroughly discussed, one skilled in the art of database systems understands that there are comparatively few practical problems that cannot be solved by SELECT, PROJECT and JOIN alone.
  • the other relational operations such as PRODUCT, UNION and DIFFERENCE can be implemented by the present invention as well.
  • functions such as INSERT and DELETE for updating rows of a binary represented relation and for updating unique values to sets of unique values, are also implemented by the present invention.
  • the present invention significantly enhances a computer's capability of processing relational operations by performing the relational operations directly at the binary bit level. Additionally, the Boolean operations can also be performed more efficiently by a special Boolean Logic Unit discussed in a co-filed application to Glaser et al. , entitled “A BIT VECTOR PROCESSOR WITH BOOLEAN OPERATION PROCESSING CAPABILITY,” filing date September 24, 1987.
  • FIG. 1 depicts a computer system equipped with a Relational Database Management System (RDMS) for creating a binary representation of a relational database and/or for efficiently processing relational operations on the one or more binary represented relations in accordance with the present invention
  • RDMS Relational Database Management System
  • FIG. 1A depicts the RDMS on which software programs and/or hardware are performed for representing the relations and for performing relational operations;
  • FIG. 2 represents a typical relational database provided by a user of the system depicted in FIG. 1;
  • FIG. 3A represents an ordered set S where the ordering is a lexical ordering
  • FIG. 3B represents an (ordered) subset of the ordered set represented in FIG. 3A;
  • FIG. 3C represents an empty or null set A
  • FIG. 3D depicts a binary bit vector expression a for characterizing the subset depicted in FIG. 3B
  • FIG. 3E is a binary bit vector a 1 for characterizing the ordered set shown in FIG. 3A;
  • FIG. 3F depicts a binary bit vector representation of a null subset a" shown in FIG. 3C;
  • FIG. 4 depicts a binary representation of the Suppliers relation as shown in FIG. 2;
  • FIG. 5 depicts a binary representation of the Parts relation as shown in FIG. 2;
  • FIG. 6 depicts a binary representation of the Shipments relation as shown in FIG. 2;
  • FIG. 7 is a flow diagram of the BINARY
  • FIGS. 8A and 8B are a table depicting the results of executing a series of commands for setting up the columns of the relational database shown in FIG. 2;
  • FIGS. 9A, 9B. and 9C are a table depicting the results of constructing the binary representation of the Suppliers relation (FIG. 2) by the BINARY REPRESENTATION routine;
  • FIG. 10A is a flow diagram of the INSERT routine;
  • FIG. 10B is a flow diagram of the INSERT VALUE INTO VALUE SET routine
  • FIG. IOC is a flow diagram of the UPDATE SUBSET routin ;
  • FIG. 10D is a flow diagram of the ADD VALUE TO
  • FIG. 11A is a flow diagram of the DELETE routine
  • FIG. 11B is a flow diagram of the DELETE VALUE FROM COLUMN routine
  • FIG. 11C is a flow diagram of the DELETE VALUE FROM COLUMN routine
  • FIG. 11D is a flow diagram of the DELETE VALUE FROM VALUE SET routine
  • FIG. 12 is a results table depicting the operations performed by the INSERT routine (FIG. 10A) ;
  • FIG. 13 is a table depicting the results of the operations performed by the DELETE routine (FIG. 11A) ;
  • FIG. 14 is a flow diagram of the SELECT routine;
  • FIGS. 15A and 15B depict a table of the results of the operations performed by- the SELECT routine for a two-column SELECT for two values;
  • FIGS. 16A and 16B depict a table of results of the operations performed by the SELECT routine (FIG. 14) on two columns for multiple values;
  • FIG. 17A is a flow diagram of the PROJECT routine;
  • FIG. 17B is a flow diagram of the DISPLAY/RECONSTRUCT routine
  • FIGS. 18A, 18B, 18C and 18D depict a table of the results of the operations performed by the PROJECT routine (FIG. 17A) ;
  • FIG. 19 depicts a binary representation of a JOIN relation;
  • FIG. 20 depicts , a more detailed version of the binary representation of the Suppliers portion of the JOIN relation (FIG. 19) ;
  • FIG. 21 depicts a more detailed version of the Parts relation portion of the JOIN relation (FIG. 19) ;
  • FIG. 22A is a flow diagram of the EQUIJOIN routine
  • FIG. 22B is a flow diagram of the BUILD ROW USE SETS routine
  • FIG. 22C is a flow diagram of the EVALUATE ROW USE SETS routine
  • FIG. 22D is a flow diagram of the CONSTRUCT JOIN ROW USE VECTORS routine
  • FIG. 22E is a flow diagram of the PRODUCTS routine
  • FIG. 22F is a flow diagram of the NUMS routine
  • FIG. 22G is a flow diagram of the GENERATE BIT STRING routine
  • FIGS. 23A, 23B, and 23C represent a table of the results of the operations performed by the EQUIJOIN routine (FIG. 22A) ;
  • FIG. 24 is a flow diagram of the GREATER THAN JOIN routine
  • FIG. 25A is a flow diagram of the DISPLAY/RECONSTRUCT FOR JOIN routine
  • FIG. 25B is a flow diagram of the REFERENCE RELATION routine
  • FIG. 25C is a flow diagram of the REFERENCE VALUE SET routine
  • FIGS. 26A, 26B, 26C, 26D, 26E, 26F and 26G depict a table of results for the operations performed by the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) ;
  • FIG. 27 represents a mapping of the elements of Set S into Set T; Set S is the "domain” and Set T is the “range.”
  • FIG. 28 represents the binary representation of
  • FIG. 29 depicts a binary representation of the Suppliers portion of the JOIN relation (FIG. 20) with entity use vectors added;
  • FIG. 30A is a flow block diagram of the DISPLAY/RECONSTRUCT WITH ENTITY USE VECTORS routine
  • FIG. 3OB is a flow block diagram of the REFERENCE RELATION routine
  • FIG. 30C is a flow block diagram of the REFERENCE
  • FIG. 31 depicts a relational database
  • FIG. 32 depicts a SYSTEM RELATION
  • FIG. 33 depicts the domains of the relational database shown in FIG. 31;
  • FIG. 34 depicts the ENTITY SELECT SET associated With the SYSTEM RELATION (FIG. 32) ;
  • FIG. 35 depicts the ENTITY USE SET, which is associated with the SYSTEM RELATION (FIG. 32) ;
  • FIG. 36 depicts the ROW SELECT SET associated with the SYSTEM RELATION (FIG. 32) ;
  • FIGS. 37A, 37B, 37C, and 37D depict the ROW USE SETS associated with the SYSTEM RELATION (FIG. 31) ;
  • FIGS. 38A, 38B, 38C, and 38D are flow block diagrams of the DATABASE IDENTIFICATION routine.
  • FIGURE 1 depicts a computer system having a programmable computer and computer programs for creating a relational database and for processing operations on one or more relations (also called tables) of a relational database.
  • the system includes programmable computer 2, display 3, entry device 11 for the computer and external device 12 such as a disk for storage of data.
  • Hardware/software for representing the relations and hardware/software for performing relational operations are housed in a Relational Database Management System (RDMS) 10 (shown in phantom lines) , which is connected within the computer 2.
  • RDMS 10 coordinates the various activities related to representing relations in the relational database and to performing relational operations on one or more relations.
  • RDMS Relational Database Management System
  • RDMS 10 is a programmable computer on a printed circuit board which can be easily employed within most standard computers, including per ⁇ sonal, mini-, and mainframe computers. It is envisioned that RDMS 10 may be a special purpose computer formed by one or more integrated chips.
  • RDMS 10 includes an optional Binary Bit Vector Processor (BBVP) 14, an optional Bit Vector Encoder (BVE) 16, an optional Map Vector Processor (MVP) 15, an optional memory 18, a Relational Processing Unit (RPU) 22, including a Boolean Logic Unit (BLU) 24, and a Command Interpreter 28.
  • BBVP Binary Bit Vector Processor
  • BVE Bit Vector Encoder
  • MVP Map Vector Processor
  • RPU Relational Processing Unit
  • BLU Boolean Logic Unit
  • External device 12 is a permanent or buffered storage, typically in the form of a hard disk, for storing information used in relations which are represented in expanded form where each value is typically no smaller than a byte.
  • the contents of the external device are typically maintained in records which are divided into fields where the n ⁇ 1 field of each record corresponds to a specific type of data»
  • the contents of the external device 12 is loaded via bus 30 to the RDMS 10 and specifically to RPU 22.
  • the RPU instructs BBVP 14 to convert each relation stored on the external device into a binary representation (to be more thoroughly discussed in PART IV) .
  • Bus 32 then transfers the binary representation of each relation to optional BVE 16.
  • the BVE 16 compresses the binary representation, the techniques of which are discussed in a co-filed patent application to Glaser et al entitled "A BIT-STRING COMPRESSOR WITH BOOLEAN OPERATION PROCESSING CAPABILITY", having Serial No. 07/100,761, filed September 24, 1987, which is herein incorporated by reference.
  • the BVE 16 employed within the RDMS 10, evaluates uncompressed bit-string representations of each relation and separates the bit- strings into one or more "impulses."
  • An impulse is a run, which is a string of one or more bits of a same binary value or a polarity (e.g., "O's" or "l's"), and an ending bit which has a polarity opposite the polarity of the run.
  • Software programs executed by the bit- vector encoder encode the bit vectors into one of several different compressed impulse formats. The compressed bit vectors are then sent via bus 32 back to the BBVP and then in turn to the optional memory 18.
  • Memory 18 may be a memory component included in the host computer such as an external device or a memory component included within the RDMS 10 (as shown in FIG. 1A) .
  • Memory 18 holds the compressed binary representations of relations before processing the relations at the RPU 22 or stores the relations after processing at RPU 22.
  • RPU 22 via the BBVP 14 performs relational type operations (e.g., SELECT, PROJECT, JOIN, INSERT, DELETE, etc.) . on one or more relations by processing unique software programs at the RPU 22 via a microcontroller (e.g., Intel 80386).
  • relational type operations e.g., SELECT, PROJECT, JOIN, INSERT, DELETE, etc.
  • the Boolean operations can be performed by the hardware or software embodiments of the BLU 24 discussed in Glaser et al (Serial No. 07/100,761). Even though the relational operations in the preferred embodiment may be implemented primarily by software, the RPU 22 can perform relational operations more efficiently in terms of storage, speed, etc. , than presently known techniques for performing relational operations on one or more relations.
  • BLU 24 can take full advantage of the unique properties of the latest components, such as 32-bit microprocessors, CPU'S, etc.
  • the MVP 15 generates map vectors, called entity use vectors, and the map vectors are used by the RPU .22 to map each row of a column of a relation to a unique value of the relational database.
  • the purpose of the entity use vectors is for facilitating the reconstruction and display of the information represented by the binary representations of the relations.
  • the MVP 15 is optional because the RDMS 10 can reconstruct and display the relations without having entity use vectors.
  • the entity use vectors are used to perform the DISPLAY/RECONSTRUCT process more efficiently.
  • the preferred embodiment contains a command interpreter 28 which interprets instructions for processing data in the relational database.
  • the instructions used could be those found in Structured Query Language (SQL) which has become an industry standard language for enabling users to communicate with relational databases.
  • SQL Structured Query Language
  • the operation of the components in the RDMS 10 are controlled by software programs implemented by RPU 22. Data lines 30, 31, 41, 44, 46, 48, 50, and 52 of FIG. 1A depict the data flow between the various components of the RDMS 10.
  • the command interpreter 28 need not be a part of the RDMS 10; instead, it could be loaded into the host computer 2. In the preferred embodiment, however, the command interpreter 28 is located within the RDMS 10.
  • relations stored in the external device 12 need to be updated by inserting or deleting values in the different domains, etc.
  • the information can be updated by transferring data, via bus 30, to RPU 22, which then inserts or deletes necessary values, in the form of bytes, in the domain, etc., and restores the domain back to external device 12 via bus 30.
  • This capability enables the RDMS 10 to add new and unique values to the relational database.
  • FIG. 2 is an example of a relational database, namely, the "SUPPLIERS and PARTS" relational database, and it has been adopted from Date, An Introduction to Database Systems, 4th Ed. , Chapters 4 & 11 (1986) , which is incorporated herein by reference.
  • FIG. 2 depicts three relations, a table for suppliers 63, a table for parts 65, and a table for ' shipments 67.
  • “Relation” is just a mathematical term for the word "table”, and these two words are used interchangeably in the specification.
  • Table 63 represents the information on the suppliers for a particular company. Each supplier has a Supplier ID column 80, a Supplier Name column 82, a Status column 84, and a City column 86, indicating where the supplier is located. As shown in table 63, each row of the table depicts information on a different supplier, and each column 80, 82, 84, 86 represents a different characteristic of each supplier. For example, row 69 of table 63 depicts supplier "SI" having the name "Smith” with a Status "20" and located in "London.”
  • Table 65 represents the parts the suppliers may sell. Each part has a Part ID 88, a Part Name column 90, a Color column 92, a Weight column 94, and a City column 96. As shown by each row of the table, it is also assumed that the part only comes in one color and it is stored in a warehouse in exactly one city. For example, row 77 of table 65 corresponds to a single part having part having part ID "PI”, a part name "NUT”, a color "RED”, a weight "12", and a location or city, "LONDON.”
  • Table 67 represents the shipments of parts (table 65) made by suppliers (table 63) .
  • Table 67 is really a connection between tables 63 and 65 (to be discussed in Part V) .
  • the first row 87 of table 67 connects a specific supplier (SI) with a specific part (PI) ; stated differently, it represents a shipment of parts of kind PI, by supplier SI, and the shipment quantity is 300.
  • SI specific supplier
  • PI specific part
  • each shipment is uniquely described by supplier ID, corresponding to column 98, a part ID, corresponding to column 100, and a quantity, corresponding to column 102. It is assumed that at most, one shipment at any given time for a given supplier can be made for a given part.
  • the combination of SI and PI having a quantity shipment of 300 at row 87 is unique with respect to the set of the shipments appearing in table 67.
  • the Shipments table is a "relationship" between a particular supplier and a particular part.
  • the "SUPPLIERS and PARTS" relational database (FIG. 2) describes, in reality, a very elementary database. Databases are likely to be much more involved, containing many more entities and relationships. This database, however, is sufficient to illustrate what a relational database is and to illustrate the novel features of the present invention.
  • each of the data values depicted in the tables 63, 65, 67 are "atomic". That is, at every row and column position in every table, 63, 65, 67, there is always exactly one data value, never a set of values. For example, in table 63, at row 69 and column 89, the status for the supplier Smith is a single value "20" and not a set of statuses. Also, note there are no links connecting one table to another table. In the example of FIG. 2, there is a relationship between the supplier row 69 of table 63 and the part row 77 of table 65, because supplier SI supplies PI as shown by the existence of row 87 of table 67, in which SI has sold 300 Pi's.
  • relational models as disclosed by the prior art represent database systems "at a level of abstraction that is somewhat removed from the details in the underlying machine.”
  • relational database at the external level remains the same, the internal level of the relational database is actually depicted in the form of columns, as discussed in PART IV.
  • a domain is a set of unique values, and each domain has only one characteristic.
  • the domain for supplier identifiers 66 is the set of all possible unique supplier identifiers which are referenced in the system.
  • the domain for person names 70 is the set of all unique names
  • the set of numbers 78 is the set of all integers greater than 0 and less than 10,000 (for example). Domains are pools of unique values, from which actual values appearing in the columns of the tables 63, 65 and 67 are drawn. Typically, there will be values in a given domain that do not concurrently appear in one of the columns of one of the relations.
  • the value S8 or S10 may appear in the domain of supplier identifiers 66, but note that no supplier S8 or S10 actually appears in relation 63. They may appear in some other column of some other relation. Furthermore, each column of a relation corresponds to one of the domains. For example, supplier identifier domain 66 corresponds to column 80 of table 63.
  • RPU 22 acts as a file manager to retrieve each of the records associated with the particular columns. The columns will then be sent via bus 30 to the RDMS 10 or for further processing.
  • the RDMS 10's view of the database on the external device 12 is a collection of stored columns (in byte form) , and that view is supported by a file manager and disk manager (not shown) which may or may not be a part of the RDMS system.
  • a file manager and disk manager not shown
  • the relation will be converted into a binary representation via the BBVP.
  • This unique tabular representation of the relation is one aspect of the invention and it will be thoroughly discussed in part IV.
  • Part III is devoted entirely to creating a fundamental framework in which a binary representation of a relation can be developed.
  • One aspect of the present invention is the representation of a relational database by bit strings which do not contain the raw data values of the relational database. Instead, each value of the database is represented by a single binary bit within an ordered set of binary bits.
  • the basic building block for representing a relation of a relational database relies on the concept of ordered sets.
  • Ordered sets consist of ordered pairs, ⁇ M, X>, where each ordered pair represents an element of a set, where M is a symbol that defines the location or position of the value X within the set.
  • M is a non-negative integer.
  • every element of the ordered set constitutes a function between the integer describing the ordinal position M and the corresponding value X in the set.
  • the value M increases by one for each forward progression of one element in the set.
  • An ordered set with unique values can be defined as a set having elements X and Y which have ordinal positions M and N, respectively, and for all ⁇ M, X>, ⁇ N, Y> in the set, M is equal to N if, and only if, X is equal to Y.
  • Two elements of an ordered set can only be equal when they are at the same ordinal position in the set, as all ordinal positions in the ordered set are unique.
  • the ordering of values within ordered sets (ordering rule) namely, the way in which each value matches an ordinal position, is arbitrary. It depends on the system's or user's choices and requirements.
  • a "vector" consists of an ordered set of elements.
  • a vector consisting of a set "S" of elements is a one- dimensional array, where each value corresponds to a vector element whose ordinal position over the vector is implied. More particularly, a vector may define an ordering to a set of elements in an ordered or non- ordered set. Given the ordered set "S" 148 of FIG. 3A, a "binary-bit vector" a can uniquely define a subset "A" (FIG. 3D) of "S" 148 (FIG. 3A) , by representing the elements of "S" in "A".
  • each data value must represent whether an element of set "S" is in subset "A” or is not in subset "A” and, thus, a binary bit string, where each bit corresponds to an ordinal position of set "S", can represent subset "A”.
  • binary bit-vectors contain l's and O's, a "1” indicating a set element is present, and a "0” indicating that it is not present.
  • Each position of each binary bit within a binary bit-vector corresponds to an ordinal position of an element in subset "A" (FIG. 3A) .
  • a binary bit-value combined with its implied ordinal position within the vector, is an "element" of the vector.
  • the presence, of a "1" bit in the vector is equivalent to representing a value of the ordered set by only its ordinal position. In this way, representations of data can be operated on without operating on the data values themselves.
  • FIG. 2 depicts a relational database for suppliers, parts and shipments.
  • the relational database consists of tables 63, 65, and 67, which depict the Suppliers, Parts and Shipments relations, respectively.
  • domain 60 contains the unique values which are referenced by each of the tables 63, 65 and 67.
  • the domain 60 contains unique values for supplier identifiers 66, part identifiers 68, person names 70, part names 72, cities 74, colors 76, and numbers 78.
  • One aspect of the present invention is to create a binary representation of a relational data base, such as the one shown in " FIG. 2. Specifically, the binary- representation for the relational database shown in FIG.
  • FIG. 4 shows the binary representation of the suppliers table 63 (FIG. 2)
  • FIG. 5 is a binary representation of the parts table 65
  • FIG. 6 is a binary representation of the Shipments table 67 (FIG. 2) .
  • FIG. 4 can be broken up into two portions as shown.
  • the top portion of the figure represents the value sets or domains associated with the suppliers relation 63 of the relational database in FIG. 2. More particularly, domain 160 (FIG. 4) is associated with suppliers identifier domain 66 of FIG. 2.
  • Domain 162 corresponds to the Person Names domain 70 of FIG. 2.
  • Domain 164 corresponds to the Numbers domain 78 (FIG. 2)
  • domain 166 corresponds to the Cities domain 74 of FIG. 2.
  • the values of domains 160, 162, 164 and 166 have been arbitrarily chosen and formed in particular orders of occurrence, namely, lexical ordering for domains 162 and 166 and numerical ordering for domains 160 and 164.
  • each of the domain 160, 162, 164 and 166 is represented by corresponding binary bit-vectors 176, 178, 180, and 182.
  • Each binary bit-vector which is used for the purpose of characterizing a domain, is called herein an entity select vector.
  • the binary bit-vector is used to represent a selection (subset) of entities from a set of entities, where an entity is defined to be a unique value or grouping of values.
  • an entity may be a value in a value set or a row in a relation.
  • the entity select vector has a quantity of binary bits equivalent to the quantity of unique values in a given domain.
  • domain 160 for supplier identifiers contains ten supplier identifiers for ten corresponding binary bits.
  • Each binary bit of the entity select vector has a position which directly corresponds to the position of each of the unique values in the supplier identifier domain 160.
  • each of the binary bits represents the presence or absence of each of the unique values in the subsets represented by the entity select vector 176.
  • the binary bit 236 of entity select vector 176 corresponds to the unique value SI at 216 of domain 160.
  • the binary bit 236 has a value of "1" which indicates that the unique value SI is present in the subset represented by entity select vector 176.
  • binary bits 238, 240, 242 and 244 also indicate that the unique values S2, S3, S4, and S5 are present in the subset.
  • Binary bits 246, 248, 250, 252 and 254 are set to "0", indicating that unique values S6 226, S7 228, S8 230, S9 232, and S10 234, respectively, are not in the subset.
  • the binary bit vector represen ⁇ tation of the unique values of the domain 160 is a short-hand way of representing a subset of the domain 160 by indicating those unique values which are present in the subset. This aspect of the present invention is an important one. By only representing a subset of a domain, all values of a domain need hot be associated with a column. More particularly, only the entity select vector which corresponds to a particular column of a relation, and not the domain, needs to be associated with the column.
  • entity select vector 176 is associated with column 168
  • entity select vector 178 is associated with column 170
  • entity select vector 180 is associated with column 172
  • entity select vector 182 is associated with column 174.
  • Columns 168, 170, 172 and 174 correspond to the columns of the suppliers relation 63 Of FIG. 2.
  • each binary bit-vector 184, 186, 188, 190 and 192 corresponds to one of the values indicated to be present in the subset by the entity select vector 176.
  • the binary bit-vector is called a row use vector because it indicates the presence or absence of a unique value in one or more rows of a column in the relation.
  • the row use vectors are combined into a row use set 260 to form the binary representation of the column 168.
  • the order of row use vectors occurs in the same order as the order of values characterized by the entity select vector.
  • the lower order binary bits of the entity select vector have been arbitrarily assigned to correspond to the left most row use vectors of the row use set, and the higher order binary bits of the entity select vector correspond to the right most row use vectors of the row use set.
  • the implied mapping between the bits set to "1" of the entity select vector could have just as. easily been reversed.
  • the lower order binary bits may correspond to the right-most row use vectors of the row use set.
  • Binary bit 236 of the entity select vector 176 corresponds to row use vector 184 of the row use set 260.
  • binary bit 238 of the entity select vector 176 corresponds to the next row use vector 186 of the row-use set 260.
  • the system interpreting the entity select vectors and the row use set is pre-programmed so that the binary bits indicate unique values present in the subset.
  • the implied ordering scheme is illustrated by the dotted lines 268, 270, 272, 274 and 275, which show that the binary bits in entity select vector 176 correspond to the row use vectors 184, 186, 188, 190 and 192, respectively.
  • the binary bits in the entity select vector 180 which represent the unique values "10'.', "20”, and "30", are mapped in an implied manner to the row use vectors 204, 206 and 208, respectively.
  • each row use vector can indicate the presence or absence of a unique value in one or more rows of a column of the relation. For example, in row use vector 206, corresponding to the unique value "20", "1" bits indicate the presence of the value "20" in the first and fourth rows of column 172. Because the binary representation of each of the columns is represented by an entity select vector to unique values, the columns 168, 170, 172 and 174 of the relation need not be represented as a set of actual values in the database memory. Only the row use sets and the corresponding entity select vectors need be stored in the memory of RDMS 10 (e.g., memory 18, FIG. 2) .
  • FIG. 5 and FIG. 6 represent the entity select vectors and their corresponding row use sets for the Parts Relation 65 (FIG. 2) and the Shipments relation ' 67 (FIG. 2) .
  • domains 282, 286, 290, 294 and 298 correspond to the domains for part identifiers 68 (FIG. 2) , part names 72 (FIG. 2) , colors 76 (FIG. 2) , numbers 78 (FIG. 2) , and cities 74 (FIG. 2) respectively.
  • the numbers domain 294 and the city domain 298 of the Parts relation (FIG. 5) are identical to the numbers domain 164 and the city domain 166 of the suppliers relation (FIG. 4) .
  • An efficiency of the present invention is that the unique values of the domains need only be represented once. In other words, the unique values in the numbers domain 164 (FIG.
  • the city domain 166 (FIG. 4) and the city domain 298 ,(FIG. 5) are the same domain of unique values of cities.
  • the representation of the subsets for these domains is different.
  • the entity select vector 180 (FIG. 5) represents that the unique values "10", “20” and “30” of the numbers domain are in the relation of suppliers (FIG. 4), whereas the unique values "12", “14", “17” and “19” are in the relation for parts (entity select vector 296, FIG. 5) .
  • the system instead of having to store two complete sets of unique values 164 and 294, which are identical, the system only stores one version of the numbers domain and two entity select vectors, namely, 180 and 296, to represent subsets of the same domain. Likewise, the system need not retain two unique sets of values for cities, 166 and 298, instead, the system maintains only one version of the city domain and two entity select vectors, 182 and 300, to identify the subsets of cities in the associated relations. In fact, any number of entity select vectors may be associated with a particular domain. For example, in FIG. 6, the domain for numbers 328 is also referenced; however, the entity select vector 330 depicts a different subset of unique values from the entity select vectors 180 (FIG. 4) and 296 (FIG. 5) . Entity select vectors are efficient subset representations of domains because the same domain need not be represented more than once and the actual values of each domain can be uniquely represented by the entity select vector's binary bits.
  • the columns of the Parts relation 302, 304, 308, 312, 316 are represented in the relational database by the row use sets 304, 306, 310, 314, 318, respectively.
  • the row use sets contain row use vectors which are each associated with a unique value indicated to be present in a subset of a domain by an entity select vector.
  • FIG. 6 depicts a binary representation for shipments.
  • the domains associated with each of the relations with their unique sets of values are stored on the external . device 12 (FIG. 1A) , and the binary representation of each of the relations (FIGS. 4, 5 and 6) which make up the relational database of FIG. 2 are stored in memory 18 of the RDMS 10 (FIG. 1A) .
  • the binary representation of the relations need be referred to. A detailed discussion of this process will follow shortly.
  • FIG. 7 is a flow block diagram of the steps which are performed by the RDMS 10 for creating a binary representation of a relational database.
  • each domain necessary for specifying unique values in the relations of the relational database is identified.
  • Input commands are interpreted by the command interpreter 28 (FIG. 1A) for creating domain identifiers in the memory 18 (FIG. 1A) of the RDMS 10 (FIG. 1A) .
  • the system reads each of the instructions above, and an empty value set is created for each of the domains listed in the instruction. The actual unique values associated with each domain are loaded into memory in a later step of this routine.
  • the system determines whether any more domains for the database need to be specified.
  • the system identifies a table of the relational database, and the RDMS 10 builds an entry in the system relation (see Part VI) .
  • the system identifies a column, as with the table identified in block 348.
  • the RDMS 10 builds on entries in the system relation identifying each column, and the RPU 22 via the BBVP 14 creates empty entity select vectors associated with each of the domains. The entity select vectors will have their binary bits set when the columns of the particular relations have their values loaded into the system.
  • Processing continues at blocks 350 and 352 until all of the relational columns for the particular table identified during block 348 have been identified.
  • the system asks for the next table or relation of the database. Assuming that a number of different tables exist in the relational database, blocks 348, 350, 352 and 354 are processed until all of the tables and their related columns have been specified.
  • system identifiers identifying each column of each relation, are constructed. These identifiers are discussed more thoroughly in part VII.
  • the system loads a file associated with each table of the relational database.
  • the file of the table is assumed to exist in the external device 12 of the system depicted in FIG. 1.
  • the system loads the file representations of the relations into external device 12, where they reside until summoned by the RDMS system 10.
  • RPU 22 instructs the external device 12 to transfer the first column associated with a particular table of the relational database via bus 30 to the RPU 22.
  • the particular column is retrieved by referring to certain system identifiers (i.e., AID, RID identifiers, Part VII) .
  • AID i.e., RID identifiers, Part VII
  • a bit is set to "1" in the first position of the row -use vector to indicate that the value occupies the first row of the first column of the relation.
  • a new row use vector is created and a binary bit associated with a particular row of the column, where the unique value resides, is set to "1".
  • the binary bits of the entity select vector, associated with the column are set to "1" to indicate the unique values of the domain referenced in the column.
  • the corresponding bits of the row use vector associated with the unique value are set to "1" indicating the presence of the unique value in these rows of the column. Because the entity select vector has previously had the binary bit associated with the unique value set to "1", it need not be set again.
  • This process of converting the column into a row use set continues until all of the values of the column have been represented by binary bits.
  • the RPU 22 commands the external device 12, to bring the next column, via bus 30, into the RDMS 10. This process continues for all of the columns associated with a particular table until the row use sets for the table are completely constructed.
  • the system determines if any other tables need to be binary represented by the RDMS 10, assuming a single input file corresponds to a single table. ' If other tables are to be constructed, then blocks 356, 358 and 360 are performed until all the columns of the next relation are represented in their binary form. After all of the binary bit vectors (i.e., row use vectors and entity select vectors) are constructed, the RPU 22 may summon the bit vectors to bit vector encoder BVE 16 of the RDMS 10 to encode the bit vectors into compressed impulse format. The steps for compressing the bit vectors into compressed impulse formats are thoroughly discussed in Glaser et al. RPU 22 then instructs the bit vectors to be sent via bus 48 to the memory 18 where they are stored.
  • the relations can be updated by two utility functions called INSERT and DELETE, which will be discussed in PART V of the specification.
  • INSERT function
  • DELETE DELETE
  • the INSERT function perhaps with appropriate modification to enhance efficiency, can also be used to load additional rows into the relation.
  • the relations in their binary represented form can be manipulated via the relational operations SELECT, JOIN and PROJECT, also to be discussed in PART V.
  • FIG. 2 a hypothetical relational database is shown.
  • RPU 22 FIG. ' 1A
  • FIGS. 4, 5 and 6 This example has been designed to emphasize the steps of the BINARY REPRESENTATION routine for constructing the binary representation of a relational database in FIGS. 4, 5 and 6.
  • the assumption is made that the reader understands instruction formats.
  • the system of FIG. 1 creates a relational database as shown in the tables 63, 65 and 67 (FIG. 2) . These tables are constructed as inputs to the system. Once the input process is complete, the system calls the BINARY REPRESENTATION routine (FIG. 7) . The system, during block 344, creates the domains referenced by the relational database. Input instructions, specify the following domains; suppliers identifiers 66, the parts identifier 68, person names 70, part names 72, city names 74, colors 76, and numbers 78 (FIG. 2) .
  • the input' instructions in pseudo code look like:
  • FIG. 1A interprets the instruction (A) in block 344, RPU 22 creates an empty value set (364, FIG. 8A) .
  • block 346 the system determines whether there are any other domains to be identified by the system. In fact, there are other domains to be identified as specified by the instructions listed above; thus, block 344 is called.
  • the command interpreter 28 evaluates statement B and RPU 22 creates an empty value set to reference the domain of part identifiers (366, FIG. 8A) .
  • the system determines that there are still more domains to be identified, and thus, block 344 is called.
  • the system interprets the next instruction (C) , which is for the "person names" domain.
  • the RPU 22 (FIG. 1A) generates an empty value set (368, FIG. 8A) .
  • BBVP sets all the vectors to "0" to indicate the empty entity select vector (368, FIG. 8A) .
  • the system determines that there are still more instructions for identifying domains. Processing continues at block 344, and an empty value set for part names is generated (370, FIG. 8A) .
  • Block 346 determines that there are still more domain instructions to be processed and processing continues at block 344.
  • an empty value set corresponding to the city domain is created (372, FIG. 8A) .
  • the system determines that there are still more commands for identifying domains, so processing returns to block 344.
  • Block 344 the system generates an empty value set corresponding to the domain "colors" (374, FIG. 8B) .
  • Block 346 determines that there is one more instruction left (G) for creating a domain, and thus, block 344 is called.
  • the RPU 22 creates an empty value set for the "numbers" domain (376, FIG. 8B) .
  • CREATE TABLE SUPPLIER (SUPPLIERS ID; PERSON NAME; STATUS; CITY) ; which indicates that a table for suppliers is to be identified; specifically, an identifier for the suppliers table is stored in memory 18 of the RDMS 10.
  • the system generates an identifier for the first column of the relation, which is also stored in memory 18. Specifically, it interprets the command above, and a suppliers ID identifier is stored in memory. (Identifiers to be discussed in PART VI).
  • the system determines that the command requires that other columns be identified for the relation, so processing continues at block 350.
  • the system identifies the column for "person names" and stores an identifier indicating such in the memory 18.
  • an empty entity select vector is created, which is associated with the "person names” domain.
  • the system determines that there are still other columns to be identified, and thus, processing continues in block 350.
  • the system stores an identifier for the next column associated with "status” into the memory 18.
  • an empty entity select vector associated with the "status” domain is created.
  • the system determines that there is still a column remaining to be identified; specifically, during block 350, the system stores an identifier for the column associated with "city”.
  • an empty entity select vector associated with the "city” domain is created.
  • the system determines that no more columns need be identified, and thus, processing continues at block 354, which determines whether there are any more tables to be identified for the relational database.
  • the RPU 22 instructs the external device 12 to download the files associated with each table of the relational database via bus 30 to the RDMS 10 (FIG. 1A) .
  • the byte values of each column of the relation are sent to the RPU 22 and the appropriate vectors of the row use sets are built via BBVP 14, and the binary bits of the entity select vectors are set there as well.
  • BBVP 14 the binary bits of the entity select vectors are set there as well.
  • FIGS. 9A, B-and C are results tables depicting the formation of the binary representation of the suppliers relation of FIG. 4. Each row of the FIGS. 9A, B and C depicts an additional row use vector associated with one column in the relation.
  • the first value SI (69, 80 FIG. 2) is inserted into the first ordinal position of the value set for suppliers identifiers. Additionally, the first binary bit of the entity select vector 417 associated with the "suppliers identifier" domain is inserted and set to "1" to indicate that the unique value SI is referenced in the suppliers ID column. The column for the suppliers identifier requires a new row use vector 381 to be generated nd the first bit of the row use vector 381 is set to "1" to indicate that SI occupies the first row of the column.
  • FIG. 9A A new row use vector is added for each value present in the subset, as indicated at 383-389, and a new bit is added to the entity select vector 417. Additionally, five more bits, set to "0", are added to the ordinal positions of the entity select vector corresponding to the new values of the value set.
  • the row use set 260 and the entity select vector 176 have now been generated by the BBVP 14 via the BINARY REPRESENTATION routine (FIG. 7) .
  • the next column of the suppliers table is evaluated by the BBVP 14.
  • the next column in the relation is the "person names” column (82, FIG. 2) , and the first value of the "person names” column is Smith (69, 82, FIG. 2 ) .
  • the value Smith is inserted to the first position of the "person names” value set. Additionally, the entity select vector 419 associated with "person names” has a first binary bit inserted and set to "1" to indicate that the unique value for Smith is referenced in the column for person names. A row use vector for Smith has not been created, and thus, a new row use vector 401 is generated (388, FIG. 9A) . This process is repeated, as indicated in rows 390-396, for the next four names added to the "names" value set, with a new row use vector added for each name, as indicated at 403-409, and a "S" bit added to the entity select vector 419.
  • the order of the row use vectors 409, 405, 407, 403 and 401 corresponding to Adams, Blake, Clark, Jones and Smith is in the order of occurrence of the binary bits set "1" in the entity select vector 419.
  • the values Baker, Fabel, Rahn, Ross and Young are added to the "person names” value set. Specifically, the values are added in an order corresponding to the ordering (396, FIG. 9B) .
  • Binary bits set to "0" are also inserted into entity select vector 414 in the ordinal positions corresponding to the newly added unique values. The new binary bits are set to "0" to indicate the remaining values do not occupy the column.
  • the row use vectors 409, 407, 405, 403 and 401 correspond to the row use set 262 of FIG. 4 and row use vector 262 corresponds to entity select vector 178 of FIG. 4, which is the same as the entity select vector 419.
  • the next column of the input suppliers relation is for the "status” (84, FIG. 2), and the first value of the column is "20" (64, 84 FIG. 2) .
  • the value "20” is inserted into the first position of the value set for status numbers 421. Additionally, a first bit is inserted to entity select vector 421 and set to "1", indicating that unique value "20” is referenced in the relational column.
  • a new row use vector 411 is created, and the first binary bit of the row use vector is set to "1” indicating the presence of value "20" in the first row of the column (398, FIG. 9B) .
  • the process is repeated, as indicated in rows 400-406 of FIG. 9B, for each value in "status” set, resulting in row use vectors 411-415.
  • the next value in the column is 20.
  • the value 20 already exists in the value set for numbers, and thus, it need not be added again.
  • the entity select vector 421, for numbers need not be set because a binary bit corresponding to unique value 20 has already been set to "1" in a previous step.
  • This value also corresponds to an already existing row use vector 411.
  • Binary bits set to "0" are added to row use vectors 413 and 415 to indicate that the values 10 and 30 do not occupy the fourth row of the relational column for status.
  • a binary bit set to "1" is added to the row use vector 411 to indicate that the unique value 20 is also in the fourth position of the relational column for status.
  • the last column of the relation for suppliers is the "cities” column.
  • the first value of the “cities” column is London (69, 89 FIG. 2) .
  • the value London is placed into the first position of the value set for "cities”.
  • the entity select vector 423 as with the names of cities has a first binary bit inserted and set to "1" to indicate that the city London is referenced in the relational column for cities.
  • a row use vector 417 is created and a binary bit set to "1" is added to the row use vector to indicate that London occupies the first row of the column for cities (408, FIG. 9C) , "Paris” is added (row 410) in the same way. Since "Paris” and “London” occur twice, the row use vectors have bits added, as indicated, at rows 412 and 114.
  • the last value in the column for cities is Athens
  • the value Athens does not exist in the cities value set, and so it is added. Specifically, Athens is placed into the first position of the value set corresponding to the lexical ordering of the city names.
  • the entity select vector 423 for cities has a third binary bit inserted and set to "1" in the first ordinal position of the entity select vector to indicate that Athens is referenced in the column for cities.
  • a new row use vector 421 is added to the row use set.
  • Binary bits set to "0" are added to the row use vectors 417 and 419 to indicate that the unique values London and Paris do not occupy the fifth row of the column.
  • the new row use vector contains five bits and the fifth bit is set to "1" to indicate that the fifth row of the column contains the value Athens.
  • the values Cleveland, Fresno, Harrisburg, Los Angeles, New York, Rome, and San Francisco are added to the value set from the input column.
  • the values are arranged in the value set according to a lexical ordering. For each additional value, a corresponding binary bit set to "0" is inserted at a corresponding ordinal position of the entity select vector.
  • the binary bits set to "0" indicate that the values are not referenced by the column (418, FIG. 9C) .
  • the row use vectors 421, 417 and 419 corresponding to Athens, London and Paris, are in the order of occurrence corresponding to the binary bits set in the entity select vector 423. Additionally, the row use vectors 421, 417 and 419 correspond to the row use set 266 of FIG. 4, and entity select vector 423 is associated with the entity select vector 182 (FIG. 4) .
  • processing continues at decision block 360 in which the RPU 22 determines whether there are more files associated with the PARTS and the SHIPMENTS tables of the relational database. Processing continues at blocks 356 and 358 until the binary • representation for the PARTS and SHIPMENTS tables are constructed (418, FIG. 9C) .
  • the binary representations for the PARTS and SHIPMENTS tables are generated in the same fashion as the SUPPLIERS table (FIGS. 9A, 9B and 9C) discussed above. Processing returns during 362 to the calling routine of the.BINARY REPRESENTATION routine (FIG. 7) .
  • FIGS. 10A, 10B, IOC, 10D, 11A, 11B, 11C, 11D, 12, 13, 14, 15A and 15B, 16, 17, 18, 19, 20, 21, 22A, 22B, 22C, 22D, 22E, 22F, and 22G depict flowcharts of operations performed on relations in their binary represented form.
  • FIGS. 5A, B, C and D are flow diagrams of the utility function called INSERT.
  • FIGS. 11A, B, C and D are flow diagrams of the function DELETE.
  • FIG. 14 is a flow diagram of the relation operation called SELECT.
  • FIG. 16 is a flow diagram of the relational operation called PROJECT, and FIGS.
  • 22A, 22B, 22C, 22D, 22E, 22F, and 22G are flow block diagrams of the operation JOIN.
  • the functions INSERT and DELETE are basically for maintaining and manipulating data within the relations.
  • the relational operations, SELECT, PROJECT and JOIN are for generating resultant relations, and in the preferred embodiment, in a binary represented form. Only three relational operations SELECT, PROJECT and JOIN are discussed in order to simplify this disclosure and to provide a basic understanding on how relational operations are preformed on binary representations of relations. For example, the operations PRODUCT, UNION, INTERSECTION, DIFFERENCE and DIVIDE, which are described in Date, "An Introduction ' To Database Systems," Vol. 1. (4th ed.
  • the resulting compressed bit strings are then stored to memory 18. They stay in memory 18 until a request to perform a relational operation (i.e., INSERT, DELETE, SELECT, PROJECT, JOIN) is initiated and is interpreted at the command interpreter 28.
  • a relational operation i.e., INSERT, DELETE, SELECT, PROJECT, JOIN
  • FIGS. 4, 5 and 6 the binary representations of the relations of the rela- tional data base stored in memory 18 are shown in FIGS. 4, 5 and 6. It is also assumed that the bit vectors of the binary represented relation could also be in compressed impulse formats. However, for ease of understanding, the bit vectors are processed in the uncompressed form.
  • the binary represented relation(s) are brought to the RPU 22 via buses 48 and 31.
  • the specified relational operation is performed on the relation(s) . If any Boolean operations need be performed by the relational operation, then the steps for processing compressed bit strings as set forth in Glaser et al. are preformed by the BLU 24. Once processing is completed at the RPU 22, the RPU 22 outputs a new binary represented relation. The new output relation is sent to memory buses 31 and 48, where the resultant relation resides until it is sent back to RPU 22 for further processing.
  • A. INSERT INSERT is a function which adds one value at a time to a relation.
  • the necessity for performing an INSERT operation occurs in three categories of cases. First, a unique value needs to be added to a domain or value set, and it needs to be added to a column of a relation. Second, a unique value already exists in the value set, and it needs to be added into a column of a relation. Third, a value already exists in a column and it needs to be added again to the column. Multiple values may be added to a value set or to a column; however, the INSERT subroutines must be processed more than once corresponding to each time a value is added.
  • the function INSERT can be used to add values to more than one column of a relation.
  • the INSERT function is separately performed each time a value is added to the relation.
  • FIGS. 10A, B, C and D depict routines for processing any one of the three situations
  • FIG. 10A is a routine for adding a unique value to a value set, and adding the unique value to a column of a relation; the routine is called INSERT.
  • FIG. 10B is a flow diagram of a subroutine for adding a unique value to the value set, and this routine is called INSERT VALUE INTO VALUE SET.
  • FIG. IOC is a routine for updating an entity select vector to reflect the addition of a unique value into a particular subset; this routine is called UPDATE SUBSET.
  • FIG. 10D is a routine for updating a column of a relation with a new occurrence of a value; this routine is called ADD VALUE TO COLUMN. The routine in FIG.
  • FIG. 10D would by itself be used to insert a value into a column of a relation when the value already existed in a column.
  • the flow diagrams of FIG. IOC and 10D are combined for inserting a value already existing in the value set into a subset of values of the value set and adding the value to a column of a relation.
  • FIG. 10A a more detailed description of the INSERT routine is now discussed.
  • the system calls the INSERT VALUE INTO VALUE SET routine (FIG. 10B) to add a unique value to the value set.
  • block 424 is called to call the subroutine UPDATE SUBSET (FIG. IOC) .
  • This routine updates an entity select vector corresponding to the value set so that the unique value is represented in an associated subset.
  • the system performs the ADD VALUE TO COLUMN subroutine (FIG. 10D) to add the value into a specified column of the relation. Processing is completed; a value is added to the value set and to a column and the system returns at block 428 to the calling program.
  • the RPU 22 via the BBVP 14 determines the ordinal position in the value -set at which the new, unique value is to be inserted.
  • the value set is stored in a structure which is traversed to find the new value.
  • the structure contains all of the unique values presently stored in the value set, and is built as such to minimize access time for finding values.
  • the system locates a node of the structure, which corresponds to the ordinal position at which the new, unique value should be placed.
  • the system determines whether the value already exists. If the value already exists, then during block 434, the system returns to the calling program. Assuming that the value is not in the struc- ture, then the new value is added to the value set by adding a node assigning an already existing node in the structure to incorporate the unique value.
  • processing returns to the calling program.
  • the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the ordinal position of the unique value in the value set.
  • decision block 441 the RPU 22 determines whether the unique value has been added to the value set. If the unique value has been added, block 442 is called.
  • block 442 the system inserts a bit to the entity select vector at the ordinal position corresponding to the new unique value. Processing continues at block 444.
  • the new bit added to the entity select vector is set to "1" to indicate the unique value in the subset.
  • block 443 is called.
  • the RPU 22 via the BBVP 14 determines whether the bit in the entity select vector has been set to "1". The binary bit set to "1" indicates that the column contains this unique value. If the bit has been set to "1", the processing returns to the calling program during block 445. However, if the bit is not set, then processing continues in block 444. In block 444, the new bit added to the select vector is set to "l" to indicate that the unique value is in the subset. During block 446, processing returns to the calling program. Referring to FIG. 10D, a more detailed discussion of the ADD VALUE TO COLUMN routine is now discussed.
  • the RPU 22 via BBVP 14 counts the number of binary bits set to "1" in the entity select vector up to and including the bit at the ordinal position corresponding to the unique value inserted. This number is called “count”.
  • the "count” of binary bits set to "1” corresponds to the location of the row use vector in the row use set.
  • the system inserts the new row use vector at the position in the row use set corresponding to "count.”
  • the system appends a binary bit set to "0" to all of the row use vectors of the row use set.
  • the system sets the last bit of the new or selected row use vector to "1" to indicate that the new value is added to the last row of the column.
  • processing returns to the calling program.
  • processing continues at blocks 452.
  • bits set to "0" are appended to the existing row use vectors and during block 454, the last bit of the new row use vector is set to "1".
  • FIG. 12 is a detailed results table depicting the various steps performed by the INSERT routine (FIG. 10A) .
  • the results table of FIG. 12 is broken up into three columns.
  • the first column depicts the existing value set
  • the second column is the entity select vector characterizing a subset of the value set
  • the third column is the row use set representing the names column of the relation.
  • Each column of the results table depicts a change in either the value set, entity select vector, or row use set as the INSERT routine (FIG. 10A) is performed.
  • the subroutines depicted in FIGS. 10A, B, C and D are transparent to the application. Only the following type of instruction is required:
  • This instruction is interpreted by- the command interpreter 28 (FIG. 1A) to add a new and unique value Zeus to the value set for names and to add the word Zeus to the binary representation of the relation for suppliers, currently stored in memory 18.
  • the RPU 22 brings the binary representation of the suppliers relation from memory 18 to the RPU 22. Additionally, the value set is brought from the external device 12 via bus 30 to the RPU 22. Because the unique value Zeus is not part of the value set names, the RPU 22 calls the routine (FIG. 10A) to insert the value Zeus into the column of names in the relation and also to add the unique value Zeus to the value set of names, referring to the INSERT routine (FIG. 10A) .
  • the RPU 22 calls the subroutine INSERT VALUE INTO VALUE SET (FIG. 10B) .
  • the system determines the ordinal position in which the value Zeus is to be inserted.
  • the RPU 22 determines that Zeus does not exist in the value set, and in block 436, the RPU 22 adds Zeus to the proper node in the structure for the value 494 (FIG. 12) . If a node did not exist in the structure, then a new node would be added and set to Zeus.
  • the RPU 22 returns to block 424 of the INSERT routine (FIG. 10A) .
  • the RPU 22 calls the subroutine UPDATE SUBSET (FIG. 10C) .
  • the RPU 22 via the BBVP 14 determines the ordinal position of the entity select vector which corresponds to the unique value Zeus in the value set. The ordinal position is the last position of the entity select vector.
  • RPU 22 determines that the value Zeus had been previously added to the value set of names, and processing continues at block 442.
  • the system inserts a bit to the entity select vector (494, FIG. 12) to indicate that a value exists in the last ordinal position of the value set.
  • the RPU 22 sets the new bit to "1" in order to indicate that the value Zeus is added to the subset (496, FIG. 12). Then, in block 446, the RPU 22 returns to block 426 of the INSERT routine (FIG. 10A) .
  • the RPU 22 calls ADD VALUE TO COLUMN routine (FIG. 10D) .
  • the RPU 22 determines the "count” of binary bits which are set to "1", up to and including the bit which corresponds to ordinal position corresponding to the new value Zeus. "Count” is equal to six because there are six binary bits set to "1" in the entity select vector; the bit corresponding to the value ' Zeus is the sixth binary bit set to "1". With the "count", the RPU 22 determines if there presently resides a row use vector which corresponds to the value Zeus.
  • a row use vector does not exist at position six as determined previously at block 440 (FIG. IOC), and during block 450, the RPU 22 inserts a new row use vector at the sixth position of the row use set.
  • the RPU 22 counts, from left to right, six row use vector positions in the row use set.
  • the RPU 22 adds a new row use vector having binary bits set to "0".
  • the RPU 22 appends binary bits set to "0" to the end of all of the row use vectors in the row use set (500, FIG. 12).
  • the last bit of the newest and sixth position row use vector is set to "1" to indicate that Zeus now occupies the last row of the column for names in the relation for suppliers (502, FIG. 12).
  • DELETE is an operation which removes one value at a time from a binary representation of a relation and possibly from a value set.
  • the DELETE operation occurs in three categories of cases. First, a unique value need not be removed from a subset; however, the unique value needs to be removed from the column. Second, a unique value exists in a column, and it needs to be deleted from the relation and from a subset; however, it does not need to be removed from the value set. Third, a unique value already exists in a relation and it needs to be deleted from the column from a corresponding subset and from a value set.
  • FIGS. 11A, B, C and D depict routines for processing any one of the three situations discussed above.
  • the flow diagram in FIG. 11A is a routine for removing a unique value from a value set and for removing the unique value from a column of a relation; this routine is called DELETE.
  • FIG. 11B is a flow diagram of a routine for removing a value only from a column. This routine is called DELETE VALUE FROM COLUMN. This routine by itself could be used to DELETE a value, one or more times, from a particular column.
  • the flow diagram of FIG. 11C is a routine for updating an entity select vector to reflect the removal of a unique value from a particular subset; this routine is called DELETE VALUE FROM SUBSET.
  • FIG. 11D is a routine for removing a unique value from a
  • the RPU 22 calls the DELETE VALUE FROM COLUMN
  • 10 22 calls block 462(a) to determine whether there is "1" bit set in the ordinal position corresponding to the value being deleted in any of the entity select vectors for the relational database. If there are, then the value is presently being used in other relations and,
  • the DELETE routine (FIG. 11A) can be called successively to DELETE one or more value of a row of one relation.
  • the RPU 22 determines which row use vector of the row use set is associated with the particular value to be removed from one row of the column. This operation can be conducted by performing successive Boolean AND operations on the row
  • 35 use vectors and a binary bit only having a single bit set to one at the ordinal position corresponding to the row position of the column. The bit is changed from “1" to "0", in the appropriate row, indicating the absence of the value in the particular row of the column.
  • the RPU 22 via the BBVP 14 determines whether all of the bits of the row use vector have been set to "0". If not all of the bits of the row use vector are set to "0”, then a "DONE" signal is returned to the calling routine at block 474. If all of the binary bits of the row use vector are set to "0", then processing returns at block 478 to the calling routine.
  • the RPU via BBVP 14 determines the ordinal position in the corresponding entity select vector associated with the unique value whose row use vector has been deleted from the row use set.
  • the RPU 22 via the BBVP 14 sets the binary bit in the entity select vector, associated with the unique value to "0" to indicate the absence of the unique value in the subset.
  • processing returns to the calling program.
  • the system removes the value from the value set by removing the value in the value set structure. Additionally, the binary bit corresponding to the deleted value in all entity select vectors is also removed to account for the reduced size of the value set.
  • FIG. 13 is a detailed results table depicting the various tasks performed by the DELETE routine (FIG. 11A) . This example has been chosen to illustrate all of the routines for deleting a value from a value set and from a binary representation of a relation.
  • the results table of FIG. 13 is broken up into three parts. From left to right, the first column depicts the existing value set, including the unique value Zeus; the second column is the entity select vector representing the subset of the value set for names corresponding to column for names, including the name Zeus; and the third column is a row use set which represents the "names" column of the suppliers relation.
  • Each row of the results table depicts a change in either the value set, entity select vector, or row use set as the DELETE routine (FIG. 11A) is performed.
  • the routines depicted in FIGS. 11A, B, C and D are all transparent to the application.
  • the application provides only the following instruction to the system:
  • RPU 22 calls the subroutine DELETE VALUE FROM COLUMN
  • FIG. 11B to remove the value Zeus from the names column of the relation.
  • the RPU 22 via the BBVP 14 changes the binary bit set to "1" to "0" in the row use vector to indicate that the value Zeus is no longer in the column for names
  • block 472 the system determines whether all of the bits of the row use vector have been set to "0". Zeus only appeared in the column once, and thus, by changing the one binary bit to "0", Zeus is no longer represented in the column; all of the binary bits of the row use vector are set to "0". Thus, in block 478, processing returns to block -462 of the DELETE routine (FIG. 11A) .
  • the DELETE VALUE FROM SUBSET routine (FIG. 11C) is called to remove the unique value Zeus from the subset depicted by the entity select vector. Specifically, during block 463, the row use vector associated with the unique value Zeus is removed from the row use set. Processing continues at block 480, during which the ordinal position of the binary bit associated with the value Zeus in the entity select vector is determined. Specifically, the RPU 22 via the BBVP14, during block 480 determines that the ordinal position of the value Zeus in the entity select vector is the tenth position.
  • the system sets the tenth binary bit from "1" to "0" to indicate the absence of the value Zeus ' from the subset (510, FIG. 13).
  • processing returns to the DELETE routine (FIG. 11A) at block 462(a).
  • the RPU 22 determines whether there are any "1" bits set at the ordinal positions corresponding to Zeus in any entity select vector.
  • the entity select vectors corresponding, to the value set of names for the entire relational database are evaluated to determine if the unique value is referenced in any other subset of the relational database. Because the other tables (i.e., parts (FIG. 5) and shipments (FIG.
  • the RPU 22 removes Zeus from the value set and the corresponding bit in the entity select vector is also removed (512, FIG. 13) . Processing continues at block 490, during which the RPU 22 returns to the DELETE routine (FIG. 11A) at block 466, where processing returns to the calling program.
  • INSERT and DELETE which are basically functions for updating binary representations of relations.
  • the next operations are query functions for finding relevant information about a relation or groups of relations.
  • the operations include SELECT, JOIN and PROJECT, and are principally used for determining a resultant binary relation.
  • the resultant binary relations can then be converted into their byte value form for users to under ⁇ stand. This section concentrates on the operation SELECT which generates a resultant binary relation for depicting which row or rows of a relation contain
  • SELECT determines the rows of the relation which correspond to a particular value or values, in one or more columns of a relation and the result is depicted in a binary representation: a binary bit vector called a "select vector".
  • SELECT operation A typical example of a SELECT operation .might be for determining which suppliers (i.e., Smith, Jones, Blake, Clark and Adams) are located in Athens (63, FIG. 2). A more detailed discussion on this query will be presented shortly.
  • a flow diagram of the SELECT operation is depicted.
  • the RPU 22 via the BBVP 14 determines the ordinal positions of one or more selected unique values, which are in one column of the relation, in a particular value set.
  • the RPU 22 determines whether the selected unique values are found in the value set. If the selected unique values are not found in the value set, then processing returns to the calling program during block 520. Assuming that the selected unique values are found in the value set, then in block 522 a binary bit vector displaying the ordinal positions of the selected values within the value set is generated.
  • the binary bit vector contains bits set to "1" at the ordinal positions corresponding to those of the selected values and the remaining bits of the bit vector are set to "0".
  • the bit vector generated in block 522 is "ANDed" with the entity select vector, corresponding to the column in which the values reside, to determine whether the selected unique values are referenced in the column.
  • the RPU 22 determines whether the resultant bit vector has all bits of the resultant vector set to "0"; i.e., if the corresponding set is empty. If resultant bit vector is all zeros, then the selected unique values are not in the column, and thus, no select vector can be generated and processing returns at block 528 to the calling program.
  • processing continues at block 530 in which the RPU 22 determines count; the number of binary bits, in the entity select vector, which are set to "1" up to and including the ordinal position of each selected value. For each unique value, the count is then used to determine which row use vectors of the row use set correspond to the selected unique values. During block 532, the row use vectors, corresponding to "count", are retrieved. Processing continues at block 536, in which the RPU 22 determines whether one or more unique values were selected from the particular column over which this part of SELECT is processed.
  • the RPU 22 returns the one row use vector, corresponding to the unique value, retrieved at block 532. Processing returns to the calling program at block 542. However, if more than one unique value from a particular column was selected by the application for this operation, .then during block 537 the Boolean OR operation is performed on the selected row use vectors to determine a resultant relation.
  • the Boolean OR operation is performed by the BLU 24 (FIG. 1A) .
  • the steps for performing Boolean operations for compressed bit string is fully discussed in Glaser et al, which was referenced earlier.
  • RPU 22 determines whether the values selected are from only one column of the relation.
  • RPU 22 determines whether any more row use vector for values need to be selected from other columns. If more values need to be processed, then processing continues at blocks 516, 518, 522, 524, 526, 530, 532, 536, 537, 538 and 539 until all of the row use vectors are processed and the entity select vector for each resultant column is generated.
  • a Boolean operation specified by the SELECT instruction is performed on the entity select vectors.
  • the SELECT instruction might require the determination of whether one value, in one column of the relation, is associated with another value in a different column of the relation.
  • the select vectors for the two values would be ANDed together to determine whether both values reside in the same row of the relation.
  • any of the Boolean operations i.e. OR, XOR, etc.
  • the operation is assumed to be AND. (For a more detailed discussion, refer to detailed ex ⁇ amples.)
  • the flow diagram of FIG. 14 depicts the SELECT operation for returning a resultant entity select vector (e.g. binary bit vector) for depicting which rows of a relation contain one or more selected values.
  • the rows of the relation corresponding to the selected values can be displayed to the user.
  • the RPU 22 determines which row use vector of the corresponding row use set contains a binary bit set to "1" in the ordinal position corresponding to the selected row position.
  • An indexing function is performed, which determines the position of the selected row use vector in the corresponding row use set. Specifically, RPU 22 counts the number of row use vectors in the row use set up to and including the selected row use vector. This number corresponds to the ordinal position of the binary bit set to "1" in the entity select vector, which references the unique value of the relation. The unique value is retrieved from the value set. For each column of the relation, the value in the selected row is determined and displayed for the user.
  • a step is added for selecting the unique value according to whether the selected value is greater than, less than, equal to, not equal to, equal to or greater than, or equal to or less than a prespecified value selected by the application program or user.
  • the actual values in the selected rows of the relation are determined and displayed to the user via a mapping function through a vector called the "entity use vector".
  • entity use vector For each column of the relation, an entity use vector is maintained for identifying a value in the value set which corresponds to the value at a particular row of the column.
  • FIG. 15 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14) .
  • the results table of FIGS. 15A and B are broken up into six columns.
  • the first column depicts the value set associated with the selected unique value
  • the second column is a bit vector corresponding to the ordinal position of the selected unique value within the value set
  • the third column is the entity select vector associated with the column in which the value resides
  • the fourth column is a resultant bit .vector determined by ANDing the ordinal position bit vector with the entity select vector
  • the fifth column is the row use set associated with the column in which selected unique value resides
  • the last column is the select vector determined by performing a Boolean OR operation on all of the row use vectors corresponding to the selected unique values from one column.
  • the query in this example is a simplified SELECT query to minimize the explanation and steps required to perform the operation. However, generally, the query will be over several value columns of the relation (see the next example for select) .
  • This example could easily be expanded to determine which of the suppliers IDs (i.e., SI through S5) is located in Athens.
  • the row use vector corresponding to Athens indicates the rows of the relation which contain the supplier IDs associated with Athens. For simplicity, in this example we are concerned with only one of the supplier IDs, namely S5, and whether it is associated with Athens. Referring now to FIGS. 14, 15A and B, a detailed example of the query for determining whether supplier ID S5 is located in Athens is now discussed.
  • the system determines the ordinal position of S5 in the value set for suppliers IDs. Essentially, the system traverses a structure associated with the suppliers ID value set and determines the specific node in the structure where the S5 value resides. During block 518, the system determines whether or not the value S5 has been found in the value set. The value S5 is located in the structure, and thus, it is within the value set for the suppliers IDs. In block 522, the system creates a binary bit vector for representing the ordinal position within the value set associated with value S5 (544, FIG. 15A) . As shown at row 544 of FIG.
  • the fifth ordinal position in the new binary bit vector is set to "1" corresponding to the ordinal position of the value S5 in the value set.
  • the new binary bit vector is ANDed with the entity select vector associated with the suppliers identifiers in the suppliers relation (546, FIG. 15A) .
  • the resultant bit vector from- the AND operation is evaluated and it is determined that bit vector is not an empty set.
  • the resultant bit string contains a binary bit set to "1" (548, FIG. 15A) .
  • the unique value S5 is located in the subset referenced by the entity select vector of the suppliers column, and thus, it is in the relation of suppliers.
  • a count is performed on the entity select vector to determine the number of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique value S5.
  • the RPU 22 determines that there are five binary bits set to "1", and thus, the unique value S5 is associated with fifth row use vector of the row use set associated with suppliers IDs (260, FIG. 4) .
  • the row use vector associated with unique value S5 is retrieved from the row use set (260, FIG. 4) .
  • the row use vector for S5 is a binary bit vector containing four binary bits set to "0" and a fifth binary bit set to "1", indicating that the value S5 resides in the fifth row for the column for the suppliers IDs (550, FIG.
  • RPU 22 determines that there is only one value selected from the column for suppliers IDs. Processing continues in block 538, in which the RPU 22 determines that more than one column is involved in this select operation, i.e., the supplier ID and city columns. In block 539, the RPU 22 determines that the value Athens has also been selected by the user in this query, and thus, processing returns to block 516. In block 5-16, the RPU 22 determines the ordinal position of Athens in the value set for cities. Essentially, the system traverses the value set for cities, and during block 518, the RPU 22 determines that Athens is in the value set for cities.
  • a binary bit vector is constructed to indicate which ordinal position of the value set for cities contains the city Athens (552, FIG. 15A) .
  • the system creates the binary bit vector, which shows a binary bit set to "1" in the first ordinal position (552, FIG. 15A) .
  • the Boolean AND operation is performed between the new binary bit vector and the entity select vector associated with cities for the suppliers relation (556, FIG. 15A) .
  • the RPU 22 determines that the resultant vector does not contain all "O's" (558, FIG. 15B) , and thus, the value Athens is determined to be in the suppliers relation.
  • the RPU 22 would return at block 528 to alert the user that the selected value for Athens, although found in the value set, is not within the suppliers relation. Processing continues at block 530, in which the RPU 22 does a count of the binary bits set to "1" up to and including the ordinal position of the binary bit associated with the unique number value Athens. RPU 22 determines that Athens is associated with the first binary bit set to "1" in the entity select vector, and thus, the unique value Athens corresponds to the first row use vector in the row use set (266, FIG. 4). In block 532, the RPU 22 retrieves the row use vector (560, FIG. 15B) associated with
  • the row use vector for Athens contains four binary bits set to "0", followed by a binary bit set to "1", indicating that the value Athens occupies the fifth row of the column associated with cities.
  • the RPU 22 determines that there are no more unique values selected in the column of cities, and thus, processing continues at block 538, in which the RPU 22 determines that more than one column, i.e., suppliers and city, was selected by the user.
  • the RPU 22 determines that no more value need to be selected.
  • the row use vectors, associated with S5 and Athens are ANDed together to generate a resultant select vector, which represents the rows of the relation which satisfy the query (562, FIG. 15B) .
  • the resultant binary bit vector contains four binary bits set to "0” followed by a binary bit set to "1", indicating that the fifth row of the relation contains the supplier ID S5 and the city
  • the actual row of the relation can be reconstructed and displayed to the user in one or two ways.
  • the system can use the entity use vectors and associated row use vectors to map the row number determined by the select vector to the ordinal position in each value set. A more detailed discussion on the entity use vector approach will be discussed in Part VI.
  • the RPU 22 could trace back from the row use vectors to the entity select vectors and then back to the value set to determine the unique values in the fifth row of relation.
  • FIG. 16 is a detailed results table depicting the various steps performed by the SELECT routine (FIG. 14) .
  • the results table of FIGS. 16A and B are broken up into seven columns.
  • the first column depicts the value set associated with the unique values
  • the second column is a bit vector corresponding to the ordinal positions of the selected unique values within the value set
  • the third column is the entity select vector associated with the column in which the selected values reside
  • the fourth column is a resultant bit vector determined by ANDing the ordinal position bit vector with the entity select vector
  • the fifth column is the row use set associated with the column in which the selected values reside
  • the sixth column is the select vector determined by performing a Boolean OR operation on all the row use vectors corresponding to selected unique values from one column
  • the last column is the resultant vector determined by performing a Boolean AND operation on the select vectors determined for the selected values from more than one column.
  • the RPU 22 determines the ordinal positions of the suppliers names Smith and Blake in the value set for suppliers names. Essentially, the system traverses a structure associated with the supplier name value set and determines the specific nodes in the structure where the Smith and Blake values reside. .
  • the RPU 22 determines whether or not the values Smith and . Blake have been found in the value set. The values Smith and Blake are located in the structure; thus, they are x ⁇ ithin the value set for the suppliers names.
  • the RPU 22 creates a binary bit vector for representing the ordinal positions within the value set associated with the values. Blake and Smith are as shown in row 565 of FIG. 16A, the third and ninth ordinal positions. The new binary bits are set to "1" corresponding to the ordinal positions of the values in the value set for suppliers names.
  • the new binary bit vector is ANDed with the entity select vector associated with the suppliers names in the suppliers relation (567, FIG. 16A) .
  • the resultant bit vector from the AND operation is evaluated and it is determined that the resultant bit vector is not all zeros.
  • the resultant bit string contains binary bits set to "1" (569, FIG. 16A) .
  • a single "count”, with two ordinal positions as input, is performed on the entity select vector, up to and including the binary bit associated with the unique value Smith, the last value characterized in the new entity select vector.
  • the RPU 22 determines that there are two binary bits set to "1", one for the count corresponding to Blake and that there are five binary bits set to "1" corresponding to the count for Smith. Therefore, the unique values, Blake and Smith, are associated with the second and fifth row use vectors of the row use that is associated with suppliers names (262, FIG. 4) .
  • row use vectors associated with the unique values Smith and Blake are retrieved from the row use set (262, FIG. 4) .
  • the row use vector for Blake is a binary bit vector containing five binary bits in which the third binary bit is set to "1", indicating that the value Blake resides in the third row of the column for the suppliers names (570, FIG. 15) .
  • the row use vector for Smith is a binary bit vector containing five binary bits and the first bit is set to "1", indicating that the value Smith resides in the first row of the column for suppliers names (570, FIG. 16A) .
  • RPU 22 determines that there is more than one value selected from a column for suppliers names, thus processing continues at block 537.
  • the row use vectors associated with Smith and Blake (570, FIG. 16A) are ORed together to form the select vector as shown at 571 of FIG. 16A.
  • the RPU 22 determines that more than one column is involved in this select operation, i.e., the suppliers names column and the city column. It continues at block 539, during which the RPU 22 determines that the values London and Paris are selected in the separate column cities. In block 516, the RPU 22 determines the ordinal positions for London and Paris in the value set for cities.
  • the system transverses the value set for cities, and during block 518, the RPU determines that London and Paris are both located in the value set for cities.
  • a binary bit vector is constructed to indicate which ordinal positions the value set for cities are associated with the cities London and Paris (573, FIG. 16A) .
  • the system creates a binary bit vector which shows binary bits set to "1" in the fifth and eighth ordinal positions (573, FIG. 16A) .
  • the Boolean AND operation is performed on the new binary bit vector with the entity select vector associated with cities for the suppliers relation (575, FIG. 16B) .
  • the RPU 22 determines that the resultant vector does not contain all zeros (579, FIG.
  • the RPU determines that Paris is associated with the third binary bit set to "1" in the entity select vector and, thus, Paris corresponds to the third row use vector in the row use set (266, FIG. 4).
  • the RPU 22 retrieves the row use vectors associated with London and Paris (561, FIG. 16B) .
  • the row use vector for London contains five bits, the first and fourth bits of the row use vector containing binary bits set to "1".
  • the row use vector for Paris contains five binary bits, the second and third binary bits set to "1".
  • the row use vectors for London and Paris indicate that the first through fourth rows of the column are associated with cities London, Paris, Paris, London.
  • the RPU determines that more than one unique value was selected in the column of cities and, thus, processing continues.
  • the RPU performs the Boolean OR operation on the row use sets for London and Paris.
  • a select vector is generated (563, FIG. 16B) , which has five binary bits and bits one through four are set to "1".
  • the RPU determines that more than one column of the relation was involved in the SELECT, i.e., suppliers names and cities.
  • the RPU determines that no more values in other columns need to be selected from the relation.
  • the select vectors associated with the suppliers names (571, FIG. 16A) and cities (563, FIG.
  • the resultant entity select vector indicates that the first and third binary bits are set to "1", indicating that the first and third rows of the suppliers relation contain information on whether Smith is associated with Paris and/or London and whether Blake is associated with Paris and/or London.
  • the actual rows in the relation can be reconstructed to display to the user in one of two ways. First, the system can use the entity use vectors associated with each column of the relation to map the row numbers determined by the resultant select vector, to the ordinal positions in the appropriate value sets.
  • the RPU 22 could trace back from the various row use vectors to the entity select vectors and back to the appropriate value sets to determine the actual unique values in the first and third rows of the relation.
  • the purpose of the RECONSTRUCT operation is to generate the values associated with a particular column of a relation for the user to ascertain.
  • the binary representation of a relation is constructed and stored in memory 18. If the user of the system wishes to see the actual relation and the values depicted in the relation, then the RECONSTRUCT operation can be per- formed for reconstructing and displaying the relation to the user.
  • the flow diagrams in FIGS. 17A and 17B are for reconstructing and displaying various columns specified by the user or an applications program.
  • the user will specify one or more columns of a relation to be displayed by the system.
  • the user might request the Suppliers ID column of the relation for suppliers at 63 of FIG. 2, which is currently stored in its binary representation in memory 18 (FIG.
  • the first step in performing the RECONSTRUCT operation is in block 565, in which the user specifies various columns of a relation to be reconstructed. As stated above, an applications program may also specify particular columns of the relation to be projected. -For example, when the SELECT operation is performed, the resultant binary representation can then be reconstructed and displayed to the user via the RECONSTRUCT operation (FIG. 17A) .
  • block 566 calls the routine DISPLAY/RECONSTRUCT (FIG. 17B) to reconstruct one of the specified columns.
  • the DISPLAY/RECONSTRUCT routine (FIG.
  • 17B essentially performs the necessary steps for obtaining the values and for placing the values in the proper rows in the column.
  • the RPU determines whether there are any more columns that need to be displayed. If there are more' columns to be displayed, then processing continues at block 566. Blocks 566 and 567 are performed until all of the columns specified by the user or applications program have been reconstructed. If all of the columns have been reconstructed, then processing continues at block 568, in which the RPU returns to the calling program.
  • a flow ⁇ diagram of the DISPLAY/RECONSTRUCT routine is depicted.
  • the RPU 22 obtains ' the entity select vector associated with the particular column to be displayed.
  • the first row use vector associated with the row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18, is transferred to the RPU 22.
  • the RPU 22 performs a Boolean AND operation on the row use vector obtained in block 577 with a row select vector.
  • the row select vector is created by performing a query operation on the relation thereby selecting which rows of the relation the user or application program wishes to display.
  • the row select vector is a new binary bit vector and each binary bit corresponds to a row of the column or columns to be displayed. A binary bit set to "1" indicates that the corresponding row needs to be displayed.
  • the result of the AND operation is a new vector Z which depicts the rows of the column which contain a particular value associated with the row use vector.
  • the results of the AND operation are sent to memory 18 for future processing.
  • the RPU 22 determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 581 a "0" is placed in the first binary position of a new vector called the index vector.
  • the index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the row use set. Each bit indicates whether the unique value associated with the row use vector exists in the relational column to be displayed. If a binary bit in the index vector is set to "0", the unique value associated with the row use vector does not exist in the column to be displayed; Whereas, if the binary bit is set to "1" in the index vector, then the unique value associated with the row use vector exists one or more times in the column. Processing continues at block 593, during which the next row use vector of the row use set is obtained.
  • processing continues at block 585.
  • the RPU 22 sets the binary bit in the index vector, which is associated with the current row use vector, to "1".
  • the resultant bit vector Z is stored in memory 18. The resultant vector Z is later used in the reconstruction process.
  • the RPU 22 clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z.
  • the purpose of this step is to shortcut the processing of the row use vectors in the row use set. Stated differently, when the row select vector is cleared, all of the values in the rows of the column have been determined. The row use vectors which have been processed with the row select vector contain all of the values to be displayed in the column. Then during block 591, the RPU 22 (28, FIG. 1A) determines whether the row select vector has been completely cleared, or stated differently, all of the binary bits have been set to "0".
  • processing continues at blocks 597, 601, 603, 605, 607, 609 and 611 to reconstruct the column with the values associated with the row use vectors in the row use set. However, if not all of the binary bits in the row use vector are set to "0", then processing continues at block 593. During block 593, the RPU 22 gets the next row use vector in the row use set currently being pro ⁇ Waitd. During block 595, the RPU 22 determines whether the end of the row use set has been reached. If the end of the row use set has been reached, then processing continues at blocks 597, 601, 603, 605, 6077 609 and 611 to reconstruct the column. However, assuming that the end of the row use set has not been reached, then processing continues at blocks 577, 579, 585, 587, 589, 591, 593 and 595 until all of the row use vectors of the row use set have been processed.
  • processing continues at block 597.
  • the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Each binary bit set to "1" indicates which row use vectors reference unique values which are to be displayed in the column.
  • the ordinal position of the binary bit set to "1" in the entity select vector is determined. Then, during block 603, the value associated with the ordinal position obtained in block 601 is obtained from the value set.
  • the RPU 22 finds the appropriate location in the index vector associated with the value. Then during 607, the appropriate resultant Z vector stored during step 587 is retrieved. The resultant vector Z indicates which rows of the column contain the unique value associated with the row use vector, and the unique value is placed in the column at the appropriate row locations.
  • the RPU 22 determines whether any more values are left for processing. Assuming that there are more values, then processing continues in blocks 601, 605 and 607 until all of the values have been placed in the proper rows of the column. Once all of the values have been placed into the column, then processing returns to the calling program during block 611. 1. Detailed Example o Performing
  • FIGS. 17A, 17B, 18A, 18B, 18C and 18D a detailed example for reconstructing the column for Suppliers IDs in the Suppliers relation (FIG. 2) is now discussed. More particularly, it is assumed that only the binary representation of the column Suppliers IDs exist in the RDMS 10. The binary representation of the column may be from a result of a SELECT operation or it may have been previously stored after processing by the BBVP 14. In either case, the binary representation of the column Suppliers IDs exist in memory 18 and now the user or an applications program needs to display the actual values of the column. Although, this example is for reconstructing and displaying only one column of a supplier for the suppliers relation, the PROJECT operation could be sequentially performed to supply all of the columns of the supply relation.
  • Results Table for depicting the results of the RECONSTRUCT operation for reconstructing or supplying the Suppliers ID column (80, FIG. 2) of the suppliers relation (63, FIG. 2) is shown.
  • Each row (800-848) of the Results Table depicts a result of the routines shown in FIGS. 17A and 17B.
  • the Results Table is. separated into seven columns. From left to right, the first column of the Results Table shows the entity select vector for the selected column to be displayed. The second column shows the row use vector set associated with the specified column to be displayed. The third column is the row use vector currently being processed, and the fourth column is the row select vector specified by the user application program for determining which rows of the column are to be displayed.
  • the fifth column is the result of ANDing the row use vector and the row select vector together; the result is called vector Z.
  • the sixth column depicts the reconstruction of the index vector for displaying which row use vectors of the row use set have associated unique values in the column.
  • the last column is for the reconstruction of the column to be displayed.
  • FIGS. 17A and 17B the operation performed by the RPU 22 (FIG. 1A) for displaying and reconstructing the Suppliers ID column (80, FIG. 2) is now discussed.
  • the user or application program selects the various column or columns which are to be reconstructed and displayed as a result of the query operation.
  • the user has selected the Suppliers ID column (80, FIG. 2) .
  • the Suppliers ID column only exists in the secondary memory (18, FIG. 1A) in the form of a binary representation or row use set.
  • the DISPLAY/RECONSTRUCT routine (FIG. 17B) is called for finding the values and reconstructing the Suppliers ID column.
  • the RPU 22B during block 571, the RPU 22
  • FIG. 1A finds the entity select vector associated with the Suppliers ID column in the memory 18 (FIG. 1A) .
  • the entity select vector stored in memory 18 (FIG. 1A) is transferred via bus 30 (FIG. 1A) to RPU 22 (FIG. 1A) .
  • the RPU 22 finds the row use set associated with the Suppliers ID column in memory 18 and transfers via a bus to the RPU 22 (FIG. 1A) .
  • the RPU 22 obtains the first row use vector of the row use set for the Suppliers ID column (804, FIG. 18A) .
  • the user wishes to display the first four rows of the Suppliers ID column.
  • a row select vector having four binary bits set to "1" in the first four ordinal .positions, i.e. "1 1 1 1 0".
  • the row use vector (804, FIG. ISA) and the row select vector (806, FIG. 18A) are transferred to the RPU 22 (FIG. 1A) .
  • the row use vector (804, FIG. 18A) and the row select vector (806, FIG. 18A) are ANDed together to determine a resultant binary vector Z (808, FIG. 18A) .
  • the resultant binary bit vector Z depicts the rows of the Suppliers ID column in which the unique value associated with the row use vector (804, FIG. 18A) is to reside.
  • the resultant binary vector Z contains a binary bit set to "1" in the first position which means that the unique value associated with the row use vector (804, FIG. 18A) will be placed in only the first ordinal position of the Suppliers ID column.
  • the RPU 22 determines that the result of the AND operation is not "0", thus processing continues to block 585.
  • the RPU 22 via BBVP 14 sets the first binary bits of the index vector to "1" (810, FIG. 18A) to indicate that the unique value associated with the row use vector (804, FIG. 18A) exists at least once in the Suppliers ID column.
  • the resultant binary bit vector Z is stored in memory 18 for future reconstruction of the column.
  • the binary bit set in the resultant binary bit vector Z indicate the rows of the Suppliers ID column with the unique value SI associated with the row use vector (804, FIG. 18A) reside.
  • the binary bits set to "l" in the row select vector which match the binary bits in the resultant vector Z are set to "0" (812, FIG. 18A) .
  • the row select vector is a value determined if all of the binary bits have been set to "0".
  • the row select vector contains three more binary bits set to "1" (812, FIG. 18A) , thus it continues at block 593.
  • the RPU 22 obtains the next row use vector of the row use set (814, FIG. 18B) .
  • the row use vector (814, FIG. 18B) and the row select vector (816, FIG. 18B) are ANDed together during block 577.
  • the resultant vector Z is shown at 818 of FIG. 18B.
  • the resultant vector Z is evaluated to determine if all the binary bits of the resultant vector have been set to "0". Not all of the binary bits of the resultant vector are "0" (the resultant vector is "0 1 0 0 0" (818, FIG. 18B) ) , and thus during block 585, the second binary bit of the index vector is set to "1" (820, FIG. 18B) .
  • the RPU 22 (FIG. 1A) stores the resultant vector Z in memory 18 for future processing. Then during block 589, the binary bits of the row select vector which were set to "1" and matched the binary bit set to "1" in the resultant vector Z are set to "0"
  • the row select vector is evaluated to determine if all the binary bits have been set to "0".
  • the row select vector still has two binary bits set to "1" (821, FIG. 18C) , and during block 579, it is evaluated to determine if all the binary bits are set to "0".
  • the resultant vector Z contains one bit set to "1", and thus processing continues at block 585.
  • the RPU 22 sets the third binary bit of the index vector to "1" (828, FIG. 18C) .
  • the resultant vector Z is stored in memory 18 for future processing.
  • the binary bit of the row select vector which matched the binary bit of the row use vector is set to "0" (830, FIG. 18C) .
  • the row select vector is evaluated to determine if all the binary bits are set to "0".
  • the row select vector still has one binary bit set to "1" (834, FIG. 18C) , thus it continues at block 593.
  • the RPU 22 obtains the next row use vector of the row use set (832, FIG. 18C) .
  • the row use vector (832, FIG. 18C) and the row select_vector (834, FIG. 18C) are ANDed together during block 577.
  • the resultant vector Z is shown at block (836, FIG. 18C) .
  • block 579 it is determined that not all of the binary bits of the resultant vector are "0" (the resultant vector is "00010" (836, FIG. 18C) .
  • the fourth binary bit of the index vector is set to "1" (838, FIG. 18C) .
  • the resultant vector Z is stored for future processing.
  • the binary bit of the row select vector which were set to "1” and match the binary bit set to "1" in the resultant vector Z are set to "0" (840, FIG. 18D) .
  • the row select vector is evaluated to determine if all the binary bits had been set to "0".
  • All the binary bits of the row select vector are set to "0" (830, FIG. 18D) , and thus processing continues to block 597.
  • the RPU 22 determines the ordinal positions of the binary bits set to "1" in the index vector. Specifically, the first, second, third and fourth binary bits of the index vector are set to "1". Thus the first, second, third and fourth row use vectors of the row use sets are associated with unique values which exist in the Suppliers ID column.
  • RPU 22 (FIG. 1A) determines the ordinal positions of the entity select vector which correspond with the row use vectors having corresponding binary bits set to "1" in the index vector.
  • the values associated with each ordinal position of the entity select vector are obtained from the value set of Supplier IDs. Specifically, the values SI, S2, S3 and S4 are obtained. Then, during block 605 and 607, the first resultant binary vector Z is obtained from temporary storage, and the value SI is placed in the proper ordinal position of the column for Suppliers IDs. And during block 609, the RPU 22 (FIG. 1A) determines whether there are any more values left for processing. There are three more values left for processing, and thus block 605 and 607 are performed. During block 605 and 607, the second resultant vector Z is obtained and the value S2 is placed in the second ordinal position in the column (844, FIG. 18D) .
  • block 609 it is determined that there are more values left for processing, and thus block 605 and 607 are performed.
  • the third resultant vector associated with the third resultant vector Z which is associated with the value S3 is obtained from memory.
  • the value S3 is placed into the third ordinal position of the relational column for Suppliers IDs (846, FIG. 18D) .
  • block 609 it is determined that there is still one more value to process, and thus block 605 and 607 are performed.
  • the fourth resultant vector Z stored in temporary memory is obtained and the value S4 is placed into the fourth ordinal position of the column for Suppliers IDs (848, FIG. 18D) .
  • the RPU 22 (FIG. 1A) determines that there are no more values for processing and thus returns to the calling RECONSTRUCT routine (FIG. 17A) at block 567.
  • the RPU 22 determines whether there are any more columns to be displayed to the user. For this example, it is assumed that only the Suppliers ID column is to be reconstructed and displayed. However, if more columns of a particular relation were to be displayed, then processing would continue at blocks 566 and 567 until all of the columns were displayed. Also, the same row select vector would be used each time the DISPLAY/RECONSTRUCT routine (FIG. 17B) was performed. Assuming that there are no more columns to be displayed, then processing returns to the calling program during block 568. E. JOIN
  • a JOIN is a SELECT over the Cartesian product of more than one relation of the relational database.
  • the data shown in Table A above comes from the two relations; suppliers 63 and parts 65 (FIG. 2) .
  • the names of the relations are listed in the FROM clause, and connection between the two relations (63, 65 FIG. 2) is listed in the WHERE clause (i.e., the fact that the city values must be equal) which is called the JOIN predicate.
  • the JOIN is used to combine relations based on equivalent values in the column if specified by the JOIN predicate. In this case, the specified columns are the "city" columns of each relation.
  • the JOIN pairs each of the N rows of a first relation; e.g., the SUPPLIERS relation (63, FIG.
  • the comparison operator in a JOIN predicate be equality.
  • the EQUIJOIN by definition produces a result containing two identical columns as shown in Table A. If one of these two columns is eliminated, the result is called NATURAL JOIN.
  • the JOIN operation is the restriction of the Cartesian product of two or more relations.
  • the Cartesian product of a set of N relations is a new relation consisting of all possible rows "r", such that ⁇ r r" is the concatenation of all rows from the participation of relations. Once the Cartesian product is generated, all rows that do not satisfy the "JOIN predicate" are eliminated from the Cartesian product. What is left is the EQUIJOIN result relation.
  • the complete table contains thirty rows. Now, all the rows' Cartesian product in which S.CITY is not equal to P.CITY are eliminated and what is left is the EQUIJOIN result as shown earlier.
  • the JOIN relation is efficiently represented by binary bit vectors and second, the JOIN relation is constructed without having to create a cross product relation.
  • FIG. 19 represents the depiction of a JOIN relation from the following query: SELECT S.ID#,S.STATUS,S.CITY,P.ID#
  • This query is a projection of the JOIN because the P.CITY is not mentioned in the SELECT clause of the query. Like the EQUIJOIN example discussed above, this query requires that data come from two relations, namely the suppliers relation (63, FIG. 19) and the Parts relation (65, FIG. 19) . Both relations are named in the FROM clause and the connection between the tables is through the CITY columns in the WHERE clause.
  • the result of the JOIN for displaying the columns CITY, SUPPLIERS ID#s, STATUS, and PART ID # « s is shown at 628 of FIG. 19.
  • the entity select vectors at 600 and 602 are binary bit vectors that indicate which rows of the particular relation associated with the entity select vector, participate in the JOIN relation. More particularly, each binary bit has an ordinal position, which corresponds to a row of the relation. Binary bits 601, 603, 605, 607 and 609 correspond to the five rows of the suppliers relation 63. When the binary bit is set to "1" the particular row associated with the binary bit participates in the JOIN relation.
  • binary bit 601 indicates that the first row of the SUPPLIERS relation 63 participates in the JOIN relation.
  • the entity select vector 602 having binary bits 611, 613, 617, 619 and 621, indicates that the first, second, fourth, fifth and sixth rows of the PARTS relation 65 participate in the JOIN relation.
  • Vectors 600 and 602 act just like the entity select vectors discussed in FIGS. 4, 5 and 6. . The only difference is that the ordinal positions of the bits in the entity select vectors 600 and 602 do not correspond to ⁇ unique values in a value set. Instead, the binary bits of entity select vector 600 and 602 refer to row locations in a particular relation. Like the entity select vectors in FIGS. 4, 5 and 6, an implied mapping correspondence exists between each binary bit in the entity select vector to and a particular row use vector in an associated row use set. The implied mapping scheme is illustrated by the dotted lines 608, 610, 612 and 614, which show that the binary bits of the entity select vector 600 indicate the correspondence of the rows of the suppliers relation to the row use vectors 615, 617, 619 and 621, respectively.
  • the binary bits of the entity select vector 602 which correspond to the row use vectors of the Parts relation 65, are mapped in an implied manner to the row use vectors 623, 625, 627, 629 and 631, as shown by the dotted lines 616, 618, 620, 622 and 624.
  • the row use sets of the JOIN relation perform a dual task of representing the values in the rows of the JOIN relation and for depicting more than one column, of the JOIN relation.
  • the row use set 604 represents the columns S.CITY 638, S. ID# 636 and S.STATUS 634.
  • the row use set 606 represents the P.ID#, 630, the column of the JOIN relation 628.
  • the columns 638, 636, 634 and 630 depict the result of the JOIN.
  • the entity select vector 600 and the row use set 604 represent all of the binary information necessary to construct the suppliers relation portion 626 of the JOIN relation, 628; namely, columns 638, 636 and 634. The following is a detailed discussion on how this representation is achieved.
  • a dotted line 608 maps the row use vector 615 to the binary bit 601 of the entity select vector 600.
  • the binary bits of the row use vector 615 indicate the row positions of the columns 638, 636 and 634, which contain a particular value in the first row of the suppliers relation 63.
  • Binary bit 601 cor ⁇ responds to the values London, 20, Smith and SI.
  • To build the S.CITY column of the JOIN relation only the value London is referenced.
  • the three binary bits set to "1" in the row use vector 615 represent three occurrences of the value London in the S.CITY column 638 of the suppliers portion 626 of the JOIN relation.
  • the first three bits set to "1" in the row use vector 615 indicate three occurrences of the value SI.
  • the first three binary bits set to "1" in the row use vector 615 represent occurrences of the value 20.
  • the first three rows of the suppliers portion 626 of the JOIN relation 628 are characterized by the first three bits of the row use vector 615.
  • the next three rows of the suppliers relation 626 are characterized by the row use vector 621 of the- row use set 604.
  • the replication of the three bits in row use vector 615 indicates that the three values [LONDON, SI, 20] of the first table of the Supplier relation occur in rows 1, 2, 3 of the JOIN relation.
  • the remaining values and the columns of the Suppliers portion 626 are indicated by the row use vectors 617 and 619, which contain binary bits set to "1" in the seventh, eighth, ninth and tenth rows of both row use vectors. It should be noted that although the columns S.CITY, S.ID# and S.STATUS of the Suppliers relation are indicated by the query, the column SNAME in the SUPPLIERS relation, could just as easily have been represented by the row use set 604.
  • the values of the Parts relation are mapped into the JOIN relation 628 by the row use set 606 and the entity select vector 602 in exactly the same fashion as for Suppliers.
  • FIGS . 20 and 21 represent a more detailed view of the JOIN relation 628 (FIG. 19) . More particularly, the SUPPLIERS relation 63 (FIG. 19) is depicted by its row use sets at 63 (FIG. 21) and the Parts relation 65 (FIG. 20) , as depicted by its row use sets 65 (FIG. 20) . In addition, the entity select vectors 600 and 602 are shown corresponding to the row use sets 630 and
  • FIGS. 20 and 21 depict the value sets referred to in the SUPPLIERS and PARTS relations with their associated entity select vectors.
  • FIG. 19 a detailed discussion of the SUPPLIERS relation portion 626 (FIG. 19) is now presented.
  • the row use set 604 for the JOIN relation (628, FIG. 19) is mapped back to the Suppliers relation by the entity select vector 600.
  • the entity select vector 600 is repeated for each column of the Suppliers relation which has one or more values in the suppliers portion of the JOIN relation.
  • the implied mapping from each row use vector of the row use set 604 is shown by the dotted lines 608, 610, 612 and 614.
  • the row use sets for representing the columns of the suppliers relation are shown in FIG. 4.
  • the first three binary bits set to "1" in the row use vector 615 of the row use set 604 are mapped to the binary bit 601 of the entity select vector 600.
  • the entity select vector 600 corresponds to the row use set 260.
  • the first binary bit 601 is set to "1" indicating that the value in the first row of the S.ID# column of the suppliers relation is present in the JOIN relation.
  • a Boolean AND operation is performed on each row use vector 184, 186, 188, 190 and 192 to determine which row use vector contains a corresponding "1" bit in the first position.
  • Row use vector 184 contains a binary bit set to "1" in the first position.
  • Row use vector 184 maps back to the first binary bit of the entity select vector 176 for the suppliers relation.
  • the first binary bit of entity select vector 176 corresponds to the value SI in the value set for suppliers identifiers 160.
  • the first three binary bits set to "1" in the row use vector 615 indicate that the value SI is in the first three rows of the S column in the JOIN relation.
  • the first three binary bits set to "1" in the row use vector 615 are mapped to the entity select vector 600 associated with the row use sets 264 and 266, corresponding to the S.STATUS and S.CITY columns of the suppliers relation.
  • the row use vector 615 represents the first three values of the S.CITY column 638 and the S.STATUS column 634 of the prior portion of the JOIN relation. Referring to FIG. 21, a more detailed view of the
  • Parts relation portion of the JOIN relation is shown. Specifically, the mapping of the part ID #'s to the P.ID# column 630 of the JOIN relation is shown.
  • Row use vector 623 of the JOIN relation contains binary bits set to "1" in the first and fourth positions. These binary bits correspond to the first and fourth positions of the P.ID# column 627 of the JOIN relation.
  • Row use vector 623 is impliedly mapped to the first binary bit 611 of the entity select vector 602.
  • the entity select vector 602 corresponds to the row use set 304.
  • a Boolean AND operation is performed on the entity select vector in each row use vector of the row use set 304 to determine which row use vector contains the binary bit set to "1" in the first position.
  • the left most row use vector contains a binary bit set to "1" and this row use vector maps back to the first binary bit position of entity select vector 284.
  • the first binary bit of the entity select vector 284 is set to "1", indicating that the first row of the value set 282 contains the unique value mapped into the Parts relation 65 and into the JOIN relation.
  • binary bit 611 indicates that the value PI is mapped into the row use vector 623 of the JOIN relation, specifically, that of the first and fourth rows.
  • the remaining rows of the P.ID# column 627 of the JOIN relation are depicted -by the row use vectors 625, 627, 629 and 631 of the row use set 666.
  • FIG. 22G a detailed discussion on the operations performed by the RPU 22 (FIG. 1A) for performing a JOIN operation and constructing the JOIN relation is now discussed.
  • the routine EQUI- JOIN which builds a binary representation of the JOIN relation
  • FIG. 22B is a flow diagram of the routine BUILD ROW USE SETS for constructing the particular row use sets associated with each column of the JOIN relation.
  • FIG. 22C is a flow diagram for the routine CONSTRUCT JOIN ROW USE VECTORS for controlling the overall construction of each row use vector of a row use set in the JOIN relation.
  • FIG. 22A the routine EQUI- JOIN, which builds a binary representation of the JOIN relation.
  • FIG. 22B is a flow diagram of the routine BUILD ROW USE SETS for constructing the particular row use sets associated with each column of the JOIN relation.
  • FIG. 22C is a flow diagram for the routine CONSTRUCT JOIN ROW USE
  • FIG. 22D is a routine EVALUATE ROW USE SETS for determining the number of occurrences of the unique values participating in the JOIN operation.
  • FIG. 22E is a routine called PRODUCTS for calculating a series of product terms which characterize the formation of bit patterns in each of the row use vectors for a particular row use set of the JOIN relation.
  • FIG. 22F is a routine called NUMS for determining the number of times a particular bit pattern repeats itself in a row use vector in the JOIN relation.
  • FIG. 22G is a routine called GENERATE BIT STRING for building the row use vector in the JOIN relation.
  • the binary representation of the one or more relations to be JOINed are found in the memory 18 of the RDMS 10.
  • the binary represented relations are stored until the RPU 22 is ready for processing.
  • the relations are sent via bus 48 to the BBVP 14.
  • the RPU 22 uses the relations in the BBVP 14, the RPU 22 performs the EQUI/NATURAL JOIN operation (FIG. 22A) to create a binary representation of the resultant JOIN relation.
  • S.CITY P.CITY
  • the RPU 22 performs a Boolean AND operation on the entity select vectors obtained to determine a resultant binary bit vector x.
  • the resultant binary bit vector indicates which values of the particular value set are common to all of the entity select vectors involved in the JOIN operation. Specifically, binary bits set to "1" in the resultant bit vector "x" indicate the values of the value set which are common to all the columns represented by the obtained entity select vectors.
  • the RPU 22 determines which binary bits of each entity select vector correspond to the binary bit set to "1" in the resultant bit vector x. For each binary bit set to "1" in the entity select vector that corresponds to a binary bit set to "1" in the resultant bit vector x, the RPU 22 obtains the row use vector in the associated row use set during block 658. Then, during block 660, for each row use set associated with each entity select vector, the Boolean operation OR is performed on the selected row use vectors of the associated row use set.
  • the resultant vectors are referred to as JOIN entity select vectors, which characterize values belonging to one or more JOIN columns in the JOIN relation.
  • a row use set corresponding to each JOIN entity select vector is constructed. Specifically, during block 664, the BUILD ROW USE SET routine (FIG. 22B) is called to construct each row use set corresponding to each JOIN entity select vector of the JOIN relation. When all of the row use sets for the JOIN relation have been constructed, processing returns to the calling routine in block 668.
  • routine BUILD ROW USE SETS The purpose of this routine is to construct the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation.
  • the RPU 22 selects the first unique value i in the resultant bit vector x. In other words, the RPU 22 selects the first value represented by the occurrence of a "1" bit in the resultant bit vector x. Then, during block 673, a variable "START ROW" is set equal to zero. This variable indicates the start position of the first row in the JOIN row use vector in the JOIN row use set being generated and will be discussed in more detail along with the description of FIG. 22G. Then, during block 674, the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C is called.
  • This routine is performed by the RPU 22 to determine the characteristics of a particular row use vector of the JOIN relation corresponding to the value i and to build the row use vector or vectors associated with the value in the JOIN relation.
  • the RPU 22 determines if there are any more unique values which are indicated in the resultant vector x. If there are, then the next unique value, i, is obtained at block 678. Processing continues at the CONSTRUCT JOIN ROW USE VECTORS routine to build the JOIN row use vector(s) associated with the next unique value. Assuming . that there are no more unique values represented in the resultant bit vector x, then processing returns to the calling program (EQUI/NATURAL JOIN, FIG. 22A) at block 680.
  • the CONSTRUCT JOIN ROW USE VECTORS routine is now discussed. As stated above, the purpose of this routine is to determine the characteristics of the row use vectors of the JOIN relation and to build the row use vectors for a particular - row use set in the JOIN relation.
  • the routine EVALUATE ROW USE VECTORS is called. The purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation. A more detailed discussion of this routine will be presented shortly with reference to FIG. 22D. Then, during block 686 the PRODUCTS routine (FIG. 22E) is called.
  • the PRODUCTS routine calculates a series of product terms which characterize the formation of bit patterns in each of the row use vectors of a particular row use set of the JOIN relation. A more detailed discussion of this routine will be presented with reference to FIG. 22E. Then, during block 688 the NUMS routine (FIG. 27F) is called. The NUMS routine determines the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. A more detailed discussion will be shortly presented with reference to FIG. 22F. Assuming that all the calculations for determining the characteristic of a row use set have occurred, processing continues at block 690 during which the first input column "j" (where "j" is set equal to 1) is obtained. Then, in block 692, the row use set associated with the first input column j is obtained.
  • the GENERATE BIT STRING routine (FIG. 22G) is called for constructing the row use vectors associated with a particular value in the row use set of the JOIN relation.
  • the GENERATE BIT STRING routine (FIG. 22G) evaluates the calculations of the PRODUCTS (FIG. 22E) and NUMS (FIG. 22F) routines to determine the characteristics of the bit patterns in the row use vectors and constructs these bit patterns in the row use vectors of the JOIN relation.
  • the RPU 22 determines if there are any more input columns which need to be processed. Assuming that there are still other input columns to be processed, block 698 increments the variable "j" by 1.
  • FIG. 22D is a flow diagram of the EVALUATE ROW USE VECTORS routine, discussed below in more detail.
  • the purpose of this routine is to determine the number of occurrences of a particular value which participates in the JOIN operation.
  • the RPU 22 selects the first column and obtains the row use set of the column.
  • the RPU 22 obtains the row use vector of the RUS (j) which corresponds to the unique value i of the bit vector x.
  • the row use vector is referred to as V j .
  • the number of binary bits set to "1" in the row use vector Vj is determined.
  • the number of binary bits set to "1" in Vj corresponds to the number of occurrences of the particular value i in the current column over which the JOIN is performed.
  • the number of occurrences calculated for this particular row use vector Vj is placed in the variable Cj .
  • Cj is used by the PRODUCTS routine (FIG. 22E) , to be discussed.
  • the RPU 22 determines whether there are any more input columns. Assuming that there are still more input row use vectors, processing continues at block 709, during which the variable j is incremented by l. Processing continues at block 705, 706 and 707 until all of the input columns are processed.
  • FIG. 22E is a flow diagram for the routine
  • PRODUCTS As stated earlier, the purpose of this routine is for calculating a series of product terms which characterize the formation of bit patterns in each of the row use vectors of the JOIN relation.
  • an array called PRODS is set equal to the series
  • PRODS ⁇ (l,C ⁇ * C 2 * C 3 * . . . * C n ) , (2,C 2 * C 3 * . . . * C n ), (3,C 3 * . . . * C n ), . . . (N - 1, C n - 1 * C n ) ,
  • Cj is equal to the number of occurrences of value i in column j participating in the JOIN operation. For example, suppose the JOIN operation is performed over the CITY columns for two relations and the value of London is found to be present in both columns. Assume that the CITY column in the first relation contains two occurrences of the value London and the CITY column in the second relation contains three occurrences of the value London. Then C ] _ is set equal to 2 and C 2 is set equal to 3. Therefore, PRODS(1) is equal to C- ⁇ * C 2 , which is equal to 6. This number is used by the GENERATE BIT STRING routine (FIG.
  • processing block 710 determines the characteristics of the bit patterns in the row use vectors of the JOIN relation.
  • processing block 710 processing continues at block 712, to return processing to the calling program or the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) .
  • the NUMS routine is for determining the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation. Specifically, during block 716 the following series is calculated
  • NUMS (1, PRODS(1)/PRODS(1) )
  • this routine determines' the characteristics of a bit' pattern in a JOIN row use vector associated with a particular value i.
  • An OFFSET value is determined for specifying the number of binary "0" bits in the row use vector ahead of the first binary "1" bit in addition to the zero-bits specified by START- ROW.
  • the offset value is equal to an initial value of zero.
  • RPU 22 obtains the input bit vector associated with the variable Vj .
  • RPU 22 obtains the first binary "1" bit in the Vj bit vector.
  • a variable K is set equal to the ordinal position of the selected (726, FIG. 22G) bit of Vj .
  • Processing continues at block 730, during which the Create Output vector "W" is generated.
  • the output vector "W" at a position in the row use set corresponds to the bit position k, in the entity select vector (the destination entity select vector corresponding to column j) .
  • the characteristics of the output bit vector W are determined by calculating NUMS (j), PRODS (j + 1) and PRODS (j).
  • NUMS (j) indicates the number of repetitions of a bit pattern having PRODS (j + l) "l" bits.
  • PRODS (j) indicates the total number of bits in the bit pattern associated with the output bit vector W.
  • FIG. 19 depicts a JOIN operation for the following query:
  • the result of this JOIN operation for columns CITY, SUPPLIERS ID#s, STATUS and PART ID#s is shown at 628 of FIG. 19.
  • the binary representation for the new JOIN relation is shown at 604 and 606. More particularly, the SUPPLIERS relation portion of the JOIN relation 628 is shown at 604 and the PARTS relation portion of the JOIN relation is shown at 606.
  • the purpose of this discussion is to construct the binary representation of the JOIN relation for the query above.
  • Fig. 23 is a Results Table depicting the JOIN operation performed on the SUPPLIERS relation and the PARTS relation for the specific query above.
  • the second column is the entity select vector associated with the column of the first relation (or SUPPLIER relation) over which the JOIN operation is performed.
  • the third column is the entity select vector for the column in the second' relation (or PARTS relation) over which the JOIN operation is performed.
  • the fourth column is the resultant bit vector x generated by Boolean ANDing the entity select vectors for the two columns over which the JOIN operation is being performed.
  • the fifth column is the selected input row use vectors of the SUPPLIERS relation.
  • the sixth column is the selected input row use vectors for the PARTS relation.
  • the seventh column is the entity select vector 1, corresponding to the first column of the JOIN relation which also refers to the SUPPLIERS relation portion of the JOIN relation.
  • the eighth column is the entity select vector 2 corresponding to the second column of the JOIN relation which also refers to the PARTS relation portion of the JOIN relation.
  • the ninth column is the first JOIN row use set corresponding to entity select vector 1, and the tenth column is the second JOIN row use set corresponding to entity select vector 2.
  • the entity select vectors for the CITY columns of the SUPPLIERS relation e.g., the entity select vectors for the CITY columns of the SUPPLIERS relation
  • the second column of the Results Table of FIG. 23 depicts the entity select vector for the CITY column of the SUPPLIERS relation
  • the third column of the Results Table of FIG. 23 depicts the entity select vector associated with the CITY column of the PARTS relation.
  • the RPU 22 performs a Boolean AND operation on the entity select vectors for the SUPPLIERS and PARTS relations to obtain a resultant binary bit vector x (852, FIG. 23A) .
  • the resultant binary bit vector shown in the third column of Fig.
  • the RPU 22 determines which binary bits of each entity select vector correspond to the binary bits set to "1" in the resultant vector x.
  • the row use vectors corresponding to the values London (L) and Paris (P) are obtained (854, FIG. 23A) .
  • the row use vectors corresponding to the values London and Paris in • the CITY column for the PARTS relation are obtained (856, FIG. 23) .
  • the Boolean OR operation is performed.
  • the resultant vectors of the Boolean OR operations are entity select vectors for the output or JOIN row use sets of the JOIN relation being formed.
  • the entity select vector 1, for the Boolean OR of the bit vectors for London and Paris is shown at 858 of FIG. 23A.
  • the entity select vector 2 for the Boolean OR of the bit vectors for London and Paris is shown at 860 of FIG. 23A.
  • the routine BUILD ROW USE SETS is called for constructing the row use sets associated with the JOIN entity select vectors. Referring to FIG. 22B, the BUILD ROW USE SETS routine for constructing the row use sets (i.e., 604 and 606 of FIG. 19) corresponding to the columns of the resultant JOIN relation is shown.
  • the RPU 22 selects the first value represented to be present by the first occurrence of a bit set to "1" in the resultant bit vector x.
  • the first unique value of the resultant relation which corresponds to a bit set to "1" is
  • variable START-ROW is set equal to zero.
  • CONSTRUCT JOIN-ROW USE VECTORS routine is called. Processing continues at block 684 of the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) .
  • the EVALUATE ROW USE VECTORS routine (FIG. 22D) is called.
  • the RPU 22 obtains the first row use set associated with the S.CITY column over which the JOIN operation is performed. Particularly, the RPU 22 obtains the row use set corresponding to the S.CITY column of the Supplier relation.
  • the variable j corresponding to this particular column is set equal to 1.
  • Processing continues in block 705, during which the row use vector of the row use set for S.CITY column is obtained for the unique value London which corresponds to the first binary bit set to "1" in the resultant binary bit vector x.
  • the variable V ⁇ _ is set equal to the row use vector "10010" (854, FIG. 23A) .
  • the RPU 22 sets the variable C ⁇ _ equal to 2, the -number of "1" bits in the bit vector V ⁇ .
  • the RPU 22 sets the variable C 2 equal to 3, the number of "1" bits in the bit vector V 2 .
  • the RPU 22 determines that there are no more input columns over which the JOIN operation is to be performed. Thus, processing returns during block 711 of FIG. 22D to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 686.
  • PRODUCTS routine (FIG. 22E) is called.
  • FIG. 22E a series of numerical products which characterize the formation of the bit patterns in each of the row use 1 vectors of the JOIN row use set is constructed.
  • NUMS 22F is called to determine the number of times a particular bit pattern repeats itself in a row use vector of the JOIN relation.
  • the NUMS determination is 10 a. series of divisions for calculating the number of repetitions of a particular pattern of "1" bits in a row use vector of the JOIN relation. Specifically, NUMS is equal to the following series:
  • NUMS (1, 1) (2, 2) . 15 Processing returns during block 718 to the CONSTRUCT JOIN ROW USE VECTORS routine (FIG. 22C) at block 690.
  • 25 OFFSET is set equal to an initial value of zero.
  • the RPU 22 performs a count function on the first bit set to "1" in the binary bit vector Vn . Then, at
  • variable K is set equal to the ordinal position of the first binary bit set to "1" in V ] _; the ordinal position is 1. Processing continues at block
  • the output vector W corresponds to the bit
  • NUMS (1) indicates the number of repetitions of the bit string of the output vector W corresponding to the first occurrence of London in the bit vector V ⁇ _.
  • PRODS (2) indicates that there are three "1" bits in the bit pattern of the output vector W.
  • PRODS (1) indicates the total number of bits, six, in the bit pattern associated with the output bit vector W.
  • the position of the first binary bit set to "1" in the output bit vector W is zero.
  • the output vector W is "111000" (862, FIG. 23).
  • Processing continues in block 732, during which the RPU 22 determines that there is a second "1" bit in the bit vector V ⁇ .
  • the next bit of V ] _ is obtained.
  • the position of the selected bit of V ! is 4.
  • K is set to the 4. .
  • the variable OFFSET is set equal to the initial value of zero.
  • the bit vector V 2 is obtained. This is the row use vector corresponding to London in the P.CITY row use set.
  • the first bit set to "1" of the bit vector V 2 is selected.
  • the position of this selected bit of V 2 is determined to be 1.
  • the output vector W corresponding to the bit position 1 of the entity select vector 2 (Destination entity select vector corresponding to P.CITY column) is determined. The characteristics of the output vector W are determined by calculating NUMS (2), PRODS (3) and PRODS (2).
  • NUMS (2) is the number of repetitions of the generated bit string which is equal to 2.
  • the bit string consists of PRODS (2) bits which is equal to three bits.
  • the number of "1" bits in the bit pattern is given by PRODS (3) which is equal to one "1" bit.
  • PRODS (3) which is equal to one "1" bit.
  • This series commencing with the first bit set to "1” begins at position zero as indicated by the START ROW + OFFSET, where START ROW and OFFSET are both zero.
  • the bit pattern of output vector W is "100100" (866, FIG. 23).
  • Processing continues at block 732, during which the RPU 22 determines that there are still "1" bits in the bit vector V 2 to be evaluated. Specifically, bits at positions 4 and 6 of the bit vector V 2 need to be evaluated.
  • processing continues at block 736, during which the variable OFFSET is .determined to be equal to 1.
  • the output bit vector W is "010010" (868, FIG. 23) .
  • Processing continues at block 732, during which the RPU 22 determines that there is still one "1" bit in the bit vector V 2 to be evaluated.
  • processing continues at block 736, during which offset is incremented by the value PRODS (3) .
  • PRODS (3) is equal to 1 and thus the value of offset is equal to 2.
  • the next bit of the bit vector V is obtained and processing continues at block 728.
  • the position of the newly selected bit of the bit vector V 2 is 6 and the variable K is set to 6.
  • the output vector W corresponding to bit position 6 in the entity select vector 2 is determined.
  • PARTS relation 65 such that the SUPPLIER CITY follows the PARTS CITY in alphabetical order.
  • the command for this query is:
  • the command for the GREATER THAN JOIN operation requires that the data come from two relations, namely, the Suppliers relation (63, FIG. 2), and the Parts relation (65, FIG. 2) .
  • both relations are named in the FROM clause, and the express connection between the tables is the City column in the WHERE clause.
  • the result of the GREATER THAN JOIN is for displaying all the columns of the Suppliers relation with all the columns of the Parts relation.
  • the process for creating the binary representation of the GREATER THAN JOIN relation is the same as for the EQUIJOIN operation. Namely, the BUILD ROW USE set (FIG. 223) is called to physically construct the columns of the JOIN relation.
  • FIG. 24 is a flow diagram which depicts the steps for evaluating and preparing the data prior to performing the BUILD ROW USE SETS routine (FIG. 22B) , which constructs the binary representation of the JOIN relation.
  • a generic routine for performing the GREATER THAN JOIN operation is shown. Specifically, the JOIN operation is constructed for all the values of column A (i.e. all the values of the City column of the Suppliers relation) which are greater than all the values of column B (i.e. all the values of the City column for the Parts relation) . It should be noted that this routine could be modified for any other of the predicate JOINs such as LESS THAN, LESS THAN OR EQUAL TO, and GREATER THAN OR EQUAL TO.
  • a sort operation is performed on all the values of column A to order the values from the greatest value to the least greatest value.
  • a binary bit vector D is generated.
  • the binary bit vector D contains a number of binary bits equal to the number_ of values in the value set over which the GREATER THAN JOIN operation is performed.
  • the binary bit vector D has its binary bits, set to "1" at all the ordinal positions corresponding to the values of the value set, which are less than the value y of column- A.
  • the Boolean AND operation is performed on the binary bit vector D with the entity select vector corresponding to column B.
  • the resultant vector contains binary bits set to "1" at the ordinal positions corresponding to the values of column B, which are less than the value y of column A.
  • the resultant vector is checked to see if it is "0". If the resultant vector is "0", then processing returns to the calling routine during block 1012 because none of the values of B is less than the value of A. Therefore, the GREATER THAN JOIN operation cannot be satisfied, and thus, the user is notified. Assuming that the resultant vector is not "0", then processing continues at block 1014.
  • the ordinal positions of the binary bit set to "1" in the result vector calculated in block 1008 are determined.
  • the ordinal positions correspond to the ordinal positions of the entity select vector, which correspond to the values of column B, which, are less than the values of column A.
  • all of the row use vectors associated with the ordinal positions of column B, which are associated with the binary bits set to "1" in the result vector are obtained.
  • the row use vector for value y of column A is also obtained.
  • the row use vectors associated with column B are placed into a temporary area, along with the row use vector for column A.
  • the Boolean OR operation is performed between the row use vectors associated with column B to generate the entity select vector for the column B portion of the JOIN relation.
  • entity select vector for the column A portion of the JOIN relation is a row use vector associated with the value y of column A.
  • the entity select vector for the input column B depicts all of the values of B, which are less than the value y of column A.
  • the BUILD ROW USE SETS routine (FIG. 22B) will construct the first several rows of the JOIN relation, which will consist of the value y of column A and all of the values of column B, which are less value y.
  • the JOIN operation is performed over two relations and a resultant JOIN relation is generated.
  • the resultant JOIN relation (FIGS. 19, 20 and 21) was a binary representation.
  • the system can automatically perform the DISPLAY/RECONSTRUCT FOR A JOIN operation routine (FIG. 25A) to generate the particular columns of the JOIN relation for the user to ascertain.
  • the DISPLAY/RECONSTRUCT FOR JOIN operation FIG.
  • FIG. 25A a row use vector corresponding to the rows of the JOIN relation to be constructed is created. Then, during block 1104, an index vector associated with the JOIN relation is created.
  • the index vector is a binary bit vector in which each binary bit corresponds to a row use vector of the JOIN row use set. Each bit indicates whether the unique value associated with the row use vector exists in- the JOIN column, which is to be displayed.
  • the unique value associated with the row use vector does not exist in the JOIN column to be displayed.
  • the index vector contains a quantity of binary bits equal to the number of row use vectors of the JOIN row use set.
  • the . row use set for the JOIN column to be displayed is obtained.
  • the first row use vector associated with the JOIN column row use set is obtained. More particularly, the first row use vector, which is currently stored in memory 18 (FIG.
  • the RPU 22 via BBVP 14 performs a Boolean AND operation on a row use vector with the row select vector.
  • the row select vector is a binary bit vector, and each binary bit corresponds to a row of the column or columns to be displayed by the system.
  • a binary bit set to "1" in the row select vector indicates that the corresponding row needs to be displayed by the system.
  • the result of the AND operation is a binary bit vector Z, which depicts the rows of the column which contain a particular value associated with the current row use vector.
  • the results of the AND operation are sent via bus 48 back to memory 18 (FIG.
  • the RPU 22 determines whether the resultant vector is "0". More particularly, the resultant vector Z contains all binary bits set to "0". If the binary bit vector is "0", then during block 1112, a "0" is placed in the first binary bit position of the index vector to indicate that the unique value associated with the row use vector does not exist in the JOIN column to be displayed. However, if the resultant bit vector Z is not equal to "0", then processing continues to block 1114. The RPU 22 via BBVP 14 sets the binary bit of the index vector, which is associated with the current row use vector, to "1". In block 1116, the resultant bit vector Z is stored in memory 18. The resultant bit vector Z is identified for this particular row use vector presently being processed, and it will later be used in the reconstruction process.
  • the RPU 22 via BBVP 14 clears the binary bits in the row select vector that match the binary bits set to "1" in the resultant vector Z.
  • the purpose of this step is. to "shortcut" the processing of the row use vectors in the row use set. Stated differently, when the JOIN row select vector is cleared, all of the values in the rows to be displayed for the column can be determined.
  • the RPU 22 (FIG. 1A) determines whether the JOIN row select vector has had all its binary bits set to "0". If the JOIN row select vector contains only binary bits set to "0", then processing continues at block 1126.
  • processing continues at block 1122.
  • the RPU 22 gets the next JOIN row use vector in the JOIN row use set currently being processed.
  • the RPU 22 determines whether the end of the JOIN row use vector has been reached. If the end of the JOIN row use vector has been reached, then processing continues at block 1126. However, assuming that the end of the row use vector has not been reached, then processing continues at blocks 1108, 1110, 1114, 1116, 1118, 1120, 1122 and 1124 until all of the JOIN row use vectors of the JOIN row use set have been completely processed.
  • processing continues at block 1126.
  • the RPU 22 determines whether the present column for display references a value set or a relation.
  • the DISPLAY/RECONSTRUCT routine for the JOIN operation starts with the row use set of the JOIN column and then references the relation corresponding to the row use set in the JOIN relation.
  • the column to be displayed is a JOIN column, and it does not reference a value set. Therefore, processing continues at block 1112, during which the REFERENCE RELATION routine (FIG. 25B) is called.
  • the RPU 22 obtains the entity select vector associated with the JOIN column.
  • the entity select vector for the JOIN column is used as a row select vector for obtaining values from the referenced relation. Processing returns to the DISPLAY/RECONSTRUCT FOR JOIN operation routine (FIG. 25A) at block 1104.
  • an index vector for the referenced relation is created. More particularly, the index vector contains a number of binary bits is equal to the number of row use vectors of the reference relation.
  • the RPU 22 (FIG. 1A) obtains a row use set for a column of the relation to be displayed. The first row use vector associated with the row use set is obtained.
  • the RPU 22 instructs the BBVP 14 to perform a Boolean AND operation on the row use vector obtained in block 1106 with the row select vector obtained during the REFERENCE RELATION routine (FIG. 6B) .
  • the result of the AND operation is a vector Z which displays the rows of the column which contain a particular value associated with the row use vector.
  • the RPU 22 determines whether the resultant vector Z is "0". If the resultant vector Z is "0", then during block 1112, a "0" is set in the first binary bit position of the index vector. Processing continues at block 1122, during which the next row use vector associated with the column in the reference relation is obtained. Returning to block 1110, if the result of the Boolean operation performed in block 1108 is "0", then processing continues at block 1114. During block 1114, the RPU 22 via BBVP 14 (FIG. 1A) sets the binary bit in the index vector, which is associated with the current row use vector to "1". During block 1116, the resultant bit vector Z is stored in memory 18 (FIG. 1A) .
  • the RPU 22 instructs the BBVP 14
  • the RPU 22 instructs the BBVP 14 to determine whether the row select vector has been completely cleared. Stated differently, whether all the binary bits have been set to "0". The row select vector contains only binary bits set to "0", and processing continues at decision block 1126. Otherwise, assuming that all of the binary bits in the row use vector are not set to "0", then processing continues at block- 1122. During block 1122, the RPU 22 (FIG.
  • processing continues at block 1126.
  • the RPU 22 determines whether the column currently being evaluated references a relation or a value set. The currently column presently being evaluated references a value set, and thus, processing continues at block 1130.
  • the REFERENCE VALUE SET routine (FIG. 25C) is called.
  • the REFERENCE VALUE SET routine obtains the values from the value set and places the values in the proper rows of the JOIN relation column. More particularly, referring to block 1142, the RPU 22 instructs BBVP 14 (FIG. 1A) to determine the ordinal positions of the binary bits set to "1" in the index vector for the referenced column. Each binary bit set to "1" indicates which row use vectors reference unique values to be displayed in the referenced column. In addition, for each row use vector which has a corresponding binary bit set to "1" in the index vector, the ordinal position of the corresponding binary bit set to "1" in the entity select vector associated with the value set is determined.
  • This process is performed for each of the row use vectors which has a corresponding binary bit set to "l" in the index vector for the referenced column.
  • the values associated with the ordinal positions obtained in the block 1142 are retrieved from the referenced value set.
  • the RPU 22 finds the appropriate location in the index vector associated with each of the values retrieved in block 1144.
  • the appropriate resultant Z vector associated with each of the row use vectors is retrieved.
  • the resultant Z vector indicates which rows of the referenced column contain the unique value associated with the row use vector.
  • the RPU 22 (FIG. 1A) determines whether any more values are left for processing.
  • processing continues at blocks 1146 and 1148 until all the values from the value set have been placed in the proper rows of the referenced column. Assuming that all of the values have been placed in the proper rows of the column, then processing continues at block 1152.
  • the corresponding JOIN row use set is determined. More particularly, the corresponding binary bit set to "1" in the index vector for the JOIN column is determined.
  • the appropriate vector Z temporarily stored at an earlier step is obtained. The resultant vector Z indicates the row positions of the JOIN column where the value from the referenced column should be placed.
  • the RPU 22 determines whether any more values from the referenced column need to be processed. If more values need to be processed, then processing continues at blocks 1152, 1153 and 1155 until all the values of the referenced column have been processed. Assuming that all the values have been processed, then processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) . During block.1132, processing returns to the calling program. 1. Example of the DISPLAY/RECONSTRUCT Operation For A JOIN Relation
  • FIGS. 20, 25A, 25B, 25C, 26A, 26B, 26C, 26D and 26E a detailed example for performing the DISPLAY/RECONSTRUCT program on the JOIN column for Suppliers IDs in the JOIN relation (604, FIG. 29) is now discussed. More particularly, it is assumed that only a binary representation of the JOIN column for Suppliers IDs exists in the RDMS 10. The binary representation of the JOIN relation is a result of the JOIN operation, or it may have been previously stored after processing of the BBVP 14 (FIG. 1A) . In either case, the binary representation of the JOIN column exists in memory 18 and now the user or applications program needs to display the actual values of the JOIN column.
  • this example is for reconstructing and displaying only one column of the JOIN relation (604, FIG. 20), the remaining columns of the JOIN relation could be reconstructed by sequentially performing the PROJECT operation.
  • the PROJECT routine (FIG. 17A) calls the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 20A, 20B and 20C) .
  • FIGS. 21A, 21B, 21C, 21D and 21E a results table for depicting the results of the DISPLAY/RECONSTRUCT FOR JOIN routine (FIGS. 25A, 25B and 25C) for reconstructing and supplying the Suppliers ID column of the JOIN relation (604, 626 of FIG. 20) is shown.
  • Each row of the results table depicts a result of the routines shown in FIGS. 25A, 25B and 25C.
  • the results table is separated into 12 columns from left to right.
  • the first column of the results table shows the row select vector associated with the JOIN column to be displayed.
  • the second column shows the row use set associated with the JOIN column to be displayed.
  • the third column is the row use vector of the row use set which is currently being processed, and the fourth column is the index vector associated with the row use set corresponding to the JOIN column.
  • the fifth column is a resultant binary bit vector Z.
  • the resultant binary bit vector Z is determined by ending a JOIN relation row use vector and the row select vector.
  • the sixth column depicts the entity select vector associated with the referenced relation.
  • the seventh column is an index vector associated with the row use set corresponding to the referenced relational column.
  • the eighth column depicts the row use set corresponding to the referenced relational column.
  • the ninth column is for the index vector associated with the row use set for the referenced relational column.
  • the tenth column is for the resultant bit vector Z.
  • the resultant bit vector Z is determined by ending a row use vector of the row use set with the entity select vector.
  • the eleventh column is a depiction of the creation of the referenced relational column.
  • the last column depicts the JOIN relational column, which is to be displayed.
  • the referenced relation obtains the entity select vector (600, FIG. 20) associated with the JOIN column.
  • the entity select vector is used as a row select vector for specifying the rows of the referenced relation which have values in the JOIN column (1184, FIG. 26C) .
  • processing returns to the DISPLAY/RECONSTRUCT FOR JOIN routine (FIG. 25A) at block 1104.
  • Block 1104 creates an index vector for the row use sets corresponding to the Supplier ID column (1181, FIG. 20) .
  • the row use set associated with the Supplier ID column is retrieved (1186, FIG. 26C) .
  • the first binary bit vector of the row use set (1186, FIG. 26C) is ANDed with the entity select vector to form a resultant vector Z (1192, FIG. 26C) .
  • the resultant vector is evaluated to determine if all the binary bits are set to "0".
  • One bit in the resultant vector Z is set to "1", and thus, processing continues at block 1114.
  • the binary bit in the index vector associated with the current row use vector is set to "1" (1194, FIG.
  • the resultant vector Z associated with the current row use vector is stored in memory 18 (FIG. 1A) .
  • the binary bit set to "1" in the row select vector that matched the binary bit set to "1" in the resultant vector Z are cleared (1196, FIG. 26D) .
  • the next row use vector of the row set for the Supplier ID column is obtained (1198, FIG. 26D) .
  • the RPU 22 (FIG. 1A) determines if the end of the row use set has been reached. The end of the row use set has not been reached, and thus, processing continues at block 1108.
  • sets S 1234 and T 1236 define the set of all ordered pairs (S, T) such that s is an element of S and t is an element of T.
  • a "mapping" exists from S 1234. to T 1236. More than one element of set S 1234 may map onto unique elements of T 1236; however, there will never exist more than one element of T into which maps the elements of S.
  • the set S 1234 is the "domain"
  • the set T 1236 is the "range”.
  • FIG. 27 shows the many-to-one mapping. Specifically, si and s2 map to the value tl. The purpose of the entity use vector is to facilitate this relationship between ordinal positions of elements in one set to another set.
  • the entity use vector is a vector whose elements are values expressed in ordinal positions of elements within another set.
  • the entity use vector constitutes a function (or a mapping) between either a value in a value set or a relational row in another or the same relation.
  • the mapping may apply specifically to a relational column, whether it be a single physical column or a physical column consisting of many conceptual columns.
  • the implied ordinal position within the entity use vector corresponds to a row number in the "domain”.
  • the associated value of the element is the entity location (ordinal position of the value in the value set or row position of the relational row being referenced) of the "range”.
  • the value 5 resides. Essentially, the value 5 maps the value in the first row of the city column 174 to the fifth row of the domain 166. Referring to the dotted line 1249, the mapping of the first row of the city column 174 to the fifth row of the value set is shown. Likewise, the fourth row of the city column 174 is also mapped -to the fifth row position of the value set 166. Thus, the first row and the fourth row of the city column 174, both map to the same row of the value set 166. In this way, the many-to-one mapping can be achieved via the entity use vector.
  • the second row and the third row of the city column 174 map to the same row of the value set 166 may be implied mapping at 1247.
  • the fifth row of the city column 174 maps back to the first row of the value set 166.
  • ___ column 174 corresponds to the value 5 in the entity use vector at 1237 which corresponds to the value London.
  • the second row of the city column 174 corresponds to the entity use vector value 8 which corresponds to the value Paris.
  • the third row of the city column 174 corresponds to the value 8 of the entity use vector and to the value Paris.
  • the fourth row corresponds to the value 5 in the entity use vector and to the value London.
  • the fifth row of the city column 174 corresponds to value 1 in the entity use vector which corresponds to the value Athens.
  • the entity use vector will be used specifically during the DISPLAY/RECONSTRUCT operations for the RDMS 10.
  • the entity use vector facilitates the steps for determining the values in the particular columns of a relation and hence facilitates the DISPLAY/RECONSTRUCT process.
  • FIG. 29 a depiction of how entity use vectors may be used in the setting of a JOIN relation is shown. More particularly, the entity use vectors 1238, 1240, 1242, 1244, and 1246 have been added to the binary representation of the JOIN relation of FIG. 20.
  • the entity use vectors 1238, 1240, 1242 and 1244 are identical to the entity use vectors of the Supplier relation in FIG. 28.
  • the new entity use vector of FIG. 29 is at 1246.
  • This entity use vector maps the rows of the JOIN columns to the rows of the input relations. More particularly, the values of the entity use vector correspond to the row numbers of the referenced relation.
  • the first three rows of the entity use vector, 1250, 1251, 1252, respectively, correspond to row 1 of the Supplier relation.
  • the first three rows of the JOIN relation will contain values from the first row of the city column.
  • the city column has associated with it an entity use vector 1244 which has a value 5 in the first row corresponding to the first row of the city column.
  • the value 5 maps back to the fifth row of the value set 166.
  • the fifth row of the value set 166 contains the value London. Therefore, the first three rows of the entity use vector, 1250, 1251, and 1252, correspond to the value London as shown by the physical representation of the S.CITY column of the JOIN relation.
  • the entity use vectors associated with the JOIN relation provide a mapping between the columns of the Supplier relation and the columns of the JOIN relation. As stated earlier, the columns are mapped via the entity use vectors to the value sets of the relational database.
  • the entity use vector will be used during the DISPLAY/RECONSTRUCT process of the JOIN relation.
  • the entity use vector will substantially improve the processing time for generating the actual representation of the JOIN relation by not having to perform the multitude of steps associated with the DISPLAY/RECONSTRUCT operations as discussed earlier. Again, a more detailed discussion on the DISPLAY/RECONSTRUCT will be presented shortly.
  • the entity use vectors will be stored in byte or multi-byte form in memory 18.
  • the RPU 22 causes the appropriate entity use vectors to move from the memory 18 via bus 46 to MVP 15.
  • the RPU 22 evaluates the entity use vectors via the DISPLAY/RECONSTRUCT routines (FIGS. 30 and 31).
  • the relation has been reconstructed and it is ready for display to the external device 12 • for storage or to display 3 (FIG. 1) for the user to ascertain.
  • the entity use vectors are constructed during the load operation performed by the BINARY REPRESENTATION routine at block 356.
  • the system loads the file representation of the relations into external device 12, where they reside until summoned by the RDMS system 10.
  • a particular column is retrieved by referring to its system identifier as discussed earlier and in more detail in part VII.
  • the first value of the particular column is brought to the RPU 22; as discussed earlier, a row use vector associated with the value is built by the BBVP 14. Additionally, a bit is set to "1" in the first position of the row use vector to indicate that the value occupies the first row of the first column of the relation.
  • the first value is brought to the RPU 22, it is also evaluated by the MVP 15.
  • an entity use vector associated with the column is built.
  • the first position of the entity use vector is assigned- a numerical value to indicate the row of the value set corresponding to the input value.
  • the MVP 15 As each value is input from the column, the MVP 15 generates a corresponding value associated with the row position in the value set which contains the particular value.
  • a new entity use vector is created. This process continues until all of the columns input from the external device have associated entity use vectors. As the entity use vectors are constructed, they are sent via bus 46 to memory 18, where they reside until they are called for processing via the DISPLAY/RECONSTRUCT routine at FIGS. 25A, 25B and 25C.
  • FIG. 30A a flow diagram for the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS is shown.
  • the purpose of this routine is to efficiently reconstruct and display the values of a specified column or columns. More particularly, referring to block 1270, the RPU 22 (FIG. 1A) causes the entity use vector associated with the particular column to be displayed to move from memory 18 via bus 46 to MVP 15. Then during block 1272, the RPU 22 determines whether the entity use vector associated with the column references a value set or another relation. Assuming that the entity use vector references another relation, then processing continues at block 1273, during which the REFERENCE RELATION (FIG. 3OB) is called.
  • REFERENCE RELATION FIG. 3OB
  • the REFERENCE VALUE SET routine (FIG. 3OB) is called.
  • the REFERENCE RELATION routine (FIG. 30C) is called when the column to be displayed is, for example, a column of the JOIN relation.
  • the REFERENCE VALUE SET routine (FIG. 30C) is called when the column to be displayed is a column of a base relation (one which references one or more value sets directly) . In either case, when the column has been reconstructed and is ready for display, processing returns to the calling program at block 1276.
  • FIG. 3OB a flow diagram for the REFERENCE RELATION routine is shown.
  • the entity select vector associated with the column is obtained.
  • This entity select vector is used as a row select vector for obtaining the values from a REFERENCE RELATION.
  • the entity select vector 600 (FIG. 29) which is associated with the JOIN column depicting the row use set of the JOIN relation 604 (FIG. 29) could be such a REFERENCE RELATION.
  • Processing continues at block 1282, during which processing returns to the DISPLAY/RECONSTRUCT- WITH ENTITY USE VECTORS routine (FIG. 29A) .
  • a flow diagram for the REFERENCE VALUE SET routine is now discussed. More particularly, during block 1286, the values from the value set ' are obtained corresponding to the ordinal positions expressed by the entity use vector elements. Then, during block 1288, a value is placed in the appropriate row of the column. The ordinal position of the value in the column corresponds to the ordinal position of the corresponding element in the entity use vector. During block 1290, the next value of the value set is obtained and in block 1292, the RPU 22 (FIG. 1A) determines whether all the values have been processed.
  • processing continues at block 1288, 1290, 1292, until all of the values are placed in the appropriate row positions of the column. Once all the values have been placed in the column, processing continues at block 1294. During block 1294, it is determined whether there is another column to go through. More particularly, whether there is a JOIN column which needs to be reconstructed, for example, the CITY column 638 (FIG. 29) of the JOIN relation. Assuming that a JOIN relation needs to be reconstructed, then processing continues at blocks 1288, 1290, 1292, 1294 until all the values associated with the JOIN column have been placed into the proper rows of the column.
  • processing continues at block 1296-, during which processing returns to the DISPLAY/RECONSTRUCT - WITH ENTITY USE VECTORS routine (FIG. 30A) at block. 1276. During block 1276, processing returns to the calling program and the display of the column has been completed.
  • FIG. 31 a representation of a relational database is shown.
  • the top portion of FIG. 31 depicts seven domains of unique values referenced by two relations (Supplier and Parts) .
  • the third relation, labeled JOIN relation references both the Supplier and Parts relations as shown. All three relations depicted in FIG. 31 were thoroughly discussed in the previous sections of the specification. Specifically, the Supplier relation is detailed in FIG. 4, the Parts relation is detailed in FIG. 5, and the JOIN relation is detailed in FIG. 19.
  • the three relations are assumed to be in their binary representations and stored in memory 18 (FIG. 1A) .
  • the RDMS 10 maintains track of all of the columns of the relational database (FIG. 31) via an identification scheme, called the System relation.
  • the System relation logically connects all of the necessary information for describing a column of any of the relations.
  • the System relation for the Supplier/Parts relational database (FIG. 31) is shown.
  • the System relation is broken up into four columns. From left to right, the column identifier (CID) , the relation identifier (RID) , the attribute identifier (AID) , and the domain identifier (DID) .
  • the CID is a number which distinguishes a column from all of the other columns of the relational database.
  • the RID identifies the relation in which a column logically resides. For example, column 80 (FIG. 2) resides in relation 63 (FIG. 2) .
  • the AID defines the order or position in which the column resides in a particular relation. For example, column 80 (FIG.
  • each element of the Supplier/Parts relational database is identified by a CID number.
  • the Supplier/Identifier domain is characterized by a CID number "1" and an RID number "1". Domains do not have AID or DID numbers because the relation only has a single column, and it is itself a domain. Normally, a single column in the relational database would have associated with it an AID number of 1; however, in the preferred embodiment, the domain is always considered to have an AID of 0.
  • the Parts Identifier domain is characterized by a CID number "2" and an RID number "2". The domains are more fully depicted in FIG. 33 with their associated CID and RID numbers.
  • the CID and RID numbers for the domains are shown in the first seven rows of the system relation (FIG. 32) .
  • the CID and the RID number any of the domains for the Supplier/Parts relation (FIG. 31) can be referenced and obtained.
  • the Supplier relation is identified by a special row of the System relation, which generates a virtual column (as- seen by the user) for identifying the Supplier relation.
  • Each relation of the relational databases has a virtual column. The user never sees this column. This column
  • the virtual column contains row numbers associated with the rows of the relation. This column by convention does not have associated with it an AID or a DID number. Thus, in the eighth row of the system relation (FIG. 32) , the AID and DID numbers are set to 0.
  • the first column (which the user sees) of the Supplier relation is the Supplier ID column and it has associated with it a CID number 9.
  • the Supplier ID column is part of the Supplier relation, and thus, its RID number is 8.
  • the Person Name column of the Supplier relation has a CID number 10 and it, too, is part of the Supplier relation, and thus, has an RID number of 8.
  • the Status column of the Supplier relation has a CID number 11 and it is part of the Supplier relation, and thus, has an RID number 8.
  • the last column of the Supplier relation is the City column and it has a CID number of 12 and an RID number 8.
  • the Supplier ID column has an AID number of 1, which identifies the Supplier ID column as the first column of the Supplier relation.
  • the Supplier ID column is associated ' with the Supplier Identifier domain, and thus, the DID number associated with the Supplier column is 1.
  • the CID number for a Supplier Identifier domain is equal to 1, and this corresponds to the DID number for the Supplier ID column.
  • the Person Name column of the Supplier relation has an AID number of 2 and a DID number of 3. More particularly, the AID number 2 means that the Supplier Name column is the second column of the Supplier Relation and the DID number 3 means that the Person Names column is associated with the Person Names domain.
  • the Status column of the Supplier Relation has an AID number 3 and a DID number 7.
  • the AID number 3 means that the Status column is the third column of the Supplier relation and the DID number 7 means that the Numbers domain is associated with the Status column.
  • the City column has AID number 4 and DID number 5 associated with it.
  • the AID number 4 means that City column is the fourth column of the system relation and the DID number 5 means that the City domain is associated with the City column of the Supplier relation.
  • the Parts relation also has a virtual column which is identified by the CID number 13. Additionally, the JOIN relation is identified by a virtual column having a CID number 19. The columns of the Parts relation are identified by the CID, RID, AID and DID numbers in the same way the columns are identified for the Supplier relation.
  • the CID number 20 corresponds to a JOIN column which references the Supplier relation having RID 8 (shown by the dotted lines) . More particularly, the DID number associated with the JOIN column (CID 20) is 8, which identifies the Supplier relation (RID 8) as the domain to the JOIN column.
  • the AID number associated with the JOIN column having CID 20 is 1, which means that this ' column is the first column of the JOIN relation.
  • the second column of the JOIN relation having CID number 21 references the Parts relation. More particularly, the DID number associated with the JOIN column having CID 21 is 14, which is the RID number for the Parts relation.
  • FIGS. 34, 35, 36, 37A, 37B, 37C and 37D four constructs for developing each column referenced by the CID numbers in the system relation (FIG. 32) are shown. Specifically, a set containing domains and entity select vectors (FIG. 34) , an entity use set (FIG. 35) , a row select set (FIG. 36) , and a row use vector set (FIGS. 37A, 37B, 37C and ' 37D) are all used to build a particular column of the relational database. Essentially, each construct is a "set of sets".
  • the set contains 21 elements, and each number below each element corresponds to a different CID number or row of the System relation (FIG. 32) .
  • the first element, 1, of the set corresponds to the first row of the System relation (FIG. 32) in which the CID number is 1, RID number is 1, AID number is 0, DID number is 0.
  • the first element of this set characterizes all of the elements of the value set for Supplier IDs in a sorted fashion.
  • FIG. 34 shows the domains of the relational database depicted in FIG. 34 sorted by a lexical ordering scheme. However, any ordering scheme may be used, such as, order by entry, or by size (number of letter or number or both) etc.
  • FIG. 35 shows the set of entity use vectors for mapping each value in each column of each relation
  • FIG. 31 back to the ordinal position of the associated unique value in the domain.
  • Elements 1 through 7 (FIG. 34) of the entity select set are lexically ordered which do not require entity use vectors for mapping because the domains are ordered to begin with. If the values in the domains (elements 1 through 7, FIG. 25) are in order, entity use vectors can be used for mapping the values in the column in the relations (elements 9-12, 14-18, 20 and 21) .
  • the column having CID 9, RID 8, AID 1 and DID 1 is the Supplier ID column of the Supplier relation.
  • This column is characterized by the entity select set (FIG. 34) , entity use set ' (FIG. 35) , row select set (FIG. 36) and row use set (FIGS. 37A, 37B, 37C and 37D) .
  • the entity select vector associated with the Supplier ID column is referenced by the vector element 9.
  • the vector element 9 corresponds to the entity select vector "1010111000", which characterizes a subset of the Supplier identifier domain comprising those values which appear one or more times in the Supplier ID column.
  • the entity use vector associated with the Supplier ID column is referenced by the ninth element of the set.
  • the entity use vector for the ninth element characterizes a mapping between the values of the Supplier Identifier domain and the Supplier ID column.
  • the values of the column are shown in the order they occur in the domain and thus, the ordering shown in the entity use vector represents a direct ordinal numbering scheme.
  • the entity use vector represents a mapping between the row positions of the column and the values of the Supplier Identifier domain.
  • the row select vector associated with the Supplier ID column is referenced by the ninth element of the vector.
  • the ninth element of the vector corresponds to the row select vector "11111" which means that all of the rows of the column are valid.
  • FIGS. 37A and 37C the row use set associated with the Supplier ID column is referenced by
  • the row use set corresponding to the ninth element also represents a mapping from a unique value from the Supplier's domain to a set of rows of the Supplier ID column.
  • the set of rows for the Supplier ID column is characterized by the row use vector.
  • the row use set contains a series of row use vectors for providing a mapping between the sorted value set and the unsorted value set.
  • the function performed by the row use vector associated with the first element of the vector (FIG. 37A) is the same as the entity use vector associated with the first element of the vector in the entity use set (FIG. 35) .
  • the entity use vector is more efficient than the row use vector for mapping an unsorted value set to a sorted value set in the row use set.
  • the row use set is more efficient than the entity use vector for mapping the sorted value set to the unsorted value set.
  • the relational database system could perform with either one, the row use set or the entity use set; however, for efficiency, both are maintained by the preferred embodiment of the invention.
  • the ninth elements of the row use set and the entity use set are shown in (FIG. 37C) and (FIG. 35) , respectively.
  • the column at rows 8, 13 and 19 of the System relation is also characterized by the constructs depicted in FIGS. 34, 35, 36, 37A, 37B, 37C and 37D. More particularly, referring to FIG. 34, the eighth element of the set corresponds to the entity select vector for the column having CID number 8. The entity select vector referenced by the eighth element of the vector has five binary bits set to "1", which indicates that the relation contains five valid rows. Referring now to FIG. 35, the entity use vector associated with the column having CID 8 is referenced by the eighth element of the set. The entity use vector associated with the eighth element is empty because a mapping would not add any useful information for characterizing the column. Referring to FIG.
  • the eighth element of the row select set is depicted for representing the valid rows of the column. Note that there is the binary bit vector having five binary bits set to "1" corresponding to the five binary bits set to "1" in the entity select vector (FIG. 34) .
  • the row use vector associated with the column is referenced by the eighth element of the set as shown in FIG. 37B.
  • the row use vector is empty because a mapping would not add any further useful information for characterizing the column.
  • the row use set is more efficient for mapping a unique value from a value set to one or more rows of a column than the entity use vector and vice versa.
  • the system maintains both constructs to create a maximum efficiency for mapping rows of a column to and from a value set.
  • the RPU 22 creates an empty row in the system relation (FIG. 32) for the domain identifiers during block 1288. Then, during block 1290, .the RPU 22 (FIG. 1A) sets the CID number equal to the row number of the system relation. For example, the first domain identified by the RPU 22 (FIG. 1A) would have a CID number of 1 because it occupies the first row of the system relation (FIG. 32). During block 1292, the RPU 22 sets the RID number equal to the CID number, and during block 1294, the AID number and DID number are set equal to 0.
  • the RPU 22 In block 1296, the RPU 22 generates an element associated with the entity select set (FIG. 34) , the entity use set (FIG. 35) , the row select set (FIG. 36) , and the row use set (FIG. 37A, 37B, 37C and 37D) . During a later step in this routine, the necessary information for characterizing each of the columns is loaded during block 1332 (FIG. 38D) .
  • the RPU 22 determines whether any more domains need to be identified by the system.
  • seven different "create domain" commands have been specified for identifying each of the seven domains in the relational database.
  • processing will return to blocks 1286, 1288, 1290, 1292, 1294, 1296 and 1298, during which the RPU 22 will identify each of the domains of the relational database. Assuming that all of the domains have been identified with their respective CID, RID, AID, DID and element locations in the vectors, processing will continue at block 1300.
  • the application identifies each table of the relational database. Specifically, the application provides the system with the following commands: CREATE TABLE - SUPPLIER
  • Part ID Part Name; Color; Weight; City
  • the RDMS enters another row into the system relation (FIG. 32) for identifying the relation with a column.
  • the CID number is set equal to the row number of the system relation, and during block 1306, the AID number and a DID number are set equal to 0.
  • a new element is entered to each of the sets corresponding to the entity select set (FIG. 34) , the entity use set (FIG. 35) , the row select set (FIG. 36) , and the row use set vector (FIG. 37) .
  • the RPU 22 sets the variable CURRENT AID NUMBER equal to 1.
  • the RPU 22 (FIG. 1A) generates a new row in the system relation (FIG. 32) .
  • the CID number is set equal to the row of the system relation
  • the RID number is set equal to the RID number of the column for identifying the relation corresponding to the current column.
  • the AID number is set equal to the "current AID number” and during block 1322, the "current AID number” is incremented by 1.
  • the DID number for the current column is set equal to the CID number of the remaining reference by the current column. Processing continues at block 1326 (FIG. 39D) where new elements are added to the entity select set (FIG.
  • the RPU 22 determines whether any more columns need to be created for the particular relation specified during block 1300. If more columns need to be identified by the RPU 22 (FIG. 1A) , then processing continues at blocks 1310, 1312, 1314, 1316, 1318, 1320, 1322, 1324, 1326 and 1328 until all the columns of the specified relation have been identified. Assuming that all the columns of the specified relation have been identified, then processing continues at block 1330.
  • the RPU 22 determines whether any more relations need to be identified. Assuming that more relations need to be identified, then processing returns to block 1300, where the next relation to be created by the next CREATE TABLE instruction, listed above, is specified. Blocks 1300 through 1330 are performed until all of the relations in the relational database have been identified and the columns associated with each of the tables also have been identified. Assuming that all the tables and their related columns have been identified, then processing continues at block 1332. During block 1332, the RPU 22 instructs the file associated with each table of the relational database to be transferred from external device 12 to the RDMS 10 via 30 to the RPU 22.
  • the RPU 22 instructs the BBVP 14 to generate the entity select vector (FIG. 34) , the row select vector (FIG. 36) and the row use set (FIG. 37A) associated with each column. Simultaneously, the RPU 22 instructs the MVP 15 to generate the entity use vector (FIG. 35) associated with each of the columns.
  • the entity select vector, entity use vector, row select vector, and row use set are retrieved by the RPU 22 via the system identifiers (i.e. CID, DID, AID, RID identifiers). Then, during block 1334, the RPU 22 determines whether any more files associated with the relations need to be loaded into the RDMS 10.
  • processing returns to block 1332; otherwise, processing continues at block 1336. Assuming that all the files have been loaded into the RDMS 10, then processing returns to the calling program at block 1336. At this time, all of the columns and the relations of the relational database have been identified and all of the characterizing vectors associated with each of the columns are also organized.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Appareil et/ou procédé utilisant un ordinateur, destiné à créer une base de données relationnelles (63, 65 et 67), la base de données relationnelles contenant une pluralité de relations et chacune des relations contenant une ou plusieurs colonnes et lignes. Une colonne a une ou plusieurs valeurs dont toutes ont une caractéristique commune. Chaque valeur de la colonne correspond à une des files de la relation. Chaque ligne contient une ou plusieurs valeurs et chaque valeur provient d'une colonne différente. Les valeurs de chaque ligne ont une ou plusieurs caractéristique(s). La création d'une base de données relationnelles se déroule en trois étapes. En premier lieu, on forme un ensemble contenant une pluralité de valeurs uniques, pour chaque caractéristique de la base de données relationnelles. En second lieu, on constitue un ou plusieurs sous-ensembles de chaque ensemble contenant les valeurs uniques de la relation, pour chaque relation de la base de données relationnelles. Chacun des sous-ensembles contient une ou plusieurs valeurs uniques d'un des ensembles. En troisième lieu, on constitue les relations de la base de données relationnelles. Plus particulièrement on forme, pour chaque sous-ensemble associé à une relation particulière, une des colonnes de la relation. Chaque colonne contient une ou plusieurs de chaque valeur unique dans le sous-ensemble, et chaque valeur unique de la colonne apparaît dans une ou plusieurs ligne(s) de la relation.
EP19880910209 1987-10-09 1988-10-07 A relational database representation with relational database operation capability Withdrawn EP0398884A4 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10744787A 1987-10-09 1987-10-09
US107447 1987-10-09
US23875488A 1988-08-29 1988-08-29
US238754 1988-08-29

Publications (2)

Publication Number Publication Date
EP0398884A1 true EP0398884A1 (fr) 1990-11-28
EP0398884A4 EP0398884A4 (en) 1992-08-12

Family

ID=26804791

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19880910209 Withdrawn EP0398884A4 (en) 1987-10-09 1988-10-07 A relational database representation with relational database operation capability

Country Status (4)

Country Link
EP (1) EP0398884A4 (fr)
AU (1) AU632267B2 (fr)
CA (1) CA1338601C (fr)
WO (1) WO1989004013A1 (fr)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE68926422T2 (de) * 1988-09-23 1996-11-07 Ibm Datenbankverwaltungssystem
IT1275529B (it) * 1995-07-14 1997-08-07 Alcatel Italia Emulatore per un database relazionale in linguaggio sql
EP0912922B1 (fr) * 1995-12-01 2003-04-09 Sand Technology Systems International, Inc. Procede et systeme d'execution d'operation booleenne sur une chaine binaire en traitant les bits par groupes de taille maximale
DE19715723A1 (de) * 1997-04-15 1998-11-12 Dci Datenbank Fuer Wirtschafts Array-Verfahren
EP1049030A1 (fr) 1999-04-28 2000-11-02 SER Systeme AG Produkte und Anwendungen der Datenverarbeitung Méthode et appareil de classification
AU7100600A (en) * 1999-09-03 2001-04-10 Whamtech, L.P. Index relational processor
EP1182577A1 (fr) 2000-08-18 2002-02-27 SER Systeme AG Produkte und Anwendungen der Datenverarbeitung Mémiore associative
US9177828B2 (en) 2011-02-10 2015-11-03 Micron Technology, Inc. External gettering method and device
DK1288792T3 (da) 2001-08-27 2012-04-02 Bdgb Entpr Software Sarl Fremgangsmåde til automatisk indeksering af dokumenter
EP1422636A1 (fr) * 2002-11-25 2004-05-26 Sun Microsystems, Inc. Méthode et système de génération d'un jeu de données structuré
US7467155B2 (en) * 2005-07-12 2008-12-16 Sand Technology Systems International, Inc. Method and apparatus for representation of unstructured data
US9152883B2 (en) 2009-11-02 2015-10-06 Harry Urbschat System and method for increasing the accuracy of optical character recognition (OCR)
US9158833B2 (en) 2009-11-02 2015-10-13 Harry Urbschat System and method for obtaining document information
US9213756B2 (en) 2009-11-02 2015-12-15 Harry Urbschat System and method of using dynamic variance networks
US8321357B2 (en) 2009-09-30 2012-11-27 Lapir Gennady Method and system for extraction
US9218379B2 (en) 2013-03-15 2015-12-22 Informatica Llc Method, apparatus, and computer-readable medium for efficiently performing operations on distinct data values

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0079465A2 (fr) * 1981-11-13 1983-05-25 International Business Machines Corporation Méthode pour la mise en mémoire et pour l'accès d'une base de donnée relationnelle

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4276597A (en) * 1974-01-17 1981-06-30 Volt Delta Resources, Inc. Method and apparatus for information storage and retrieval
US4068298A (en) * 1975-12-03 1978-01-10 Systems Development Corporation Information storage and retrieval system
US4318184A (en) * 1978-09-05 1982-03-02 Millett Ronald P Information storage and retrieval system and method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0079465A2 (fr) * 1981-11-13 1983-05-25 International Business Machines Corporation Méthode pour la mise en mémoire et pour l'accès d'une base de donnée relationnelle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IEEE FOURTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, WEST BERLIN, GERMANY , SEPTEMBER 13-15, 1978 September 1978, pages 499 - 516; RON ASHANY: 'APPLICATION OF SPARCE MATRIX TECHNIQUES TO SEARCH, RETRIEVAL, CLASSIFICATION AND RELATIONSHIP ANALYSIS IN LARGE DATA BASE SYSTEMS - SPARCOM' *
See also references of WO8904013A1 *

Also Published As

Publication number Publication date
EP0398884A4 (en) 1992-08-12
CA1338601C (fr) 1996-09-17
WO1989004013A1 (fr) 1989-05-05
AU632267B2 (en) 1992-12-24
AU2710088A (en) 1989-05-23

Similar Documents

Publication Publication Date Title
US6633883B2 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
US6711563B1 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
US7058621B1 (en) Method for extracting information from a database
EP0398884A1 (fr) Representation de base de donnees relationnelles capable d'operations sur une base de donnees relationnelles
US6944619B2 (en) System and method for organizing data
EP1217541A1 (fr) Procédé de traitement de demandes dans une base de données, base de données et produit logiciel pour mise en oeuvre de ce procédé
US7246124B2 (en) Methods of encoding and combining integer lists in a computer system, and computer software product for implementing such methods
JPS61223941A (ja) 化学構造の検索方法
JP3452531B2 (ja) データ・マイニングする方法およびシステム
CN110389953B (zh) 基于压缩图的数据存储方法、存储介质、存储装置和服务器
EP2003577A2 (fr) Procédés de codage et de combinaison de listes d'entiers dans un système informatique, système informatique et produit logiciel pour mettre en oeuvre de tels procédés
AU2002232035A1 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
Vaidya et al. Design and architectural implications of a spatial information system
CN114138735A (zh) 一种Janusgraph数据快速批量装载的方法
Fry et al. Towards a formulation and definition of data reorganization
Xingjian A database design method for finite element analysis
WO2018117901A1 (fr) Système de traitement d'informations
Spiegler TOWARD A UNIFIED VIEW OF DATA Bridging Data Structure and Content
AU2008202360A1 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
Lee Database design for decision support systems
CN113434540A (zh) 支持左外连接的应用驱动数据生成中依赖关系生成方法及系统
Weldon et al. An attribute-based file organization for a relational database
Samanek Partial-match retrieval using multi-level superimposed codes: Jaroslav (Joe) Samanek
Zaoui et al. Representation and Storage of Colours Images in Databases using a Rtree Generic Structure
Hawthorn Additionally, a category-based access method is introduced. This access method is a variation of B-trees, but its use results in smaller indices.

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19900403

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LI LU NL SE

A4 Supplementary search report drawn up and despatched

Effective date: 19920623

AK Designated contracting states

Kind code of ref document: A4

Designated state(s): AT BE CH DE FR GB IT LI LU NL SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19940503