DATABASE MANAGEMENT SYSTEM
Background of the Invention
The increase in available inter-connectivity amongst computers made possible
by the Internet has increased the demand for tools to allow larger numbers of users to
access the wealth of information available in corporate databases. While today's
network technology has provided flexible connectivity for tens of millions of users,
contemporary database management systems are unable to manage and deliver
responses to the increasing number of users in an efficient fashion.
Traditionally, a remote user would send a query to a database and the database
would respond with the answers to the query. The inherent problem with this type of
structure is that the database must process every query it receives and transmit the
answers to each individual query as it is received. Even when a query from the
same user covers substantially the same data, these systems have to transmit the
query to a database and wait for an answer.
A partial solution to this problem is to download the database record or a portion
thereof to a local storage device and then query the locally stored data. While this
removes the problem of re-querying a remote database, it increases the resources
necessary to download and store the database records. In many instances there is no
practical way to download the entire database or even a portion of it. If it is possible to
download a portion of the database it is necessary to rebuild the index of the database
in order to search the retrieved portion. If subsequent portions are retrieved it is again
necessary to rebuild the index. This would also necessitate the installation of a
complete database system on the user's computer.
Another problem with traditional database systems is that portions of the data
are often compressed in order to save resources on the database server. In order to
retrieve the data requested by a user, contemporary systems must uncompress the
data in order to answer a query or transmit a portion of the database to a requesting
client computer.
What is needed is an alternative method for providing the data necessary to
respond to a query. It would be useful if the data could be manipulated and transmitted in
a compressed format, remained usable in its compressed form, and was uncompressed only
when the actual values needed to be read on a user's computer. It would further the
usefulness of such a method if two or more subsets of data taken from a single data
source could be efficiently queried without having to rebuild the index for the
combination of the subsets.
Summary of the Invention
In one embodiment of the present invention an Internet enabled data source, a
QueryObject™ in the preferred embodiment, works with client applications in the
following manner. A client application, such as a spreadsheet, is used to format a
query using a well known query language such as SQL or MDX. The query is then
transmitted to a data redirector which converts the query into the mathematical
coefficients used by the QueryObject™ database and then transmits the coefficients via
a supported transport mechanism (HTTP, Port 80, VPN, etc.) to an Internet
QueryObject™ (IQO) Web Server. The IQO Web Server extracts a subset of the
data structure (a data fragment) being queried. This is distinct from traditional relational
and object oriented databases which merely output a record set satisfying the query.
This subset (the data fragment) is a fully functional unique data structure (as opposed
to a partition or segment of the data structure) limited only to the information requested
in the query. The fragment is an answer to the query across the entire data source.
The data fragment is then transmitted back (again utilizing any of a number of well
known transport mechanisms including FTP, VPN, and Port 80) to the requesting
client.
The data fragment is then stored on the client's computer, typically on an attached
hard disk or other permanent memory means. A data extract function is then used to
return the traditional row set answers to the calling application.
For any subsequent requests, the query will first be applied to the locally stored
data fragment to see if it contains the complete answer set to the query. If the complete
answer set is available in the locally stored data fragment then the answer is returned
directly to the calling application. If the locally stored data fragment contains only a
partial answer set or a null set, a request for the differential missing data (encompassing
a second data fragment) is created and sent to the IQO Server. The IQO Server then
creates a second data fragment containing the necessary missing data and transmits
the data fragment back to the requesting client where it in turn is combined with the
previously retrieved data fragment and the row set answers are extracted from the
resulting fragment and returned to the calling application. Subsequent queries will
again be first directed to the locally stored data fragment and then a query will be sent
to the primary data structure and the necessary data returned as new data fragments
which are in turn merged into the local data fragment. This cycle can repeat in this
fashion until the entire data structure has been copied over to the local storage device if
needed. In the preferred embodiment the interaction between the client and server
computers is governed by well known transmission protocols.
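By way of illustration only, the local-first query cycle described above can be sketched in Python as follows. The sketch assumes that a data fragment can be modeled as a mapping from a key to a value, which is a simplification of the data structure described later in this specification. The names LocalFragmentStore, fetch_from_server and server_requests are hypothetical and do not appear in the specification.

```python
from typing import Callable, Dict, Iterable, Set

# Assumption: a fragment is modeled as a plain mapping of key -> value.
Fragment = Dict[int, int]


class LocalFragmentStore:
    """Hypothetical client-side store that answers locally when it can."""

    def __init__(self, fetch_from_server: Callable[[Set[int]], Fragment]):
        self._data: Fragment = {}          # the locally stored, merged fragment
        self._fetch = fetch_from_server    # stands in for the request to the server
        self.server_requests = 0           # counts round trips, for illustration

    def query(self, keys: Iterable[int]) -> Fragment:
        wanted = set(keys)
        # Apply the query to the local fragment first.
        missing = wanted.difference(self._data)
        if missing:
            # Request only the differential missing data and merge it locally.
            self.server_requests += 1
            self._data.update(self._fetch(missing))
        # Extract the answer from the (possibly enlarged) local fragment.
        return {k: self._data[k] for k in wanted if k in self._data}
```

In this sketch a repeated query causes no server request, and a partially answered query requests only the keys that are not yet held locally.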
The ability to combine data fragments taken from a single data structure comes
from the fact that data fragments contain a series of coefficients and offsets which
identify the dimensional relationship between the data elements in the fragments and
the original data structure (or QueryObject™ in the preferred embodiment).
In another embodiment of the present invention the primary data structure is
segmented over several server computers and the query request is routed to the data
server containing the portion of the data structure necessary to generate the data
fragment necessary to satisfy the query.
These and other features and advantages of the present invention will be
presented in more detail in the following specification of the invention and the
accompanying figures which illustrate by way of example the principles of the invention.
Brief Description of the Drawings
The present invention will be readily understood by the following detailed
description in conjunction with the accompanying drawings in which:
Figure 1 is an overview description of the operational model of the present
invention.
Figure 2 illustrates one embodiment of the present invention in which an Internet
client queries a web server which in turn transmits data from one or more data servers.
Figure 3 illustrates the data structure of the present invention.
Figure 4 is a flow chart illustrating the method of the present invention.
Detailed Description of the Invention
Reference will now be made to the preferred embodiment of the invention. An
example of the preferred embodiment is illustrated in the accompanying drawings.
While the invention will be described in conjunction with that preferred embodiment, it
will be understood that it is not intended to limit the invention to one preferred
embodiment. On the contrary, it is intended to cover alternatives, modifications and
equivalents as may be included within the spirit and scope of the invention as defined
by the appended claims. In the following description, numerous specific details are set
forth in order to provide a thorough understanding of the present invention. The
present invention may be practiced without some or all of these specific details. In
other instances, well known process operations have not been described in detail in
order to not unnecessarily obscure the present invention.
Referring to Fig. 1, in operation of the method of the present invention, a
database query is formatted by a query tool 4 installed on client computer 2 and
transmitted to server computer 6 where it is received by a server query manager 8. In
the preferred embodiment any of a number of well known query tools (a spreadsheet or
other specialized software) may be used to format the request. In the preferred
embodiment client computer 2 connects to server computer 6 via the Internet or LAN
connection. The client query manager 17 formats the query generated by query tool 4
prior to its transmission and acts as an interface between query tool 4 and local data
structure 16 (the local query object). In this capacity the client query manager 17 is
functioning as a data redirector which converts the query into the mathematical
coefficients used by the local data structure of the present invention and then transmits
the coefficients via a supported transport mechanism (HTTP, Port 80, VPN, etc.) to
server computer 6 when the necessary data is not stored locally. Server query
manager 8 searches the server data structure 10 (the server query object) and
transmits to client computer 2 a first data fragment 12 which satisfies the first query.
The first data fragment 12 is transmitted to the client computer again utilizing any of a
number of well known transport mechanisms including FTP, VPN, and Port 80. The
first data fragment 12 is stored on a local storage device 14 and integrated into the
client data structure 16 by client query manager 17 which also passes the answer back
to query tool 4.
When a second query is made, client query manager 17 directs the query to the
local data structure 16. If the query can be satisfied by the data residing in local data
structure 16, the data is returned to query tool 4 through client query manager 17. If the
second query cannot be satisfied by data stored in local data structure 16, then the
second query is forwarded to server computer 6 where the server query manager 8
retrieves the data from the server data structure 10 and transmits a second fragment
18 to client computer 2. Again the data fragment is stored on the local storage device
14 and integrated into the client data structure 16 by the client query manager 17 which
passes the entire answer (derived from the two fragments 12 and 18 in local data
structure 16) to query tool 4.
If there is a third query which cannot be satisfied by the local data structure 16
then the query is again directed to server computer 6 which returns a third data
fragment 20 containing the additional information required to satisfy the third query.
Fig. 2 illustrates an alternative embodiment of the present invention wherein a
client computer 30 connects to web server 32 across the Internet which in turn
connects to a first data server 34 and a second data server 36. In this embodiment the
client computer 30 transmits queries to the web server 32 and the web server redirects
the queries to the appropriate data server 34 or 36. In yet a further alternative
embodiment, the web server 32 provides storage for a client data structure and merely
returns answers to the client computer 30.
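By way of illustration only, the redirection of a query to the appropriate data server might be sketched as follows. The specification does not state how the web server selects a data server; this sketch assumes, purely for illustration, that each data server holds a contiguous range of keys. The function name route, the boundaries list and the server labels are hypothetical.

```python
import bisect
from typing import List


def route(key: int, boundaries: List[int], servers: List[str]) -> str:
    """Pick the server holding `key` under an assumed range partitioning.

    boundaries must be sorted; boundaries[i] is the smallest key held by
    servers[i + 1], so len(servers) == len(boundaries) + 1.
    """
    if len(servers) != len(boundaries) + 1:
        raise ValueError("need exactly one more server than boundaries")
    # bisect_right counts how many boundaries are <= key, which is the
    # index of the server whose range contains the key.
    return servers[bisect.bisect_right(boundaries, key)]


# Hypothetical layout: keys below 1000 on one server, the rest on the other.
SERVERS = ["data_server_34", "data_server_36"]
BOUNDARIES = [1000]
```

Other partitioning schemes, such as assigning whole metric files to a server, would serve equally well; the range scheme is chosen here only because it is short.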
Fig. 3 illustrates the data structure used in the present invention. The data
structure is illustrated generally at 40 and is comprised of a plurality of files including an
HBF file 42a, a CRN file 44 and one or more metric files 46, 48, 50 and 52. The HBF
file contains information describing the topology of the data contained within the data
structure. The metric files contain the data for every metric within the data structure 40.
A metric is the intersection of two or more dimensions within a data structure (the metric
files contain the answers to the questions). For example, if a data structure contains
gender, state residence, and income as dimensions then one potential metric would be
the number of men residing in New York. The HBF file 42 contains the information
about the dimensions of the data structure and provides a pointer to the exact location
in the metric files 46, 48, 50 and 52 where the information about the intersection of
dimension information exists. In particular the HBF file (see 42b) is comprised of a
header portion 54 which includes general file information (such as the file name, date of
creation, file size, last modified date), and an index of compound keys 56 and offset
pointers 58. Each compound key is unique and corresponds to a specific query (a set
of dimensional constraints and desired metrics). As described above (referring to the
description of Fig. 1) the client query manager transforms the query generated by the
query tool into a compound key. The compound key is a unique non-repeating solution
to a polynomial equation wherein the coefficients of the equation are normalized values
which identify the dimensional constraints of the query and the desired metric. The use
of a unique value for each query avoids the "collision" problem often found in traditional
index structures. The associated offset pointer 58 provides information used to
determine the location of the data within metric file 46b satisfying the query associated
with the compound key 56.
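By way of illustration only, one way to obtain a unique value from normalized coefficients is to evaluate a polynomial at a fixed base that is larger than every coefficient. The specification does not disclose the actual polynomial, so the following Python sketch is an assumption and not the generator of the preferred embodiment. The function name compound_key and the base parameter are hypothetical.

```python
from typing import Sequence


def compound_key(coefficients: Sequence[int], base: int) -> int:
    """Evaluate c0*base^(n-1) + c1*base^(n-2) + ... + c(n-1) by Horner's rule.

    Assumption: every normalized value satisfies 0 <= c < base. Under that
    condition two different coefficient sequences of the same length always
    produce different keys, so no collision handling is needed.
    """
    key = 0
    for c in coefficients:
        if not 0 <= c < base:
            raise ValueError(f"coefficient {c} is outside the range [0, {base})")
        key = key * base + c
    return key


# Hypothetical normalized values: gender (male = 1, female = 2), a state
# code, and a code for the desired metric.
example_key = compound_key([1, 32, 0], base=64)
```

Uniqueness in this sketch follows from positional notation: each coefficient occupies its own digit position in the chosen base.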
Metric files, and in particular metric file 46b, have a header 60 which includes
typical file header information (name, file size, date created, etc.) and information about
the structure of the metric file 46b which defines the compression used within the file.
The combination of this compression information and the offset pointer 58 specifies the
actual physical location of the data satisfying the query within the file.
Returning to the query: How many men in New York are customers? First, a
compound key is generated from the query (the normalized values for Male and New
York Residence are used as coefficients in the compound key generator polynomial).
In this example, the resulting compound key 56 is 123 and the corresponding offset
pointer 58 is 287. Taken in conjunction with the file compression information contained
in the metric file header 60 the physical location of the data 62 is identified and the
answer 800 may be transmitted to the querying application. In the preferred
embodiment a data fragment containing the compressed answer, along with the
compression information and the HBF file information would be transmitted. The HBF
file information would be integrated into a local HBF file and compressed data would be
integrated into the proper metric file and the local metric file header information would
be updated. The integration of the HBF information and the data into the proper metric
file (or the addition of a new metric file to the local data structure) is handled by the
client query manager 17 (see Fig. 1).
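By way of illustration only, the lookup in the example above (compound key 123, offset pointer 287, answer 800) can be sketched as follows. The sketch makes several assumptions that are not in the specification: the HBF index is modeled as a dictionary, the metric file as a list of zlib-compressed blocks of 32-bit integers, and the compression information in the header as a single VALUES_PER_BLOCK constant. For simplicity the sketch decompresses one block in order to read a value.

```python
import struct
import zlib
from typing import Dict, List

VALUES_PER_BLOCK = 64  # hypothetical stand-in for the header's compression info


def build_metric_file(values: List[int]) -> List[bytes]:
    """Pack values into fixed-size blocks and compress each block separately."""
    blocks = []
    for start in range(0, len(values), VALUES_PER_BLOCK):
        chunk = values[start:start + VALUES_PER_BLOCK]
        raw = struct.pack(f"<{len(chunk)}I", *chunk)
        blocks.append(zlib.compress(raw))
    return blocks


def read_metric(blocks: List[bytes], offset_pointer: int) -> int:
    """Combine the offset pointer with the block size to locate one value."""
    block_index, slot = divmod(offset_pointer, VALUES_PER_BLOCK)
    raw = zlib.decompress(blocks[block_index])
    return struct.unpack_from("<I", raw, slot * 4)[0]


def answer(hbf_index: Dict[int, int], blocks: List[bytes], key: int) -> int:
    """Compound key -> offset pointer (HBF index) -> value (metric file)."""
    return read_metric(blocks, hbf_index[key])


# Data shaped to mirror the example in the text.
values = [0] * 300
values[287] = 800
metric_file = build_metric_file(values)
hbf_index = {123: 287}
result = answer(hbf_index, metric_file, 123)
```

Only the block containing the requested value is touched, which is the sense in which the offset pointer and the compression information together identify the physical location of the data.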
While the present invention uses a polynomial equation to generate compound
keys, any technique which produces unique values for each query may be used. While
the preferred embodiment uses compression technology to reduce file size, an
alternative embodiment may be implemented where the offset pointer 58 directly
identifies the location of the data 62 corresponding to the compound key 56.
Referring to Fig. 4, a user operating client computer 70 formulates a query 74
using any number of well known query tools. A compound key generator then
generates a compound key 76 associated with the user's query. In the preferred
embodiments the compound key is a unique number resulting from the use of normalized
values corresponding to query terms (for example, male = 1 and female = 2 for the gender
coefficient of the polynomial). In step 78 the offset pointer corresponding to the
compound key is identified in the HBF file. In step 80 the offset pointer is used to
identify the physical location of the data within the metric file. Files are identified by
name and by an internal control ID assigned when the data structure is initially built or
modified. In step 82 the data is read and returned to the querying application on client
computer 70 along with the information necessary to adjust the metric and HBF files in
the data structure on the client computer. At step 84 the data is integrated into the
locally stored metric files and the local HBF file is adjusted to add the compound key
and offset pointer identifying the integrated data. The data satisfying the query is then
passed to the originally querying application in step 86.
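By way of illustration only, the integration performed at step 84 might be sketched as follows. The sketch assumes an uncompressed local metric file modeled as a list, a local HBF index modeled as a dictionary, and a returned fragment modeled as a list of (compound key, value) pairs. The class name LocalDataStructure and its methods are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class LocalDataStructure:
    """Hypothetical client-side counterpart of the HBF file and one metric file."""

    def __init__(self) -> None:
        self.hbf_index: Dict[int, int] = {}   # compound key -> offset pointer
        self.metric_values: List[int] = []    # stands in for the metric file

    def integrate(self, fragment: List[Tuple[int, int]]) -> None:
        """Step 84: append new data and record where it was placed."""
        for key, value in fragment:
            if key in self.hbf_index:
                continue  # already held locally; nothing to add
            self.hbf_index[key] = len(self.metric_values)
            self.metric_values.append(value)

    def lookup(self, key: int) -> Optional[int]:
        """Steps 78 and 80 applied locally: key -> offset -> value."""
        offset = self.hbf_index.get(key)
        return None if offset is None else self.metric_values[offset]
```

Because the offsets recorded locally refer to the local metric file, fragments received at different times can be merged without rebuilding the index over the combined data.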
Although the foregoing invention has been described in some detail for the
purpose of clarity of understanding, it will be apparent that certain changes and
modifications may be practiced within the scope of the appended claims. Accordingly,
the present embodiments are to be considered illustrative and not restrictive, and the
invention is not to be limited to the details given herein, but may be modified within the
scope and equivalents of the appended claims.