WO2001008050A1 - Database management system

Info

Publication number: WO2001008050A1
Application number: PCT/US2000/019110
Authority: WO
Inventors: Matthew Doering; Peter Parlewicz
Original assignee: Queryobject Systems Corporation
Priority date: 1999-07-21
Filing date: 2000-07-13
Publication date: 2001-02-01
Also published as: AU6212200A

Abstract

A method that allows the transmission of portions of a complete computer data structure in response to queries received from a client computer. The computer data structure is made up of files describing the overall dimensions of the computer data structure and files containing the stored computer data structure. The method further allows individual fragments of the same computer data structure to function as fully independent computer data structures which may be combined with other fragments from the same primary computer data structure to form a larger computer data structure fragment.

Description

DATABASE MANAGEMENT SYSTEM

Background of the Invention

The increase in available inter-connectivity amongst computers made possible

by the Internet has increased the demand for tools to allow a larger number of users to

access the wealth of information available in corporate databases. While today's

network technology has provided flexible connectivity for tens of millions of users,

contemporary database management systems are unable to manage and deliver

responses to the increasing number of users in an efficient fashion.

Traditionally, a remote user would send a query to a database and the database

would respond with the answers to the query. The inherent problem with this type of

structure is that the database must process every query it receives and transmit the

answers to each individual query as it is received. Even when a query from the

same user covers substantially the same data, these systems have to transmit the

query to a database and wait for an answer.

A partial solution to this problem is to download the database record or a portion

thereof to a local storage device and then query the locally stored data. While this

removes the problem of re-querying a remote database, it increases the resources

necessary to download and store the database records. In many instances there is no

practical way to download the entire database or even a portion of it. If it is possible to

download a portion of the database it is necessary to rebuild the index of the database

in order to search the retrieved portion. If subsequent portions are retrieved it is again necessary to rebuild the index. This would also necessitate the installation of a

complete database system on the user's computer.

Another problem with traditional database systems is that portions of the data

are often compressed in order to save resources on the database server. In order to

retrieve the data requested by a user, contemporary systems must uncompress the

data in order to answer a query or transmit a portion of the database to a requesting

client computer.

What is needed is an alternative method for providing the data necessary to

respond to a query. It would be useful if the data could be manipulated and transmitted in

a compressed format and was usable in its compressed form and only uncompressed

when the actual values needed to be read on a user's computer. It would further the

usefulness of such a method if two or more subsets of data taken from a single data

source could be efficiently queried without having to rebuild the index for the

combination of the subsets.

Summary of the Invention

In one embodiment of the present invention an Internet enabled data source, a

QueryObject™ in the preferred embodiment, works with client applications in the

following manner. A client application, such as a spreadsheet, is used to format a

query using a well known query language such as SQL or MDX. The query is then

transmitted to a data redirector which converts the query into the mathematical coefficients used by the QueryObject™ database and then transmits the coefficients via

a supported transport mechanism (HTTP, Port 80, VPN, etc.) to an Internet

QueryObject™ (IQO) Web Server. The IQO Web Server extracts a subset of the

data structure (a data fragment) being queried. This is distinct from traditional relational

and object oriented databases which merely output a record set satisfying the query.

The subset of the data fragment is a fully functional unique data structure (as opposed

to a partition or segment of the data structure) limited only to the information requested

in the query. The fragment is an answer to the query across the entire data source.

The data fragment is then transmitted back (again utilizing any of a number of well

known transport mechanisms including FTP, VPN, and Port 80) to the requesting

client.

The data fragment is then stored on the client's computer, typically in an attached

hard disk or other permanent memory means. A data extract function is then used to

return the traditional row set answers to the calling application.

For any subsequent requests, the query will first be applied to the locally stored

data fragment to see if it contains the complete answer set to the query. If the complete

answer set is available in the locally stored data fragment then the answer is returned

directly to the calling application. If the locally stored data fragment contains only a

partial answer set or a null set, a request for the differential missing data (encompassing

a second data fragment) is created and sent to the IQO Server. The IQO Server then

creates a second data fragment containing the necessary missing data and transmits the data fragment back to the requesting client where it in turn is combined with the

previously retrieved data fragment and the row set answers are extracted from the

resulting fragment and returned to the calling application. Subsequent queries will

again be first directed to the locally stored data fragment and then a query will be sent

to the primary data structure and the necessary data returned as new data fragments

which are in turn merged into the local data fragment. This cycle can repeat in this

fashion until the entire data structure has been copied over to the local storage device if

needed. In the preferred embodiment the interaction between the client and server

computers is governed by well known transmission protocols.
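
By way of illustration only, the cycle described above (consult the locally stored fragment first, request only the differential missing data, then merge) might be sketched as follows in Python. The Fragment class, its cells mapping and the answer function are hypothetical names introduced for this sketch and do not appear in the specification; a fragment is modeled simply as a mapping from an integer key to a stored value, and the server is simulated in-process rather than reached over a network.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Hypothetical stand-in for a data fragment: key -> stored value."""
    cells: dict = field(default_factory=dict)

    def extract(self, keys):
        """Build a new fragment limited to the requested keys (server side)."""
        return Fragment({k: self.cells[k] for k in keys})

    def merge(self, other):
        """Combine another fragment of the same primary structure into this one."""
        self.cells.update(other.cells)


def answer(local, server, wanted):
    """Return (rows, keys_requested_from_server) for one query."""
    # Differential request: only the keys the local fragment lacks.
    missing = {k for k in wanted if k not in local.cells}
    if missing:
        local.merge(server.extract(missing))
    rows = {k: local.cells[k] for k in wanted}
    return rows, missing


server = Fragment({1: 10, 2: 20, 3: 30})  # simulated primary data structure
local = Fragment()                         # nothing stored locally yet

first_rows, first_missing = answer(local, server, {1, 2})
second_rows, second_missing = answer(local, server, {2, 3})
```

In this sketch the second query overlaps the first, so only the one key not yet held locally is requested from the simulated server, and the local fragment grows toward a full copy of the primary structure as further queries arrive.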

The ability to combine data fragments taken from a single data structure comes

from the fact that data fragments contain a series of coefficients and offsets which

identify the dimensional relationship between the data elements in the fragments and

the original data structure (or QueryObject™ in the preferred embodiment).

In another embodiment of the present invention the primary data structure is

segmented over several server computers and the query request is routed to the data

server containing the portion of the data structure necessary to generate the data

fragment necessary to satisfy the query.

These and other features and advantages of the present invention will be

presented in more detail in the following specification of the invention and the

accompanying figures which illustrate by way of example the principles of the invention.

Brief Description of the Drawings

The present invention will be readily understood by the following detailed

description in conjunction with the accompanying drawings in which:

Figure 1 is an overview description of the operational model of the present

invention.

Figure 2 illustrates one embodiment of the present invention in which an Internet

client queries a web server which in turn transmits data from one or more data servers.

Figure 3 illustrates the data structure of the present invention.

Figure 4 is a flow chart illustrating the method of the present invention.

Detailed Description of the Invention

Reference will now be made to the preferred embodiment of the invention. An

example of the preferred embodiment is illustrated in the accompanying drawings.

While the invention will be described in conjunction with that preferred embodiment, it

will be understood that it is not intended to limit the invention to one preferred

embodiment. On the contrary, it is intended to cover alternatives, modifications and

equivalents as may be included within the spirit and scope of the invention as defined

by the appended claims. In the following description, numerous specific details are set

forth in order to provide a thorough understanding of the present invention. The

present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in

order to not unnecessarily obscure the present invention.

Referring to Fig. 1, in operation of the method of the present invention, a

database query is formatted by a query tool 4 installed on client computer 2 and

transmitted to server computer 6 where it is received by a server query manager 8. In

the preferred embodiment any of a number of well known query tools (a spreadsheet or

other specialized software) may be used to format the request. In the preferred

embodiment client computer 2 connects to server computer 6 via the Internet or LAN

connection. The client query manager 17 formats the query generated by query tool 4

prior to its transmission and acts as an interface between query tool 4 and local data

structure 16 (the local query object). In this capacity the client query manager 17 is

functioning as a data redirector which converts the query into the mathematical

coefficients used by the local data structure of the present invention and then transmits

the coefficients via a supported transport mechanism (HTTP, Port 80, VPN, etc.) to

server computer 6 when the necessary data is not stored locally. Server query

manager 8 searches the server data structure 10 (the server query object) and

transmits to client computer 2 a first data fragment 12 which satisfies the first query.

The first data fragment 12 is transmitted to the client computer again utilizing any of a

number of well known transport mechanisms including FTP, VPN, and Port 80. The

first data fragment 12 is stored on a local storage device 14 and integrated into the client data structure 16 by client query manager 17 which also passes the answer back

to query tool 4.

When a second query is made, client query manager 17 directs the query to the

local data structure 16. If the query can be satisfied by the data residing in local data

structure 16 the data is returned to query tool 4 through query manager 17. If the

second query can not be satisfied by data stored in local data structure 16 then the

second query is forwarded to server computer 6 where the server query manager 8

retrieves the data from the server data structure 10 and transmits a second fragment

18 to client computer 2. Again the data fragment is stored on the local storage device

14 and integrated into the client data structure 16 by the client query manager 17 which

passes the entire answer (derived from the two fragments 12 and 18 in local data

structure 16) to query tool 4.

If there is a third query which can not be satisfied by the local data structure 16

then the query is again directed to server computer 6 which returns a third data

fragment 20 containing the additional information required to satisfy the third query.

Fig. 2 illustrates an alternative embodiment of the present invention wherein a

client computer 30 connects to web server 32 across the Internet which in turn

connects to a first data server 34 and a second data server 36. In this embodiment the

client computer 30 transmits queries to the web server 32 and the web server redirects

the queries to the appropriate data server 34 or 36. In yet a further alternative embodiment, the web server 32 provides storage for a client data structure and merely

returns answers to the client computer 30.
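
The specification does not state how the web server decides which data server is appropriate for a given query. Purely as a sketch, and assuming (hypothetically) that each data server holds a contiguous range of integer query keys, the redirection could look like the following Python. The ROUTES table, the server names and the route function are invented for this illustration.

```python
from bisect import bisect_right

# Hypothetical routing table: (lowest key held, data server name), sorted by key.
ROUTES = [
    (0, "data-server-34"),
    (1000, "data-server-36"),
]


def route(key):
    """Return the name of the data server assumed to hold the given key."""
    if key < ROUTES[0][0]:
        raise ValueError("key below the lowest range held by any data server")
    starts = [start for start, _ in ROUTES]
    # Index of the last range whose starting key is <= key.
    index = bisect_right(starts, key) - 1
    return ROUTES[index][1]
```

Under this assumption a key of 123 would be redirected to the first data server and a key of 1000 or above to the second; any other partitioning rule could be substituted without changing the client.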

Fig. 3 illustrates the data structure used in the present invention. The data

structure is illustrated generally at 40 and is comprised of a plurality of files including an

HBF file 42a, a CRN file 44 and one or more metric files 46, 48, 50 and 52. The HBF

File contains information describing the topology of the data contained within the data

structure. The metric files contain the data for every metric within the data structure 40.

A metric is the intersection of two or more dimensions within a data structure (the metric

files contain the answers to the questions). For example, if a data structure contains

gender, state residence, and income as dimensions then one potential metric would be

the number of men residing in New York. The HBF file 42 contains the information

about the dimensions of the data structure and provides a pointer to the exact location

in the metric files 46, 48, 50 and 52 where the information about the intersection of

dimension information exists. In particular the HBF file (see 42b) is comprised of a

header portion 54 which includes general file information (such as the file name, date of

creation, file size, last modified date), and an index of compound keys 56 and offset

pointers 58. Each compound key is unique and corresponds to a specific query (a set

of dimensional constraints and desired metrics). As described above (referring to the

description of Fig. 1) the client query manager transforms the query generated by the

query tool into a compound key. The compound key is a unique non-repeating solution

to a polynomial equation wherein the coefficients of the equation are normalized values which identify the dimensional constraints of the query and the desired metric. The use

of a unique value for each query avoids the "collision" problem often found in traditional

index structures. The associated offset pointer 58 provides information used to

determine the location of the data within metric file 46b satisfying the query associated

with the compound key 56.
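
The particular polynomial is not given in the specification. One possible construction, shown here only as a sketch, treats the normalized values as the coefficients of a polynomial in a fixed base that is larger than any normalized value; for a fixed number of coefficients this is ordinary positional notation, so two different coefficient lists cannot produce the same key. The function name compound_key, the base of 100, and the normalized values for New York (33) and for the desired metric (2) are assumptions made for this example.

```python
def compound_key(coefficients, base):
    """Evaluate c0*base**(n-1) + c1*base**(n-2) + ... + c(n-1) by Horner's rule."""
    key = 0
    for c in coefficients:
        if not 0 <= c < base:
            raise ValueError("each normalized value must lie in [0, base)")
        key = key * base + c
    return key


# Coefficient order assumed here: [gender, state of residence, desired metric].
# male=1 and female=2; the state and metric values are hypothetical.
men_in_new_york = compound_key([1, 33, 2], base=100)
women_in_new_york = compound_key([2, 33, 2], base=100)
```

Because every query with the same dimensional constraints and metric yields the same integer, the key can be used directly as the lookup value in the index of compound keys 56.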

Metric files, and in particular metric file 46b, have a header 60 which includes

typical file header information (name, file size, date created, etc.) and information about

the structure of the metric file 46b which defines the compression used within the file.

The combination of this compression information and the offset pointer 58 specifies the

actual physical location of the data satisfying the query within the file.

Returning to the query: How many men in New York are customers? First a

compound key is generated from the query (the normalized values for Male and New

York Residence are used as coefficients in the compound key generator polynomial).

In this example, the resulting compound key 56 is 123 and the corresponding offset

pointer 58 is 287. Taken in conjunction with the file compression information contained

in the metric file header 60 the physical location of the data 62 is identified and the

answer 800 may be transmitted to the querying application. In the preferred

embodiment a data fragment containing the compressed answer, along with the

compression information and the HBF file information would be transmitted. The HBF

file information would be integrated into a local HBF file and compressed data would be

integrated into the proper metric file and the local metric file header information would be updated. The integration of the HBF information and the data into the proper metric

file (or the addition of a new metric file to the local data structure) is handled by the

local query manager 17 (see Fig. 1).
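
The specification does not define how the compression information and the offset pointer combine to give a physical location. The following Python sketch assumes one hypothetical layout: the metric file body is a sequence of independently compressed blocks of 256 unsigned 32-bit values, the metric file header records the byte position and length of each compressed block, and the offset pointer is the logical position of the value. The names BLOCK_CELLS, build_metric_file and read_cell, and the use of zlib, are choices made for this sketch only; the numbers 123, 287 and 800 are those of the example above.

```python
import struct
import zlib

BLOCK_CELLS = 256  # hypothetical number of values per compressed block


def build_metric_file(values):
    """Return (header, body); header holds (byte position, length) per block."""
    header, body = [], b""
    for start in range(0, len(values), BLOCK_CELLS):
        chunk = values[start:start + BLOCK_CELLS]
        packed = zlib.compress(struct.pack(f"<{len(chunk)}I", *chunk))
        header.append((len(body), len(packed)))
        body += packed
    return header, body


def read_cell(header, body, offset_pointer):
    """Locate and read one value using the header and the offset pointer."""
    block, slot = divmod(offset_pointer, BLOCK_CELLS)
    position, length = header[block]
    # Only the block holding the requested value is uncompressed.
    raw = zlib.decompress(body[position:position + length])
    return struct.unpack_from("<I", raw, slot * 4)[0]


values = [0] * 512
values[287] = 800                       # the data 62 in the example
header, body = build_metric_file(values)
hbf_index = {123: 287}                  # compound key 56 -> offset pointer 58

result = read_cell(header, body, hbf_index[123])
```

In this layout a fragment sent to the client could carry the compressed block unchanged together with its header entry and the matching index entry, so that nothing needs to be uncompressed on the server before transmission.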

While the present invention uses a polynomial equation to generate compound

keys, any technique which produces unique values for each query may be used. While

the preferred embodiment uses compression technology to reduce file size, an

alternative embodiment may be implemented where the offset pointer 58 directly

identifies the location of the data 62 corresponding to the compound key 56.

Referring to Fig. 4, a user operating client computer 70 formulates a query 74

using any number of well known query tools. A compound key generator then

generates a compound key 76 associated with the user's query. In the preferred

embodiment the compound key is a unique number resulting from the use of normalized

values corresponding to query terms (for example, male=1 and female=2 for the gender

coefficient of the polynomial). In step 78 the offset pointer corresponding to the

compound key is identified in the HBF file. In step 80 the offset pointer is used to

identify the physical location of the data within the metric file. Files are identified by

name and by an internal control ID assigned when the data structure is initially built or

modified. In step 82 the data is read and returned to the querying application on client

computer 70 along with the information necessary to adjust the metric and HBF files in

the data structure on the client computer. At step 84 the data is integrated into the

locally stored metric files and the local HBF file is adjusted to add the compound key and offset pointer identifying the integrated data. The data satisfying the query is then

passed to the originally querying application in step 86.
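
Step 84 may be illustrated by the following Python sketch, which assumes for simplicity an uncompressed local metric file held as a list and a local HBF index held as a mapping. The class LocalStructure and its methods integrate and lookup are hypothetical names. The point of the sketch is that the offset pointer recorded locally refers to the position of the data in the local metric file, which need not equal the offset used on the server.

```python
from dataclasses import dataclass, field


@dataclass
class LocalStructure:
    """Hypothetical local data structure: one HBF index and one metric file."""
    hbf: dict = field(default_factory=dict)     # compound key -> local offset pointer
    metric: list = field(default_factory=list)  # locally stored data values

    def integrate(self, fragment):
        """Integrate received data; fragment maps compound key -> data value."""
        for key, value in fragment.items():
            if key in self.hbf:
                # Key already known: refresh the stored value in place.
                self.metric[self.hbf[key]] = value
            else:
                # New key: append the data and record where it was placed.
                self.hbf[key] = len(self.metric)
                self.metric.append(value)

    def lookup(self, key):
        """Return the stored value, or None if the key is not held locally."""
        offset = self.hbf.get(key)
        return None if offset is None else self.metric[offset]


local = LocalStructure()
local.integrate({123: 800})   # first fragment received
local.integrate({456: 75})    # second fragment, merged into the same files
```

After both calls the local index holds two compound keys, and a later query for either key can be answered without contacting the server, while a query for any other key returns None and would be forwarded.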

Although the foregoing invention has been described in some detail for the

purpose of clarity of understanding, it will be apparent that certain changes and

modifications may be practiced within the scope of the appended claims. Accordingly,

the present embodiments are to be considered illustrative and not restrictive, and the

invention is not to be limited to the details given herein, but may be modified within the

scope and equivalents of the appended claims.

Claims

We claim:

1) A method for managing a database system comprising the steps of:

transmitting a first query from a first computer to a second computer associated

with a data structure; wherein said data structure comprises a plurality of data elements

organized in a manner such that each data element has a determined dimensional

relationship to the other data elements in said data structure; and

receiving on said first computer a first subset of said data structure including at

least said data elements satisfying said first query.

2) The method of claim 1 wherein said first subset of said data structure retains said

determined dimensional relationship.

3) The method of claim 1 further comprising the steps of:

transmitting a second query from said first computer to said second computer;

and

receiving on said first computer a second subset of said data structure including

at least said data elements satisfying said second query; wherein said data elements in

said second subset of data elements maintain the same dimensional relationships to

said data elements in said first subset of data elements as said data elements.

4) The method of claim 2, wherein said first subset of said data structure is stored in a

memory associated with said first computer.

5) The method of claim 4, wherein said first computer queries said first subset of said

data structure stored in said memory associated with said first computer before

transmitting said query to said second computer.

6) The method of claim 1, wherein said data structure comprises: a first file containing a

plurality of dimensional elements defining the dimensions of a data structure; and a

second file containing stored data elements.

7) The method of claim 1, wherein said data structure comprises: a first file containing a

plurality of keys defining the structure of a second file and said second file containing

stored data elements.

8) A method for managing a database system comprising the steps of:

receiving a first query from a first computer on a second computer associated

with a data structure; wherein said data structure comprises a plurality of data elements

organized in a manner such that each data element has a determined dimensional

relationship to the other data elements in said data structure; and

transmitting to said first computer a first subset of said data structure including

at least said data elements satisfying said first query.

9) The method of claim 8, further comprising the steps of:

receiving a second query from said first computer on said second computer;

wherein said second query is not satisfied by said first subset of said data structure;

and

transmitting to said first computer a second subset of said data structure

including at least said data elements satisfying said second query; wherein said data

elements in said second subset of data elements maintain the same dimensional

relationships to said data elements in said first subset of data elements as said data

elements.

10) The method of claim 9, wherein said first subset of said data structure is stored in a

memory associated with said first computer.

11) The method of claim 10, wherein said first computer queries said first subset of said

data structure stored in said memory associated with said first computer before

transmitting said query to said second computer.

12) The method of claim 8, wherein said data structure comprises: a first file containing

a plurality of dimensional elements defining the dimensions of a data structure; and a

second file containing stored data elements.

13) A method for retrieving information stored in a computer memory comprising:

a) receiving a query from a first computer on a second computer;

b) locating an offset pointer in a first file for a data set stored in a second file;

whereby said data set contains data elements satisfying said query;

c) determining the location of said data set; and

d) transmitting said data set to said first computer.

14) The method of claim 13, further comprising the step of:

converting said query into a compound key argument.

15) The method of claim 14, wherein said compound key argument is a unique number

calculated from the terms of said query.

16) The method of claim 15, wherein the location of said data set includes determining

the physical location of the data set with a data storage device associated with said first

computer or said second computer.

17) The method of claim 13, further comprising the step of:

integrating said data set into a third file associated with said first computer.

18) The method of claim 17, further comprising the step of:

modifying a fourth file associated with said first computer; whereby said

modifications of said fourth file describes the structure of said data set integrated into

said third file associated with said first computer; whereby said data set maintains its

dimensional relationship with the totality of data sets contained in said second file.

19) The method of claim 18, wherein said first file is a list of offset pointers within said

second file indexed by a plurality of compound key arguments.

20) The method of claim 19, wherein said second file is one of a plurality of files storing

data sets; wherein there is a data set storing file of the same type as said second file for

each metric within a data structure.

21) The method of claim 20, wherein each of said offset pointers is associated with an

identifier which identifies which of said plurality of files storing data sets contains the

data set satisfying the query associated with said compound key.

22) The method of claim 17, wherein said third file is a fragment of said second file;

wherein the structure of said second file remains intact within said fragment.

23) The method of claim 18, wherein modifying said fourth file includes creating said

fourth file if said fourth file was not previously created.

24) The method of claim 18, wherein said fourth file contains the same dimensional

information about said fragment as said first file contains about a data structure

comprising said plurality of files storing data sets.