Method and Apparatus for Information Management in a Computer Database
The invention relates to a method and apparatus for information management in a computer database. The implementation of mouse-driven computer applications in a windows environment has materially simplified the general use of computers. Facility with a keyboard and knowledge of computer languages are no longer necessary. These benefits are felt particularly at managerial level and here it is important to be able to derive database information for monitoring and decision making purposes. Formulation of queries is an important managerial function.
A database is designed to operate in a particular domain. For example, the domain of a database may be the domain of financial planning over a given time period in an organisation which has several projects running and which incurs expenses of several kinds. The domain has a structure, being composed of a number of dimensions, each of which has its own structure. Typical dimensions in the above mentioned domain are ORGANISATION, where a set of departments are working on a number of projects; and TEMPORAL, where a financial year may be sub-divided into quarters and months. The above described dimensions happen to be hierarchical, and may be represented as tree structures, where a department branches to its projects and a quarter branches to its months .
Formulation of effective database queries demands a knowledge of the database structure. Some dimensional structures, for example that of the TEMPORAL dimension, may be obvious. In general, however, the structure is not obvious. In the past this has meant that the designer of the database has been in the best position to interrogate it because of his knowledge of its structure. This is unsatisfactory where it is an object to allow effective managerial
interrogation of the kind now being made possible by mouse-driven applications in a windows environment.
Another limitation of existing databases is the difficulty encountered in database customisation. For example, it may be desirable to add a new project when a department begins a new enterprise. Hitherto, such an addition has required a relatively complicated procedure, largely because the underlying structure of the database is hidden. The present invention seeks to alleviate these difficulties.
According to one aspect of the invention there is provided a method for information management in a computer database having a number of records in a domain which has a number of dimensions, the method consisting in operating applications in a mouse-driven windows environment" and producing graphical representations of each dimension ,such representations including the relation of components of the dimension with particular regard to any hierarchical structure, the levels of such hierarchies being identifiable in the graphical representations; the method further consisting in operating a database interrogation program which allows direct reference by mouse to the representations to formulate queries . In a preferred embodiment the components of the dimensions are represented by headings and sub-headings and the interrogation program operates by a method consisting in providing a blank elementary select table, adding user-determined row and column headings for the table from those available in the dimensions, thereby defining rows and columns and empty cells at the crossings of the rows and columns, and executing a program which fills in the cells with data corresponding to the set intersections of the row and column headings.
According to another aspect of the
invention there is provided apparatus for implementing the above method comprising a computer with a storage medium holding a database, the database having a number of records in a domain which has a number of dimensions, the computer being programmed for operating applications in a mouse-driven windows environment and producing graphical representations of each dimension, such representations including the relation of components of the dimensions with particular regard to any hierarchical structure, the levels of such hierarchies being identifiable in the graphical representations; the program further allowing direct reference by mouse to formulate queries .
With the method and apparatus described above in accordance with the invention the user is afforded a view of the database which has meaning to him. It can be regarded as a user-centred model of the domain. The domain and dimension structures are revealed and interrogation of the database is allowed at any appropriate hierarchy level. The invention allows implementation of the concept of a model-driven database. In a preferred embodiment the computer employs a declarative programming language. The prologue language is an example. The knowledge base is exhaustively partitioned into theories. Each theory contains a set of procedures, each of which is a set of clauses. Each dimension of the database is a theory of the system and is presented to the user pictorially. In customising the database the user interacts via a customising program with the appropriate dimension presentation. The customising program modifies the theory. for example, to add a project to a department in the ORGANISATION dimension mentioned above the user adds a clause to the theory representing that dimension, each node in the hierarchy being represented by a clause in the respective dimension theory. The modification is
reflected by a revised dimension and presentation.
Whenever a user acts to use a dimension, for example to query the database, the corresponding theory is referenced to guide the actions of the system. Thus, every action is conducted via the theory which embodies the user-centred model. The system is model-driven.
There follows a description, with reference to the accompanying drawings, of a graphical database interrogation system including a database organised to embody the invention. It is to be understood that the database to be described embodies the model-driven attributes described above. In the drawings :-
Figure 1 is a schematic view of apparatus in accordance with the invention comprising a programmed computer with a database;
Figure 2 is a diagram of the screen presentation for implementing the method of the invention with the apparatus of Figure 1;
Figure 3 is a multi-part diagram where at (a) to (f) are shown successive stages in the building of the query table of Figure 2;
Figure 4 is a diagram illustrating the contents of a record in the database of the apparatus of Figure 1;
Figure 5 is a diagram illustrating the bit string comparison technique employed in the preferred method in accordance with the invention; and
Figure 6 is a diagram of a modified TEMPORAL tree structure.
Referring to Figure 1 there is shown a general purpose computer conveniently of the personal computer (PC) kind and preferably part of a network of similar machines, although this is not illustrated. The computer 1 has a screen 2, a mouse 3, a keyboard 4 and a hard disk 5. The computer runs under, a Windows
environment and the mouse is used in conventional manner to select items from pull-down menus and by reference to icons.
A database is stored on the hard disk 5. In this illustration the database is for the domain of financial planning over a given time period in an organisation which has several projects running and which incurs expenses of several kinds. Thus, the database has a set 6 of records, each of which has a record number, an expense value and fields which give location and other information. A more detailed description of a record will be given later.
In addition, the database stored on the hard disk 5 has five dimensions into which each record falls. Each domain has a tree structure with nodes and branches. The five dimensions are: STATUS; ORGANISATION; EXPENSE TYPE; TEMPORAL; and LOCATION.
STATUS DIMENSION
The status dimension has two primary nodes; DYNAMIC and FIXED. The DYNAMIC node branches to two further nodes: COMMITTED and ACCRUED. The FIXED node branches to nodes TARGET and GL. Each node represents a status of an expense (record). The status of an expense will change as it passes through the system.
ORGANISATION DIMENSION
The organisation dimension has a node AIM which branches to project nodes CROMWELL; TOCCATA; HARPO and DM. These nodes represent projects to which the expenses are assigned.
EXPENSE TYPE DIMENSION
The TYPE node branches to SOFTWARE; CONSULTANCY; RENT and TRAVEL. These are the types of the expenses.
TEMPORAL DIMENSION
This is a calendar dimension having a node FY90 (Financial Year 1990) which branches to
"quarter" nodes Ql; Q2; Q3 and Q4. The quarter nodes branch to respective "month" nodes: J; F; M; etc.
LOCATION DIMENSION
The LOCATION node branches to system location nodes to where the expense has been assigned: MAILBOX; MYDATA; and BIN.
Referring now to Figure 2 there is shown a screen presentation typical in the execution of the invention. The screen has multiple windows. At the left-hand side is a set of windows representing the dimensions described above: window 7 for the STATUS dimension; window 8 for the ORGANISATION dimension; window 9 for the EXPENSE TYPE dimension; window 10 for the TEMPORAL dimension and window 11 for the LOCATION dimension.
At the right hand side is a window 12 which shows a typical expense record with the name TRAVEL 1. At the top part of the record are shown the nodes of the dimension which the record occupies, namely TEMPORAL JAN 90; STATUS: COMMITTED; ORGANISATION: HARPO; EXPENSE TYPE: TRAVEL and LOCATION: MYDATE. This information is stored in the record on the database. Other fields in the record gives the illustrated details of the expense and its value.
At the centre of the screen is a window 13 which has a free form query select table. The table in this illustration has been set up to interrogate the database by filling in column headings and sub-headings with LOCATION: MYDATA; TEMPORAL: Ql- 09; STATUS: COMMITTED, ACCRUED and TARGET and by filling in the row headings and sub-headings with ORGANISATION, AIM, CROMWELL, TOCCATA, HARPO and DM. The table created is thus a 3 x 4 table with twelve cells at the crossings of the rows and columns. By clicking the mouse on EVALUATE, the program is triggered to fill in the cells with the totals of the values of the records which are the intersections of
the sets dictated by the table.
The column and row headings and sub-headings are selected from the node names of the dimensions by means of clicking and dragging with the mouse. Any combination can be selected and it is immaterial whether the headings are in rows or columns. From the illustrated column headings it will be seen that headings may be stacked - e.g. "Mydata"; "Ql-09" and the three status headings. The program evaluates the intersection of the sets thus defined and adds the values of the expenses to displa!y them in the cells.
If the database were essentially a non-numerial database then the cell would display the number of records in the intersection. The records could be listed by expanding the cell into another window or by scrolling in the cell, for example.
In Figure 3 there is illustrated the successive stages in building the query table 13 of
Figure 2. Initially, the user is presented with an elementary select table 14, Figure 3(a). By clicking and dragging with the mouse to the column query 25 the user inserts the "Mydata" heading from the LOCATION dimension, Figure 3(b). Then he inserts the heading
"Ql-09" from the TEMPORAL dimension, Figure 3(c). Next he selects sub-headings "Committed", "Accrued" and
"Target" from the STATUS dimension as sub-headings,
Figure 3(d). These column headings are thus stacked and already an intersection set of records corresponding to the logical AND of the column headings has been established in the computer memory in a manner to be described.
Figures 3(e) and 3(f) show the successive stages of selecting the row heading "AIM" and sub-headings "CROMWELL"; "TOCCATA"; "HARPO" and "DM" . When a select table has been used it is stored so that it may be reused or modified to give a new select table.
Referring now to Figure 4 there is shown the format of a record. The record has a number according to its address in the record file. This is not explicitly stored in the record itself. At the c beginning of the record is a text field 16 for record type, a number field 17 for record value, and then follow a locator field 18 and free fields 19. The locator field has five addresses corresponding to the nodes which the record occupies in the five dimensions. o Thus, for the example of the record at 12 in Figure 2, the locator addresses would be for Ql-09 (TEMPORAL) ; COMMITTED (STATUS) ; HARPO (ORGANISATION) ; TRAVEL (EXPENSE TYPE) and MYDATA (LOCATION). The free fields 19 carry the detailed information shown at 12 in Figure 5 2.
As the appropriate data is entered on the "record sheet" shown as 12 in Figure 2, so the dimension leaf nodes are updated. Each dimension leaf node carries a record of the record numbers appropriate 0 to that node. This information is stored in a particularly useful way, as illustrated in Figure 5. Here there is a node list which is a field of successive bytes having a total number of bits corresponding to the total number of records in the 5 database. If a particular record is appropriate to the node then the respective bit number of the node list bit string is turned on (status 1) . If not it has status 0.
As successive headings and sub¬ 0 headings are selected for the query table, the respective node bit strings are operated upon to produce further .bit strings. Thus, if a higher level heading such as Ql is selected, the appropriate leaf node bit strings (JFM) are ORed to produce an 5 intermediate union bit string. Any such intermediate bit strings are finally ANDed with each other and with leaf node bit strings as appropriate to produce further
bit strings which represent the records belonging to the intersection of the node sets. This process is repeated for successive selections and it will be seen that in this way bit strings representative of the record sets for the cells of the table can be rapidly derived. It is then a simple matter to scan the values of the records for insertion of the totals in the cells .
As an illustration, Figure 5 shows a first node bit string 20 with records 1, 3, 6, 7 and 8 present; a second node bit string 21 with records 2, 3,
5, 6 and 9 present and the intersection bit string 22 of these with records 3 and 6 present.
In addition to deriving the intersection of bit string sets, the computer language allows the union of sets and other set operations to be executed. For example, NOT OR and XOR instructions can be executed. In order to facilitate the introduction of other instructions, including quantitative instructions such as "greater than", "equals", and "less than", the screen of Figure 2 has a natural language query box 24 whereby query text can be entered by keyboard. Conveniently such queries operate as the results of a free form query as displayed. In customising the database there are practical restrictions on where a new leaf node may be added without incurring the penalty of rewriting records and node bit lists. These restrictions may be summarised by the condition that a leaf node may not be branched from a node which has a non-null bit list. Since only leaf nodes have bit lists in the preferred embodiment this means that leaf nodes may be added to (a) non-leaf nodes or (b) leaf nodes with null bit- lists . In general terms, within the database every record is associated with a point in a multi¬ dimensional space. Associated with every dimension of
this space is a tree structure. Each leaf node in this tree structure corresponds to a point along the axis. A non leaf node in this tree structure is defined to correspond to the union of the points represented by its children nodes. Thus this tree structure may be used to select single points, or previously defined sets of points. For the sake of illustration there is shown in Figure 6 a modified tree structure of the TEMPORAL dimension of the above described embodiment. Referring to Figure 6 we can interpret the tree as follows:
Records may have one of the set of values (Jan, Feb, .... Dec) for this dimension. Should the node Qtr 1 be chosen in a query, then this corresponds to the set of values (Jan, Feb, Mar). Similarly, the node 1990 corresponds to the set (Jan, Feb, Mar, Apr, ... , Dec) .
Whenever a record is added to the database , or moved, the user is constrained to select values from the leaf nodes of the classification trees for each of the dimensions . This is achieved by having the user traverse a menu tree that is driven from the classification tree. The menu tree will only permit the user to select a value associated with an existing leaf node.
Assuming we have an n-dimensional space. This requires that we have n tree structures. Each record has a 'locator1 that defines a point in the n- dimensional space. That is, we can regard each record as having an additional n fields that define the coordinates along each of the n dimensions. i.e. we have record (dim 1, dim2 , ..., di n , additional information)
It is possible to modify the tree structures, although care must be taken to ensure that a leaf value
is not removed that has records associated with it. The user to add a new leaf node, with a value distinct from the existing leaf nodes, only to an existing non- leaf node. Once added, the user may associate an existing record with the new value (i.e. move the point) or to associate a new record with the new value. Alternatively, the user may promote a leaf node to a non-leaf node (assuming there are no records associated with the promoted node) and then to add a new leaf node. Neither of these transformations compromises any existing records. It will be seen that by a combination of creating a new leaf node and moving records from an existing leaf node the new leaf node, the existing leaf node may be emptied and thus be prepared for promotion to a non-leaf node.
The database is organised so that it is efficient to obtain the set of records that have a particular value for a particular dimension. A number of mechanisms have been implemented to organise the database. For example, a value could simply be associated with each tree leaf node. If the records having that value for the particular axis are required, then the appropriate request find all records such that dimension 'j' has value 'val ' may be issued to the underlying storage manager. The underlying storage manager would return the appropriate set of records which would be returned to the caller. Alternatively, we may wish to cache the results so that next time the request is issued, the same result may be returned. Requests such as
Return the set of records associated with leaf 'a' AND with leaf 'b' may now be executed by the following algorithm: a. Obtain the set of values associated with leaf 'a' — Set A b. Obtain the set of values associated with leaf
'b' — Set B c. Sort SetA — SortedSetA (if SetA is unsorted) d. Sort SetB — SortedSetB (if SetB is unsorted) e. Set Result - empty set f. If sortedSetA is empty then return Result to caller g. If SortedSetB is empty then return Result to caller h. Let ValA be the first value in SortedSetA, and RestSortedSetA is the rest of SortedSetA Let ValB be the first value in SortedSetB, and RestSortedSetB is the rest of SortedSetB i. If ValA ValB then
SortedSetA = RestSortedSetA. else if ValA ValB then SortedSetB = RestSortedSetB else (ValA = ValB) Add ValA to Result. SortedSetA = RestSortedSetA SortedSetB = RestSortedSetB j. go to step f.
However, the ' foregoing method is not the preferred method. Preferably, as described above, every record in the data base is issued with a unique numeric identifier then we can hold a set of record identifiers for every leaf node. This set is defined by having a bit array. If record i is in the set of records, then the bit at location i in the bit array would have the value 1, otherwise would have the value 0. Requests such as
Return the set of records associated with leaf 'a' AND with leaf 'b' can now be executed by intersecting the two bit arrays, that is, by finding all the bit array locations that have a value of 1 in both arrays. This operation is very efficient for existing computer hardware. Most current computer languages permit two
integer values to be 'AND' ed together. This 'AND' operation typically operates on 16 or 32 bits at a time. In C this would be performed by NewVal = A & B;
The AND operation operates upon every binary digit in the two values in parallel and performs the following mapping
A B NewVal
Thus if A contains the value 0101010101010101 and B contains the value 0011001100110011
then A AND B would be value 0001000100010001
It will be seen that the presentation of the user-centred models in accordance with the invention in the above application has the effects (a) of revealing the database structure and (b) of limiting database query headings to legal components of the dimensions.