DOCUMENT MANAGEMENT SYSTEM AND METHOD
FIELD OF INVENTION
The invention relates to a document management system and method, particularly but not exclusively suitable for use in conjunction with publishing electronic documents on a network, for example the Internet.
BACKGROUND TO INVENTION
The low cost of data storage hardware has led to the collection and display of large volumes of data. The worldwide web, for example, is a distributed database providing access to tens of millions of different documents. Commercial organisations now realise the importance of implementing strategies for the display of these documents over the Internet if they are to remain competitive. There is also a trend toward moving the core business of each commercial organisation to the web.
To maintain its competitive edge, a company must be able to create, deliver, personalise and manage timely information, from all areas of the business. This flow of information allows online connections with customers, suppliers, employees and partners, resulting in new business opportunities.
Many companies use professional web developers to assist them in moving their business to the web. Web solutions require a client to have a web system built from the ground up, making initial developments costly and slow. The client is then reliant on the developer to maintain and enhance a website on an on-going basis, incurring large costs and delays in timely content publishing.
It would be particularly desirable to provide such an organisation with an affordable and cost-effective document management system enabling transfer of electronic documents over the Internet.
SUMMARY OF INVENTION
In broad terms in one form, the invention comprises a document management system comprising one or more documents stored in a memory and accessible over a network, each document having a document identifier and a document network address; a class hierarchy having a plurality of category nodes within a tree data structure; a publication rule base comprising a plurality of publication rules, one or more of the publication rules comprising a document identifier, a category, and a publication period; and a publication manager arranged to retrieve one or more publication rules from the publication rule base and to publish each document identified in the rule during the publication period identified in the rule.
In broad terms in another form the invention comprises a document management method comprising the steps of maintaining in a memory a class hierarchy having a plurality of category nodes within a tree data structure; maintaining in a memory a publication rule base comprising a plurality of publication rules, one or more of the publication rules comprising an identifier of a document accessible over a network, a category, and a publication period; retrieving one or more publication rules from the publication rule base; and publishing each document identified in the rule during the publication period identified in the rule.
In broad terms in another form the invention comprises a document management computer program comprising a class hierarchy maintained in a memory, the class hierarchy having a plurality of category nodes within a tree data structure; a publication rule base comprising a plurality of publication rules, one or more of the publication rules comprising an identifier of an electronic document accessible over a network, a category, and a publication period; and a publication manager arranged to retrieve one or more publication rules from the publication rule base and to publish each document identified in the rule during the publication period identified in the rule.
BRIEF DESCRIPTION OF THE FIGURES
Preferred forms of the document management system and method will now be described with reference to the accompanying figures in which:
Figure 1 shows a block diagram of a system in which one form of the invention may be implemented;
Figure 2 shows the preferred system architecture of hardware on which the present invention may be implemented;
Figure 3 shows a preferred class hierarchy from Figure 1;
Figure 4 is a preferred implementation of the class hierarchy of Figure 3;
Figure 5 illustrates examples of preferred publication rules of Figure 1;
Figure 6 illustrates a document graph;
Figure 7 illustrates a cover of the document graph in Figure 6;
Figure 8 illustrates a document graph of documents in a file structure;
Figure 9 illustrates a document type cover graph of the document graph in Figure 8;
Figure 10 illustrates a document type graph for an online pavilion website; and
Figure 11 illustrates an instance document graph for which Figure 10 is a cover.
DETAILED DESCRIPTION OF PREFERRED FORMS
Figure 1 illustrates a block diagram of the preferred system 10 in which one form of the present invention may be implemented. The system includes one or more clients 20, for example 20A, 20B and 20C which each may comprise a personal computer or workstation described below. Each client 20 is connected to a network or networks 30. It is envisaged that network 30 could comprise a local area network or LAN, a wide area network or WAN, an Internet, Intranet, wireless access network, telecommunication network, or any combination of the foregoing.
The system 10 further comprises one or more servers or data repositories 40 for example 40A, 40B and 40C. Each server 40 is connected to the network 30 as shown in Figure 1 and could comprise a personal computer or workstation but could also comprise several workstations connected by separate private networks.
The system 10 further comprises electronic documents 50 for example 50A, 50B and 50C maintained on a server 40. Each electronic document 50 could comprise a web page, text document, multimedia content, software programs, graphics, audio signals, videos and so forth. Each document 50 preferably includes a unique document identifier and/or a unique network address, by which the document is indexed.
In general, a document 50 is displayed on client 20 after transmittal over the network 30. In some cases a user on client 20 could transmit a document request over the network 30. The network 30 and servers 40 route the request to the most appropriate server 40 on which the required document is stored. The document request preferably specifies the network address of that document. If a document is located, the document is retrieved from the appropriate server 40 and transmitted over the network 30 to the user on client 20. If the document 50 cannot be found, or cannot be found within a pre-specified "time out" period, an error message is displayed to the user 20 instead of the document.
In other circumstances, a third party could transmit the document to the client 20 rather than the client 20 requesting the document. For example, a third party could send a document in the form of an email to a client 20 over the network 30.
The invention provides a method and system of effectively managing documents 50 spread over various servers 40. The invention also provides a method of managing the publication of such documents by specifying rules and other characteristics for the publication.
The system 10 includes a class hierarchy indicated generally at 60. The class hierarchy could be installed on computer memory and interfaced to the network 30. The class hierarchy preferably comprises a series of categories forming hierarchical structures. Each category has a relation to other categories enabling navigation between these categories. Each category preferably includes associated characteristics. These associated characteristics could include for example a publishing location. Where a category is concerned with sport, for example, a document published in the sport category could be transmitted to a list of sports commentators and other sports copying services.
It is also envisaged that the class hierarchy includes category inheritance. For example, ball sports and track and field sports could be defined as separate categories in a parent/child relationship with a parent category sport. Documents published to either ball sports or track and field sports could inherit the publishing characteristics of the parent category sport.
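The category inheritance described above can be sketched as follows. This is a minimal illustration only: the dictionaries `PARENT` and `CHARACTERISTICS`, the characteristic names and the rule that a child's own value overrides its parent's are all assumptions made for the example, not details given in this description.

```python
# Hypothetical category data: each category names its parent (None for a root).
PARENT = {"sport": None, "ball sports": "sport", "track and field": "sport"}

# Characteristics assigned directly to a category; names and values are illustrative.
CHARACTERISTICS = {
    "sport": {"recipients": ["sports commentators"]},
    "ball sports": {"location": "/sport/ball"},
}


def effective_characteristics(category):
    """Merge characteristics from the root down, so a child overrides its parent."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT[category]
    merged = {}
    for name in reversed(chain):
        merged.update(CHARACTERISTICS.get(name, {}))
    return merged
```

Under these assumptions a document published to "track and field" would pick up the recipients defined on "sport" without any entry of its own.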
The system 10 further comprises a publication rule base indicated at 70 which could be implemented on computer memory and interfaced to the network 30. These rules in general specify start and end times for publishing documents 50. Each rule also preferably specifies a category to which each document 50 will be published.
The system 10 further comprises a rule modifier 80 which enables publication rules stored in the publication rule base 70 to be altered. The rule modifier could comprise a set of instructions stored on computer readable medium or could comprise computer software executing under appropriate operating system and application software. For example, the rule modifier 80 could enable the start and end times or publication period of a document 50 to be altered.
The system 10 may further comprise a document type graph 85, an event generator 90 and an event handler 100 which will be described in more detail below.
Figure 2 shows the preferred system architecture of a client 20 or server 40. The computer system 200 typically comprises a central processor 202, a main memory 204 for example RAM and an input/output controller 206. The computer system 200 also comprises peripherals such as a keyboard 208, a pointing device 210 for example a mouse, track ball or touch pad, a display or screen device 212, a mass storage memory 214 for example a hard disk, floppy disk or optical disc, and an output device 216 for example a printer. The computer system 200 could also include a network interface card or controller 218 and/or a modem 220. The individual components of the system 200 could communicate through a system bus 222 or could be implemented as individual components in a network.
Figure 3 illustrates a conceptual view of a class hierarchy 60. As shown in Figure 3, the class hierarchy comprises a root node N0. The root node N0 has parent/child relationships with three different category nodes, namely N1, N2 and N3. N1 could represent an education category, N2 could represent a sport category and N3 could represent a news category. N2 is shown as in a parent/child relationship with two further sub-categories N4 and N5. N4 could represent for example a track and field category within sport category N2 and category N5 could represent a ball sports category as a child of the sport category N2. N5 is shown as in a parent/child relationship with a further node N6. Node N6 could represent baseball, being a child of ball sports N5 which in turn is a child of sport N2.
Figure 4 illustrates one practical implementation of class hierarchy of Figure 3. In one form, the class hierarchy could be implemented as a table in a relational database as defined below:
table relations {symbol1, symbol2, steps}
As shown in Figure 4, the preferred form class hierarchy 60 is implemented with three fields, namely symbol1 62, symbol2 64 and steps 66. The class hierarchy is preferably defined by a plurality of parent/child relationships, with field 62 representing the parent category in a parent/child relationship and field 64 representing the child in a parent/child relationship.
The relations table shown in Figure 4 preferably includes a transitive closure rule that ensures that if a parent node is related to a child node via a path of intermediary categories, then the relations table has an entry for the parent node, the child node, and the number of intermediary categories or steps between the parent and the child indicating the closeness of the relationship.
Each category is preferably itself represented as a self-referential reference. For example, category N0 is represented in a parent/child relationship with category N0 as both the parent and the child with the number of steps equalling 0. Node N1 and the remaining nodes are similarly defined as a closed parent/child relationship.
Node N1 is a direct child of node N0 and so the relations table includes a parent/child entry of N0 and N1 with a steps value equalling 1. Similarly, parent/child relationships also exist for N0-N2 and N0-N3 with steps values of 1.
The relations table further includes a parent/child relationship N0-N4 with a step value of 2 and N0-N5 with a step value of 2 also.
The benefit of including an entry for every parent/child pair in the class hierarchy 60 is that database queries are made more efficient. If, for example, the parent/child pair N0 and N6 were not included in the relations table, then in order to determine that N6 is a sub-category of N0 it would be necessary to traverse the tree through the pairs N0-N2, N2-N5 and N5-N6 to find the same information. By including all parent/child pairs, it can quickly be ascertained with one query of the relations database that category N6 is a sub-category of N0.
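As an illustration of the relations table, the sketch below builds the transitive closure for the hierarchy of Figure 3 from its direct parent/child edges and then answers a sub-category question with a single lookup. It is a minimal in-memory sketch rather than a relational implementation; the names `DIRECT_EDGES`, `build_relations` and `is_subcategory` are hypothetical.

```python
from collections import defaultdict

# Direct parent/child edges of the Figure 3 hierarchy (N0 is the root).
DIRECT_EDGES = [("N0", "N1"), ("N0", "N2"), ("N0", "N3"),
                ("N2", "N4"), ("N2", "N5"), ("N5", "N6")]


def build_relations(edges):
    """Return {(symbol1, symbol2): steps}, including self-pairs with steps 0."""
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))

    relations = {}
    for start in nodes:
        # Walk down from `start`, recording the distance to every descendant.
        frontier = [(start, 0)]
        while frontier:
            node, steps = frontier.pop()
            relations[(start, node)] = steps
            for child in children[node]:
                frontier.append((child, steps + 1))
    return relations


def is_subcategory(relations, ancestor, descendant):
    """One lookup replaces a walk over the intermediate categories."""
    return (ancestor, descendant) in relations


RELATIONS = build_relations(DIRECT_EDGES)
```

With the table built this way, `RELATIONS[("N0", "N6")]` holds the steps value for the baseball category relative to the root, and `is_subcategory(RELATIONS, "N0", "N6")` needs no traversal.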
It is envisaged that the relations table could include management functionality to ensure that there are no cycles in the resulting graph realised by the functor taking directed graphs to undirected graphs.
Figure 5 illustrates an example of preferred form publication rules as stored in the publication rule base 70. The publication rule base is preferably implemented as a table in a relational database as defined below:
table publications {document, category, start, end}
The table in the publication rule base 70 preferably comprises a document field 72, a category field 74, a start field 76 and an end field 78.
The document field 72 stores a reference to individual documents 50. Values in the document field could include the network address of a document 50 or some other unique identifier. The category field 74 specifies a category to which a document specified in field 72 is to be published. The start field 76 and end field 78 together define a publication period for a document specified in field 72 and a category specified in field 74.
In the example shown in Figure 5, document "doc1" is specified as being published in category N5 ball sports between 9.00 am on 1 November 2000 and 5.00 pm on 3 November 2000. The next rule in the rule base 70 specifies that this same document doc1 be published in category N3 news between 5.00 pm on 3 November 2000 and 5.00 pm on 4 November 2000. The first two rules in the publication rule base 70 effectively specify that a document is to be published in one category for an initial duration and then transferred to another category for a shorter duration.
The third rule in the rule base illustrates that another document, doc2, be published in category N1 entertainment between 9.00 am on 5 December 2000 and 9.00 am on 20 January 2001.
By selecting appropriate rules to store in the publication rule base 70, each document 50 could have a well-defined life cycle with a publication start date and a publication end date. Document 50 could effectively be embargoed until a particular publication date or could be tagged with an expiry date after which the document 50 will no longer be available over the network 30 or would otherwise be placed in an archive. The publication rule base 70 provides a user with flexibility to ensure that categories in the class hierarchy 60 can be moved, ordered and added to without affecting existing publications.
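The way a publication manager might consult the rule base can be sketched as follows, using the three example rules described for Figure 5. The record type `PublicationRule`, the function `published_in` and the choice to treat the start time as inclusive and the end time as exclusive are assumptions made for illustration.

```python
from datetime import datetime
from typing import NamedTuple


class PublicationRule(NamedTuple):
    document: str
    category: str
    start: datetime
    end: datetime


# The three example rules described for Figure 5.
RULES = [
    PublicationRule("doc1", "N5", datetime(2000, 11, 1, 9), datetime(2000, 11, 3, 17)),
    PublicationRule("doc1", "N3", datetime(2000, 11, 3, 17), datetime(2000, 11, 4, 17)),
    PublicationRule("doc2", "N1", datetime(2000, 12, 5, 9), datetime(2001, 1, 20, 9)),
]


def published_in(rules, category, when):
    """Documents whose rule for `category` covers `when`.

    Assumption: the period is half-open, start inclusive and end exclusive.
    """
    return sorted({r.document for r in rules
                   if r.category == category and r.start <= when < r.end})
```

Because the interval is treated as half-open, doc1 leaves category N5 at exactly the moment it enters category N3, so it is never listed in both at once.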
In one preferred form, the system 10 includes an event generator 90 as shown in Figure 1. The event generator 90 could comprise a set of instructions stored on computer readable medium or could comprise computer software executing under appropriate operating system and application software. The creation or deletion of a publication could result in the event generator 90 generating an external event. An event handler 100 could be arranged to, for example, drive a proactive delivery mechanism such as email, SMS, or fax for a particular published document 50 once the publication period for a particular document commences. The event handler 100 could comprise a set of instructions stored on computer readable medium or could comprise computer software executing under appropriate operating system and application software.
Similarly when a publication period specified in a publication rule expires, the event generator 90 could generate a further external event indicating that the publication period has expired. On expiry of a rule, the relevant document could be either deleted or archived.
The system 10 could further comprise a rule modifier 80. The rule modifier could be used to modify publication rules in the publication rule base 70. For example, a publication rule could be modified to change its start and/or end date more than once, thereby generating multiple start and end events.
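One way an event generator could derive start and expiry events from the rule base is sketched below. The polling approach, the function name `events_between` and the event labels "start" and "expire" are assumptions for illustration; the description does not prescribe how events are detected.

```python
from datetime import datetime

# (document, category, start, end): same shape as the publications table.
RULES = [
    ("doc1", "N5", datetime(2000, 11, 1, 9), datetime(2000, 11, 3, 17)),
    ("doc1", "N3", datetime(2000, 11, 3, 17), datetime(2000, 11, 4, 17)),
]


def events_between(rules, previous, now):
    """Return sorted (time, kind, document, category) events with previous < time <= now."""
    events = []
    for document, category, start, end in rules:
        if previous < start <= now:
            events.append((start, "start", document, category))
        if previous < end <= now:
            events.append((end, "expire", document, category))
    return sorted(events)
```

An event handler could then iterate over the returned list and, for each "start" event, trigger a delivery mechanism such as email.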
In a further preferred form, the system 10 could also include structures to define and control user privileges for a particular document or range of documents and also to define rules according to which logical associations may be formed between documents. This requires a further means of organising documents into document types, which is not necessarily equal to the category hierarchy described above.
Documents, especially those which are published electronically are not usually discrete items of information but are generally related to other documents in some kind of structure for example in a hierarchical file system or through links on web pages.
These interconnections between documents can be represented as a document graph with the nodes of the graph representing individual documents and the edges of the graph representing the relationships between them. Figure 6 shows an example document graph.
In order to maintain consistent style and logic in the final publication it is useful to implement one or more rules which define the ways in which the documents in the document graph may be related or linked. It is also useful to implement one or more rules defining which users have privileges to access which documents or nodes on the document graph and what those privileges entitle the user to do for example view, modify or delete a particular document.
The task of implementing these rules on the document graph is somewhat complicated because the document graph changes with time. At a certain point in time t1 the document graph may have a completely different form from the one it will have at a successive point in time t2 which could be an hour, a day or a week after t1. The document graph is therefore an evolutionary graph.
To define user privileges for example we want to know if a user has a particular privilege for a particular document in the document graph. Traditionally each information node has applied to it individual access privileges (either for a user or a group). The function that determines a given privilege to a given user against a given document is called a privilege function p. Privilege values for given users against a given document may be provided by direct assignment: the user is directly assigned the privilege to that item. It is often necessary to impose a collection of inheritance rules, such as for example a rule that states that if the user has a privilege for a given document then they have the privilege for all the document's children and their children and so on. This can be modeled by a privilege function on the document graph, but maintaining the function is difficult since one must update its value not only as direct assignments are made, but also as the graph changes (in order to maintain the integrity of the inheritance rules).
Inclusion in the system 10 of a document type graph 85 is intended to solve the above problem. The document type graph 85 could be installed on computer memory and interfaced over a network. This graph is conceptually a "cover" of the document graph. Figure 7 illustrates a graph which is a cover of the document graph in Figure 6. The term "cover" in this context means that there exists a mapping function m which can map every node of the document graph to a corresponding node in the cover graph, the cover graph being called the document type graph. If there is an edge between any two nodes in the document graph, for example nodes a and b in Figure 6, then there will also be an edge between m(a) and m(b) as is shown in Figure 7. In general the mapping function m will be used to map documents in the document graph to document types in the document type graph. The function m is therefore known as the type map.
While the document graph is an evolutionary graph it is preferable that the document type graph be a stationary cover of the document graph, meaning that while the document type graph is always a cover of the document graph no matter its state, the document type graph itself does not change.
This stationary graph can be used to implement privilege (and association) rules. Checking for a given privilege for a given user at a given node in the document graph now requires checking for that privilege for that user at the corresponding node in the document type graph. As the stationary cover graph abstracts the documents into document types it is also possible to implement the inheritance of privileges in the document type graph providing that the privileges for all documents of that type remain the same.
Traditionally when implementing inheritance in a case like this a recursive function would be used to return the value of the privilege function p for a given node. The node itself would initially be checked for the privilege and then if it was not found, all parents of the node would be located and then the process repeated for each parent until either the privilege was found or all parents had been checked.
However this simple recursion is not especially useful for implementing fine-grained inheritance. For example, Figure 8 illustrates a document graph (FILESYSTEM) of several documents in a hierarchical file system. Figure 9 illustrates a document type cover graph (TYPE) for the document graph in Figure 8.
The personal directory My Documents in Figure 8 contains 4 files, namely two ordinary user-created files File1 and File2 and two system files SysFile1 and SysFile2. Quite clearly the TYPE graph in Figure 9 is a cover of the FILESYSTEM graph by the type map that maps My Documents to Directory, File1 and File2 to File and SysFile1 and SysFile2 to SysFile.
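The cover property can be checked mechanically. The sketch below encodes the FILESYSTEM and TYPE graphs as sets of directed edges and tests that every document edge maps to a type edge. The edge sets are assumed from the description (a directory containing four files, and a Directory type related to the File and SysFile types) since the figures themselves are not reproduced here; the function name `is_cover` is hypothetical.

```python
# Edges of the FILESYSTEM document graph: the directory contains four files.
FILESYSTEM = {
    ("My Documents", "File1"), ("My Documents", "File2"),
    ("My Documents", "SysFile1"), ("My Documents", "SysFile2"),
}
# Edges of the TYPE graph (assumed from the description of Figure 9).
TYPE = {("Directory", "File"), ("Directory", "SysFile")}
# The type map from documents to document types.
TYPE_MAP = {
    "My Documents": "Directory",
    "File1": "File", "File2": "File",
    "SysFile1": "SysFile", "SysFile2": "SysFile",
}


def is_cover(doc_edges, type_edges, type_map):
    """True if every document edge (a, b) has a matching type edge (m(a), m(b))."""
    return all(
        a in type_map and b in type_map
        and (type_map[a], type_map[b]) in type_edges
        for a, b in doc_edges
    )
```

Adding further files to My Documents changes FILESYSTEM but leaves TYPE untouched, which is the sense in which the cover is stationary.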
A privilege function p is defined to mean, for example, "has been directly assigned read privilege to". If a user J, for example, has been assigned read privileges to My Documents, then p is modified so that p(J, Directory) = 1 = true. A desirable rule could be p(J, Directory) = true implies p(J, File) = 1 = true BUT p(J, SysFile) = 0 = false.
To implement the rule, the function p could be modified directly. But rather than doing this a "truth function" r is introduced such that given any edge (a,b) then r(a,b) = 1 means that b inherits privileges from a while r(a,b) = 0 means that b does not inherit privileges from a.
In Figure 9 for example a truth function can be imposed on the TYPE graph such that r(Directory, File) = 1 and r(Directory, SysFile) = 0. Then to determine if user J has read privilege at SysFile1, p(J, SysFile) is checked for truth on the document type cover graph. In this case it is not so all parent nodes of SysFile (in this case Directory) are checked. p(J, Directory) is true so we then use the truth function r to see whether SysFile inherits the privileges of Directory. However in this case the truth function returns false so J does not have privilege at SysFile1 even though J does have privilege at My Documents.
This combination of the basic privilege function and a truth function can be referred to as an adjoined privilege function. A more detailed description of the composition of the adjoined privilege function follows.
Let $G_t = \{N_t, A_t\}$ be an evolutionary graph and $B_t$ a set of users. A privilege function is a map $p_t : B_t \times N_t \to \{0,1\}$, which is time-dependent. The privilege function represents a single privilege that an actor may have on a node in the graph (such as removal, modification or publication). The privilege function then represents who has that privilege on which nodes.
Let $(c, G)$ be a stationary cover of $G_t$. Assuming a truth function $r : A \to \{0,1\}$ on the edges $A$ of $G$, the adjoined privilege function $p^r_t : B_t \times N_t \to \{0,1\}$ can be formed, defined by the recursive rule,
$$p^r_t|_{X_t}(b, m) = \max\Big\{\, p_t(b, m),\ \max_{n \in X_t \text{ s.t. } (n, m) \in A_t} p^r_t|_{X_t}(b, n) \cdot r(c(n), c(m)) \Big\}$$
What is happening here is that the value of the adjoined privilege function at (b, m) is determined by the value of the same privilege function at all nodes n that relate to m, conditional on the value of r for that edge. So r is the inheritance rule that says that b has privilege at m provided that b has privilege at m directly, or that b has privilege at n, n relates to m, and m inherits from n.
The restriction $X_t$, a subset of $N_t$, allows one to impose local restriction, noting that $p^r_t|_{X_t}$ as a function of $X_t$ is monotonically increasing over the partially ordered set of subsets of $N_t$ (by virtue of the MAX).
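As a worked illustration, the recursive rule can be evaluated by hand for the file system example of Figures 8 and 9. The evaluation assumes that the direct assignment for user J is recorded at the document node My Documents, that J has no direct assignment at File1 or SysFile1, that r(Directory, File) = 1 and r(Directory, SysFile) = 0, and that the time index and restriction are omitted for brevity.

```latex
\begin{align*}
p^r(J, \text{File1})
  &= \max\{\, p(J, \text{File1}),\; p^r(J, \text{My Documents}) \cdot r(\text{Directory}, \text{File}) \,\}
   = \max\{0,\; 1 \cdot 1\} = 1 \\
p^r(J, \text{SysFile1})
  &= \max\{\, p(J, \text{SysFile1}),\; p^r(J, \text{My Documents}) \cdot r(\text{Directory}, \text{SysFile}) \,\}
   = \max\{0,\; 1 \cdot 0\} = 0
\end{align*}
```

Under these assumptions J inherits the privilege at File1 but not at SysFile1, matching the outcome described for the truth function.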
A simple evaluation algorithm for adjoined functions is described below:
function evaluate(g:Graph, c:Cover, r:TruthFunction, p:PrivilegeFunction, m:Node) {
    stack:FIFO; marked:Set;
    // direct assignment at the starting node
    if (p(m)) return true;
    stack.push(m);
    while (!stack.empty) {
        m = stack.pop;
        marked.put(m);
        List l = g.getParents(m);
        while (l.hasMoreElements) {
            Node n = l.nextElement;
            if (!marked.contains(n)) {
                // follow the edge only if the type of m inherits from the type of n
                if (r(c(n), c(m))) {
                    if (p(n)) return true;
                    stack.push(n);
                }
            }
        }
    }
    return false;
}
FIFO is a first in first out stack, Graph is a graph implementation (appropriately restricted by X if necessary), Cover is the covering map into the covering graph, TruthFunction and PrivilegeFunction are the truth and privilege functions being used.
This procedure essentially follows the algorithm:
1) Given a node m, determine if privileges are given, if so then done.
2) Obtain parent nodes that one inherits the privilege from and apply algorithm to each one until done.
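The two steps above can be sketched as runnable code. The version below is an illustrative transcription only: it represents the graph, cover, truth function and privilege function as plain dictionaries and sets, uses a deque as the first in first out structure, and makes the user an explicit parameter. These representations, and the example data taken from the file system example of Figures 8 and 9, are assumptions.

```python
from collections import deque

# Example data assumed from the file system example.
PARENTS = {name: ["My Documents"] for name in ("File1", "File2", "SysFile1", "SysFile2")}
COVER = {"My Documents": "Directory", "File1": "File", "File2": "File",
         "SysFile1": "SysFile", "SysFile2": "SysFile"}
R = {("Directory", "File"): True, ("Directory", "SysFile"): False}
P = {("J", "My Documents")}


def evaluate(parents, cover, r, p, user, m):
    """Return True if `user` holds the privilege at node `m`, directly or by inheritance.

    parents: dict node -> list of parent nodes in the document graph
    cover:   dict node -> document type (the covering map c)
    r:       dict (parent_type, child_type) -> bool, the truth function
    p:       set of (user, node) direct assignments, the privilege function
    """
    if (user, m) in p:
        return True
    queue = deque([m])
    marked = {m}
    while queue:
        m = queue.popleft()
        for n in parents.get(m, []):
            if n in marked:
                continue
            # Follow the edge only if the child's type inherits from the parent's type.
            if r.get((cover[n], cover[m]), False):
                if (user, n) in p:
                    return True
                marked.add(n)
                queue.append(n)
    return False
```

A node is marked only once an inheriting edge into it has been followed, so a parent first met through a non-inheriting edge can still be reached later through an inheriting one.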
The key factor here is the determination of inheritance. This is performed using the truth function as applied to the covering graph. If one thinks of the graph on which the privilege function is being evaluated as an information graph then the covering graph provides a stable typing of nodes in the information graph and the truth function implements the inheritance rules. This decouples changes in the information graph which are absorbed in the covering map.
Suppose we have a family of privilege functions $p_1, \ldots, p_n$ and a family of truth functions $r^{(i,j)} : A \to \{0,1\}$ that implement cross privilege inheritance (so that, for example, read privilege is given to a child whenever the parent has write privilege). The extended adjoined privilege function is then defined as follows:
$$p^r_i|_{X_t}(b, m) = \max\Big\{\, p_i(b, m),\ \max_{1 \le j \le n}\ \max_{n \in X_t \text{ s.t. } (n, m) \in A_t} p^r_j(b, n) \cdot r^{(i,j)}(c(n), c(m)) \Big\}$$
This enables inheritance across a range of privileges (note that the time variable t has been omitted for clarity). In the innermost maximum n denotes a parent node of m, whereas the index j ranges over the n privilege functions.
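Cross privilege inheritance can be sketched recursively as below. The encoding of the truth functions as a set of tuples (i, j, parent type, child type), the privilege names "read" and "write", and the example data are assumptions made for illustration.

```python
PRIVILEGES = ("read", "write")

# Example data assumed for illustration.
PARENTS = {"File1": ["My Documents"], "SysFile1": ["My Documents"]}
COVER = {"My Documents": "Directory", "File1": "File", "SysFile1": "SysFile"}
P = {("write", "J", "My Documents")}           # direct assignments (privilege, user, node)
R = {("read", "write", "Directory", "File")}   # a File gets read when its Directory has write


def has_privilege(i, user, m, parents, cover, p, r, path=frozenset()):
    """Privilege i for user at node m: direct, or inherited from some privilege j at a parent."""
    if (i, user, m) in p:
        return True
    if (i, m) in path:  # already being evaluated on this path: avoid looping on cycles
        return False
    path = path | {(i, m)}
    for n in parents.get(m, ()):
        for j in PRIVILEGES:
            if (i, j, cover[n], cover[m]) in r and has_privilege(
                    j, user, n, parents, cover, p, r, path):
                return True
    return False
```

In this example J, who holds write privilege on My Documents, gains read privilege on File1 but not write privilege on File1, because only the (read, write) truth function is set for the Directory to File edge.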
The use of privilege functions on the document type covering graph can also be used to implement graph deltas.
A graph delta comprises at least two sets of edges or associations in a graph, in this case in the document graph. One of the sets of edges represents edges to be removed while the other represents edges to be added. If a rule is imposed that the document graph be totally connected, then changes in the nodes can be implicitly omitted.
If a particular graph delta is proposed by a user then access rules based on privileges in the document type covering graph can be imposed which determine whether or not a given user may implement a given graph delta.
To implement these rules a construction property and a destruction property is assigned to each edge in the document type covering graph. Each property has two values, start and end which are a subset of all available privileges {p1,...,pn}, including the empty set. To determine if a user can implement a given delta, each element in the removal set of the delta is considered and the action for that element is validated. This is done by checking that the user has at least one privilege in each of the start and end sets of the associated edge's destruction property in the document type graph. If the set is empty then truth is by default. A similar validation is performed for the addition elements of the delta. If all elements are valid, then the change can occur.
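The validation of a graph delta can be sketched as follows. Several points are assumptions made for the example rather than details stated above: the start set is checked against the privileges the user holds at the edge's start node and the end set against those held at its end node; an edge whose type-level edge has no property defined is rejected; and the names `can_apply`, `side_ok`, `CONSTRUCTION`, `DESTRUCTION` and `HELD` are hypothetical.

```python
COVER = {"My Documents": "Directory", "File1": "File", "File3": "File", "SysFile1": "SysFile"}
# Per type edge: (start set, end set) of privilege names.
CONSTRUCTION = {("Directory", "File"): ({"write"}, set())}
DESTRUCTION = {("Directory", "File"): ({"write"}, set())}
# Privileges each user holds at each document node.
HELD = {("J", "My Documents"): {"write"}}


def side_ok(required, held):
    """An empty requirement set passes by default; otherwise one held privilege suffices."""
    return not required or bool(required & held)


def can_apply(delta, user, cover, construction, destruction, held):
    """delta = (edges_to_remove, edges_to_add), each a set of (start, end) document edges."""
    removals, additions = delta
    for edges, prop in ((removals, destruction), (additions, construction)):
        for a, b in edges:
            rule = prop.get((cover[a], cover[b]))
            if rule is None:
                return False  # assumption: no type-level property means not allowed
            start, end = rule
            if not (side_ok(start, held.get((user, a), set()))
                    and side_ok(end, held.get((user, b), set()))):
                return False
    return True
```

A delta is accepted only when every removal and every addition validates, so a single disallowed edge blocks the whole change.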
The existence of a document type graph which covers the document graph has other applications besides the implementation of privilege functions. In particular it can be used to create association rules between different content types that impose association formation rules among content items. Specifically, it can be required that only certain types of content can be children of other types of content.
Additionally the document type covering graph can be used to implement publication or instance conditions. If a is a document that maps to the document type c(a), then a publication condition or instance condition is an association rule between c(a) and another type t that requires the existence of a content item b such that c(b) = t and an edge exists in the document graph between a and b.
The instance condition can define required association properties which must be adhered to by the a→b association or edge. If the association properties are sufficient to define an association and a content item, then auto-generation can occur. When a content item is created and associated to another, then the instance conditions may require the existence of association with other content items. These conditions may be fulfilled by actually creating the missing items.
Figure 10, for example, shows a document type graph for an online pavilion website. The site manages one or more pavilions of document type Pavilion, each with one or more exhibits of document type Exhibit. Each exhibit has zero or more articles of document type Article and zero or more case studies of document type Case Study. In a management console a user may wish to be able to add an exhibit to a pavilion and have two sub-directories "Articles" and "Case Studies" automatically generated. Two further document types Article Directory and Case Study Directory are therefore required. Then whenever an exhibit item is created, auto-generation ensures that there are two sub-items named Articles of type Article Directory and Case Studies of type Case Study Directory.
Figure 11 shows the document graph for one pavilion in such a website. The pavilion originally comprises three exhibits E1, E2 and E3. Each of the exhibits includes a specific Case Study directory and Article Directory. These directories contain zero or more case studies or articles respectively.
When a new exhibit is added, for example En, any instance conditions for exhibits found in the document type graph are located and applied. In this case a case study directory (case studies n) and an article directory (articles n) are created to fulfil the instance conditions.
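Auto-generation driven by instance conditions can be sketched as below. The table `INSTANCE_CONDITIONS`, the function `add_item` and the naming scheme for generated sub-items are assumptions made for illustration; only the document types and the two required sub-items come from the pavilion example.

```python
# For each document type, the (name, type) sub-items that must exist under every instance.
INSTANCE_CONDITIONS = {
    "Exhibit": [("Articles", "Article Directory"),
                ("Case Studies", "Case Study Directory")],
}


def add_item(graph, types, parent, name, doc_type):
    """Add `name` under `parent`, then create any sub-items its type requires."""
    types[name] = doc_type
    graph.setdefault(parent, []).append(name)
    graph.setdefault(name, [])
    for child_name, child_type in INSTANCE_CONDITIONS.get(doc_type, []):
        # Hypothetical naming scheme: qualify the sub-item with its exhibit's name.
        add_item(graph, types, name, f"{child_name} ({name})", child_type)
```

For example, starting from a pavilion node, `add_item(graph, types, "Pavilion 1", "En", "Exhibit")` creates the exhibit and its two directories in one call.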
The foregoing describes the invention including preferred forms thereof. Alterations and modifications as will be obvious to those skilled in the art are intended to be incorporated within the scope hereof, as defined by the accompanying claims.