DATA STORAGE AND RETRIEVAL USING UNIQUE IDENTIFIERS
FIELD OF THE INVENTION
THIS INVENTION relates to data storage and retrieval. It relates in particular to a method of storing a plurality of documents in a database and to a method of retrieving data from a database. Further, it relates to an arrangement of data in a database.
DESCRIPTION OF THE PRIOR ART
The storage of data in the form of digital images of documents has played an increasing role in recent years with improvements in computer technology. A typical application of the storage of documents in the form of digital images is in the medical field, e.g. the storage of a claim document submitted by a doctor to a medical aid fund. When such claims are received by the medical aid fund, a digital image of the original claim document is stored in a storage medium, e.g. a CD ROM or the like. Selected information is also manually read from the claim document and entered into an independent storage medium thereby creating an abridgement of the original document. Retrieval of the information from the independent storage medium is generally fairly rapid. However, in the event of full details on the claim being required, the digital image of the original document is usually required. This normally entails obtaining a document reference from the abridgement and retrieving the original document by means of an independent computing or data management system using conventional search techniques. The result is that, due to the substantial size of the database in which the digital images are stored, the retrieval of the digital image of the original document may take an unacceptably long period of time.
It is an object of this invention to offer a solution to this problem. It is however to be appreciated that not only medical records, but also insurance records, or any other bulk record systems, which are conventionally stored in large electronic databases are to be borne in mind for the purposes of this specification.
SUMMARY OF THE INVENTION
According to the invention, there is provided an arrangement of data in a database, the arrangement including a selected number of file locations, each file location including a document which includes a unique primary identification number and the file location being identified from the unique primary identification number; and a plurality of groups of the file locations, each group including NFG files in the group and including a unique secondary identification number which is defined by the absolute value of the first unique primary identification number of a file location in said group of file locations divided by NFG files.
Accordingly, the path to a selected document may be derived from the unique primary identification number.
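By way of illustration only, the arithmetic may be sketched as follows. The sketch assumes that "absolute value" is to be read as the integer part of the quotient, which is the reading implied by the worked example in the description of the preferred embodiment, where 1.5 yields 1. The function name and the value NFG = 10 are hypothetical and are chosen only for the example.

```python
def secondary_id(primary_id: int, nfg: int) -> int:
    """Return the number of the group in which a document lies.

    Assumption: "absolute value" in the text means the integer part of
    primary_id / nfg, so floor division is used for non-negative numbers.
    """
    if primary_id < 0 or nfg <= 0:
        raise ValueError("primary_id must be >= 0 and nfg must be > 0")
    return primary_id // nfg


# Illustrative values only: NFG = 10 file locations per group.
print(secondary_id(0, 10))   # first document of the first group
print(secondary_id(15, 10))  # a document in the second group
```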
The number of file locations may be a preselected number of file locations which corresponds to a number of documents which are capable of being stored in at least a particular section of the database.
The file locations may include digital images of the documents which are captured by means of a conventional scanner.
The file locations may be defined by a preselected number of base directories, each group of the file locations including NFG base directories and each base directory being designated by a primary identification number. The number of base directories in each group of file locations may be less than about 1000, preferably less than about 250.
Each secondary identification number may be associated with a directory used in a conventional computer system, the directory being designated by the secondary identification number and each base directory being defined by a sub-directory of said directory. The primary identification number is typically the document number and the documents are preferably sequentially numbered.
Thus, each unique secondary identification number may be associated with, typically being the name or label of, a directory of a conventional directory/sub-directory arrangement used in conventional computer systems and each unique primary identification number may be associated with, typically being the name or label of, a sub-directory of said directory. Accordingly, each file location may be a sub-directory in which a digital image of the document is stored and which is labelled or named with the unique primary identification number associated with the document. The unique primary identification number is typically the document number.
The database is typically arranged in a hierarchical or so-called "root" structure of directories in which the file locations are each defined by a sub-directory at a base level LB in the hierarchical structure. Each group of directories at one level above the base level (level LB + 1) in the structure may include NFGLB + 1 sub-directories each of which has a secondary identification number designated by the absolute value of the document number of the first document in the group divided by NFGLB + 1.
Each level LB + n may include a plurality of groups of directories, each group of directories including NFGLB + n directories at an immediately lower level LB + n - 1. Each group of directories at level LB + n may include a unique secondary identification number which is defined by the absolute value of the unique secondary identification number of the first sub-directory in the group of directories at level LB + n - 1 divided by NFGLB + n.
The number of groups of directories at level LB + n is typically between about 2 and about 10 times the number of groups of directories at level LB + n - 1. Preferably, an even number of groups of directories NFGLB + n is provided at each level LB + n.
Further in accordance with the invention, there is provided a method of storing a plurality of documents in a database which includes a selected number of file locations, each file location including a document which includes a unique primary identification number; and a plurality of groups of the file locations, each group including NFG files in the group and including a unique secondary identification number which is defined by the absolute value of the first unique primary identification number of a file location in a particular group of file locations divided by NFG files, the method including identifying the unique primary identification number of each document to be stored in the database; determining the secondary identification number by taking the absolute value of the primary identification number of the document to be stored and dividing it by NFG; and storing the document in a file location in the form of a directory which is identified from the unique primary and secondary identification numbers.
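The storing method may be sketched, purely by way of example, as follows. The sketch assumes a two-level arrangement in which the secondary identification number names a directory and the primary identification number names its sub-directory, and it again reads "absolute value" as the integer part of the quotient. The function name, the default NFG of 10 and the file name "image.bin" are hypothetical.

```python
import tempfile
from pathlib import Path


def store_document(root: Path, primary_id: int, image: bytes, nfg: int = 10) -> Path:
    """Store image bytes under <root>/<secondary>/<primary>/ and return the file path."""
    # Integer part of primary_id / NFG (assumed reading of "absolute value").
    secondary = primary_id // nfg
    location = root / str(secondary) / str(primary_id)
    location.mkdir(parents=True, exist_ok=True)
    target = location / "image.bin"  # hypothetical file name
    target.write_bytes(image)
    return target


# Demonstration in a throw-away directory so that nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    stored = store_document(Path(tmp), 15, b"scanned-image-bytes")
    print(stored.relative_to(tmp))
```

Because the location is computed from the document number alone, no separate index of locations needs to be kept in order to find the document again.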
The database may be arranged in a hierarchical directory structure.
Accordingly, the method may include iteratively identifying an associated directory of a group of directories at one level higher (level LB + n + 1) by dividing the unique secondary identification number of the directory at level LB + n by the number of directories NFGLB + n + 1 in the group.
The method may include scanning an original copy of the document to obtain a digital image thereof, and storing the digital image of the document in the file location.
Still further in accordance with the invention, there is provided a method of identifying a path to one of a plurality of file locations in a database which includes a selected number of file locations, each file location including a document which includes a unique primary identification number and the file location being identified from the unique primary identification number; and a plurality of groups of the file locations, each group including NFG files in the group and including a unique secondary identification number which is defined by the absolute value of the first unique primary identification number of a file location in a particular group of file locations divided by NFG files, the method including identifying the unique primary identification number of the document to be retrieved from the database; and dividing the unique primary identification number by the number of file locations NFG in the group and taking the absolute value of the result to obtain the unique secondary identification number of the group in which the document lies thereby to identify the path to the selected document.
The database may be arranged in a hierarchical directory structure.
Accordingly, the method may include iteratively identifying an associated directory of a group of directories at one level higher (level LB + n + 1) by dividing the unique secondary identification number of the directory at level LB + n by the number of directories NFGLB + n + 1 in the group.
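The iterative step may be sketched as follows. The sketch assumes that the group size for each level is supplied as a list ordered from the base level upwards, and that each division keeps only the integer part of the result. The function name and the example values are hypothetical.

```python
from typing import List, Sequence


def ancestor_ids(primary_id: int, group_sizes: Sequence[int]) -> List[int]:
    """Return the directory numbers at levels LB + 1, LB + 2, ... for a document.

    group_sizes[0] is NFG at the base level; each later entry is the group
    size used to step one level higher.
    """
    ids = []
    current = primary_id
    for size in group_sizes:
        current = current // size  # integer part of the quotient
        ids.append(current)
    return ids


# Illustrative values only: groups of 10 at the base level, then groups of 5.
print(ancestor_ids(57, [10, 5]))  # -> [5, 1]
```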
Further in accordance with the invention, there is provided a method of retrieving data from a database which includes a selected number of file locations, each file location including a document which includes a unique primary identification number and the file location being identified from the unique primary identification number; and a plurality of groups of the file locations, each group including NFG files in the group and including a unique secondary identification number which is defined by the absolute value of the first unique primary identification number of a file location in a particular group of file locations divided by NFG files, the method including identifying the unique primary identification number of the document to be retrieved from the database; dividing the unique primary identification number by the number of file locations NFG in the group and taking the absolute value of the result to obtain the unique secondary identification number of the group in which the document lies thereby to identify the path to the selected document; and reading the data stored in the directory via the path.
The unique primary identification number may be associated with a name of a legal entity e.g. a natural person, a business, or the like. The method may include searching for the name of the legal entity in a conventional manner and retrieving the unique primary identification number thereby to identify a name and path of the directory in which the document has been stored.
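A name-based lookup of this kind may be sketched as follows. The sketch assumes that the conventional search is represented by a simple in-memory mapping from entity name to document number; the names, the numbers and the function name are invented for the example and do not come from the specification.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical index standing in for the conventional name search.
name_index: Dict[str, int] = {"A. Example": 15, "B. Sample": 42}


def path_for_name(name: str, nfg: int = 10) -> PurePosixPath:
    """Look up the document number for a name and derive the directory path."""
    primary_id = name_index[name]
    # Integer part of primary_id / NFG (assumed reading of "absolute value").
    secondary = primary_id // nfg
    return PurePosixPath(str(secondary)) / str(primary_id)


print(path_for_name("A. Example"))  # -> 1/15
```

Only the name lookup requires a search; the path itself follows from arithmetic on the retrieved number.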
Further in accordance with the invention, there is provided a data management installation which includes reading means for reading data from data storage means which includes a digital image of a plurality of documents arranged in an hierarchical structure in a database which includes a selected number of file locations, each file location including a document which includes a unique primary identification number and the file location being identified from the unique primary identification number, and a plurality of groups of the file locations, each group including NFG files in the group and including a unique secondary identification number which is defined by the absolute value of the first unique primary identification number of a file location in a particular group of file locations divided by NFG files; input means for receiving the primary identification number of a document to be retrieved from the storage means; and
processing means arranged to identify a path in the hierarchical structure to the file location in which the document has been stored, the path being derived from a unique primary identification number of the document.
The installation may include interface means for interfacing the installation to a conventional data management installation which selectively accesses abridgements of documents e.g. abridgements of medical aid claims or the like which have been stored in the form of a digital image.
The installation may be arranged to receive a document number from the abridgement, the document number being translated into a unique primary identification number thereby to permit a digital image of the entire original document to be retrieved.
The storage means may be a plurality of CD ROMs which define the database, the reading means being a so-called "CD jukebox".
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is now described, by way of example, with reference to the accompanying diagrammatic drawings.
DESCRIPTION OF THE PREFERRED EMBODIMENT
In the drawings, Figure 1 shows a schematic diagram of a data management installation in accordance with the invention;
Figure 2 shows a schematic diagram of an arrangement of data in a database, also in accordance with the invention, of the installation of Figure 1; and
Figure 3 shows a sub-section of a directory structure which is arranged in a similar fashion to that of Figure 2.
Referring to the drawings, reference numeral 10 generally indicates a data management installation in accordance with the invention. The installation 10 includes a data capturing sub-section 12, a data storage sub-section 14, and a data retrieval sub-section 16. The installation 10 is configured or arranged to store a digital image of each of a substantial number of documents for subsequent retrieval from a database arrangement as described in more detail below.
The data capturing sub-section 12 includes a conventional digital scanner 18 which scans a substantial number of documents 20 and feeds a digital image of each document 20 into storage means 22 as shown by arrow 24. Once the documents 20 have been scanned, they are physically stored in a warehouse or discarded as indicated by arrow 26.
The documents 20 are numbered sequentially and each number defines a unique primary identification number which is associated with a particular document 20. The installation 10 includes interface means, as indicated by arrow 28, for interfacing the installation 10 to a conventional computer system 30 which is configured to access abridgements of the documents 20 in its storage means (not shown). In particular, the documents 20 are typically claim forms received by a medical aid company from various medical practitioners.
Conventionally, selected information from each document 20 is manually entered into the conventional computer system 30 to create an abridgement of the document 20 for subsequent retrieval by an operator. However, in certain circumstances, merely accessing the abridgement is insufficient adequately to attend to any query pertaining to the document, and, accordingly, the operator would then in a conventional system physically retrieve the document 20 to obtain comprehensive information on a particular transaction or claim.
However, unlike conventional systems, the data management installation 10 stores each document 20 in a unique fashion in an arrangement of data in a database in the storage means 22. In particular, the database is arranged in a hierarchical or so-called "root" structure 32 (see Figure 2). The structure 32 of the arrangement includes 100 directories or file locations, only a few of which are referenced in the drawings by reference numeral 34. The file locations or directories 34 are divided into a number of groups of directories 36.1, 36.2, 36.3 and so on, each comprising 10 directories 34, this being the number of files in the group (NFG). Each directory 34 has as its title or name the unique primary identification number of a document 20 intended to be stored therein. Thus, each document 20 is stored in a specific location to facilitate subsequent retrieval thereof.
The directories 34 are located at a base level LB in the hierarchical structure 32 as indicated by arrow 38. The group of directories 36.1 is associated with a directory 40 at a level LB + 1, which is one level higher in the hierarchical structure 32. In a similar fashion, further directories 42 to 58 are provided at level LB + 1, each of which is associated with 10 file locations (NFG) or directories 34 at level LB, each file location or directory 34 bearing the name or label of the unique identification number of the document 20 to be stored therein. Further, as in the case of the directories 34 which are grouped into groups of directories 36, the directories 40 to 58 at level LB + 1 are grouped into two groups of directories 60, 62, at a level LB + 2, each group having 5 (NFGLB + 2) sub-directories. Further, in a similar fashion, the groups of directories 60, 62 are grouped or branch out from a further directory 64 which bears a label "1-100" and which is thus representative of the range of documents 20 having unique primary identification numbers between 1 and 100 which are associated with the directory. The directory 64 has 2 (NFGLB + 3) directories in its group.
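The shape of the structure described above may be checked with a short sketch. It assumes that the 100 documents are numbered from 0 to 99, in keeping with the first document being given the number 0 in the naming example, and it uses the group sizes 10, 5 and 2 stated for Figure 2. The function name is hypothetical.

```python
from typing import List, Sequence


def directories_per_level(num_documents: int, group_sizes: Sequence[int]) -> List[int]:
    """Count the distinct directories at the base level and at each level above."""
    ids = set(range(num_documents))  # assumption: numbering starts at 0
    counts = [len(ids)]
    for size in group_sizes:
        # Integer division maps every directory to the directory above it.
        ids = {i // size for i in ids}
        counts.append(len(ids))
    return counts


# Group sizes taken from Figure 2: NFG = 10, NFGLB + 2 = 5, NFGLB + 3 = 2.
print(directories_per_level(100, [10, 5, 2]))  # -> [100, 10, 2, 1]
```

The counts correspond to the 100 directories 34, the ten directories 40 to 58, the two groups 60 and 62, and the single directory 64.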
The various names of the directories 64, 60, 62, 40 to 58, and 34 are in the form of reference numerals which are allocated in a specific fashion. In particular, the name of the directory 40 defines a unique secondary identification number which is defined by the absolute value of the first unique primary identification number 34.1 in the group of directories 36.1 divided by the total number of directories or file locations NFG in the group of directories 36.1. For example, as the first document is stored in file location 34.1 it bears a unique primary identification number 0 and the name of the directory 40 is then defined by the absolute value of 0 divided by 10 which is 0 as shown in Figure 2. In the case of the directory 42, its name or label is defined by the absolute value of the first unique primary identification number 34.2 in a second group of directories 36.2 divided by the number of files or directories NFG in the particular group, i.e. the absolute value of 10 divided by 10 which is equal to 1. In a similar fashion, the unique secondary identification numbers which define the names of the directories 44 to 58 are determined.
In a similar fashion, the label or name of the group of directories 60 is defined by the absolute value of the unique secondary identification number "0", which is the name of the first group of directories 40 at an immediately lower level LB + n - 1, divided by the number of groups of directories at an immediately lower level, i.e. 5, thus providing a result of 0 as shown in Figure 2. In a similar fashion, the name of the group of directories 62 is derived from the first unique secondary identification number, which is the file name of the group of directories 50, i.e. 5, divided by 5 (NFGLB + 2), which equals 1.
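The naming arithmetic for Figure 2 may be reproduced with the following sketch. It assumes that documents are numbered from 0 to 99 and that "absolute value" denotes the integer part of the quotient. The constant and function names are hypothetical.

```python
NFG_LB = 10   # directories 34 per group at base level LB (from Figure 2)
NFG_LB2 = 5   # directories per group at level LB + 2 (from Figure 2)


def group_label(first_id: int, group_size: int) -> int:
    """Label of a group: integer part of its first member's number / group size."""
    return first_id // group_size


# Assumption: documents are numbered 0..99, so the groups start at 0, 10, 20, ...
first_ids = list(range(0, 100, NFG_LB))
lb1_labels = [group_label(first, NFG_LB) for first in first_ids]

# Groups 60 and 62 are labelled from the first LB + 1 directory in each group.
lb2_labels = [group_label(first, NFG_LB2) for first in lb1_labels[::NFG_LB2]]

print(lb1_labels)  # names of directories 40 to 58
print(lb2_labels)  # names of the groups 60 and 62
```

With integer division, every member of a group yields the same label as the first member, so the label may equally be computed from any document number in the group.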
It is to be appreciated that, in other embodiments of the invention, the hierarchical structure may comprise a plurality of different levels. The number of different levels depends upon the number of documents which are to be stored in the hierarchy. Further, the fewer the number of levels, i.e. the flatter the hierarchical structure is, the simpler the path is to the particular directory in which the document is stored and thus retrieval times may be reduced in comparison to a very pointed hierarchical structure in which a number of levels are included.
When a substantial number of documents are to be stored, a plurality of hierarchical structures (one of which is shown in Figure 3) which are independent of each other may be used. Preferably, due to software and hardware limitations of certain computing systems, the number of file locations or directories 34 at the base level LB in the hierarchical structure 38 is typically less than about 1000 and, more preferably, less than about 250. The hierarchical structure 38 may thus include a plurality of levels extending above base level LB, each level including a group of directories at a level LB + n having NFGLB + n directories in the group. Thus, from a top down point of view, each directory in a group of directories branches out or extends into NFGLB + n groups of directories at an immediately lower level LB + n - 1. The name or secondary identification number of each group of sub-directories is then determined in a similar fashion as described above. When arranging the database, the number of groups of directories at level LB + n is typically between about 2 and about 10 times the number of groups of directories at level LB + n - 1. Thus, it is evident that the number of levels LB + n is dependent upon the number of documents at the base level LB in the hierarchical structure.
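The dependence of the number of levels on the number of documents may be illustrated with the following sketch. It makes a simplifying assumption that is not part of the specification, namely that the same maximum group size is used at every level, and it counts the levels needed above the base level to arrive at a single top directory. The function name is hypothetical.

```python
def levels_above_base(num_documents: int, max_per_directory: int) -> int:
    """Levels needed above LB so that one top directory results.

    Assumption: every directory holds at most max_per_directory entries,
    and the same limit applies at every level.
    """
    if num_documents < 1 or max_per_directory < 2:
        raise ValueError("need at least 1 document and a limit of at least 2")
    levels = 0
    remaining = num_documents
    while remaining > 1:
        remaining = -(-remaining // max_per_directory)  # ceiling division
        levels += 1
    return levels


print(levels_above_base(100, 10))         # -> 2
print(levels_above_base(1_000_000, 250))  # -> 3
```

A larger group size gives a flatter structure, while a smaller group size keeps each directory short at the cost of additional levels.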
Once the hierarchical structure 38 has been established and the data has been arranged in the database as described above, a digital image of each document 20 is stored on a plurality of compact discs 70 as shown in Figure 1. The compact discs 70 may form part of a library of information on various transactions or claims which have been submitted to the medical aid via the various doctors. Certain of the compact discs may be loaded in a CD jukebox 72 to provide a near line facility and other compact discs may be loaded in a CD tower 74 to provide an on-line facility as shown by arrows 76, 78 respectively. In other embodiments of the invention, the database is stored on a magnetic medium 80.
In order to retrieve a digital image of a specific document 20 from the database, the installation 10 includes computing means 82 (see Figure 1) which is arranged to generate a variety of user friendly screens to assist in instructing the computing means 82 to perform various retrieval functions. The computing means 82 is programmed in such a fashion that an indexed field window 84 prompts a user to enter a client name 86 via a keyboard (not shown). The computing means 82, in a conventional fashion, then retrieves the unique primary identification number 88 which is associated with the client name 86. The unique primary identification number is then fed to a unique key of documents screen 90 which has a search prompt 92 which may be activated with a mouse to initiate retrieval of a selected document from the database.
In order to facilitate retrieval of the digital image of the selected document from the database, the path to the particular file location or directory 34 in which the document has been stored is derived directly from the unique primary identification number which defines the name of the file location or directory 34 in which the document 20 has been stored. In order to determine this path, the relevant directories in the groups of directories at the various levels LB + n in the hierarchical structure 32 must be determined. The name of the actual directory 34 in which the document 20 has been stored is determined as described above, and the particular directory in the group of directories is then determined by taking the absolute value of the unique primary identification number or document number divided by the number of file locations or directories 34 NFG at base level LB. Once the name of the particular directory at each intermediate level in the hierarchical structure 32 has been determined, the path to the relevant document may be reconstructed and thus retrieval time may be reduced. For example, to identify the path to document number 10 in the hierarchical structure 32, the unique primary identification number, i.e. "10", is divided by the number of file locations or directories NFG in the group at the base level LB, i.e. "10", and the absolute value thereof is taken, i.e. directory 42 labelled "1" is identified at level LB + 1. In a similar fashion, if document 15 is to be retrieved, the absolute value is taken of the unique primary identification number or document number 15 divided by the number of files in the group of directories 36, i.e. the result is the absolute value of 1.5 which is 1. Likewise a particular sub-directory at level LB + n is determined and thus the path to the document may be determined.
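The worked examples for documents 10 and 15 may be carried through to a complete path with the following sketch. It uses the group sizes 10 and 5 from Figure 2, reads "absolute value" as the integer part of the quotient, and leaves out the top directory 64 because its label is a range rather than a computed number. The function name and the use of "/" as separator are assumptions made for the example.

```python
from pathlib import PurePosixPath

GROUP_SIZES = (10, 5)  # NFG at level LB and NFGLB + 2, as in Figure 2


def figure2_path(document_number: int) -> PurePosixPath:
    """Build the top-down path LB + 2 / LB + 1 / LB for a document number."""
    names = [document_number]
    for size in GROUP_SIZES:
        names.append(names[-1] // size)  # step one level higher
    # names runs from the base level upwards, so reverse it for the path.
    return PurePosixPath(*(str(n) for n in reversed(names)))


for doc in (10, 15):
    print(doc, "->", figure2_path(doc))
```

Both documents resolve to the directory labelled "1" at level LB + 1 and to the group labelled "0" at level LB + 2, which agrees with the values given in the text.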
The inventors believe that the invention, as illustrated, provides a data management system 10 which has enhanced retrieval characteristics of a document from a database as the documents are stored in a particular fashion and the actual path to the directory in which the document has been stored is derived from a unique primary identification number allocated to the document.