US6643644B1 - Method and apparatus for retrieving accumulating and sorting table formatted data - Google Patents

Info

Publication number
US6643644B1
Authority
US
United States
Prior art keywords
field
field value
value
array
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
US09/762,584
Other languages
English (en)
Inventor
Shinji Furusho
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Turbo Data Laboratories Inc
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Application granted
Publication of US6643644B1
Assigned to FURUSHO, MR. SHINJI; ASSIST SYSTEMS LABORATORY CO., LTD.; and TURBO DATA LABORATORIES: LICENSE AGREEMENT. Assignors: FURUSHO, MR. SHINJI; TURBO DATA LABORATORIES
Assigned to TURBO DATA LABORATORIES, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: FURUSHO, SHINJI
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9017 Indexing; Data structures therefor; Storage structures using directory or table look-up
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation

Definitions

  • the invention relates to a data processing method and data processing apparatus for processing large amounts of data using a computer or other information processing apparatus, and particularly to a method and apparatus for searching for, tabulating and sorting table-format data.
  • FIG. 1 is a diagram showing an example of expressing the data to be processed in a table format.
  • FIG. 1 shows an example wherein the sex, age and occupation data for a large number of people, e.g. 1 million, are stored in a table.
  • the horizontal rows in the table, namely the so-called records, consist of the record number and the sex, age and occupation fields corresponding to that record number.
  • the vertical columns in the table consist of the record number, sex field, age field and occupation field.
  • the table indicates that the person with the record number of “0” has a sex of female, age of 18 and occupation of programmer.
  • the data such as “Female,” “18” and “Programmer” set in the various fields are called field values.
  • the table-format data consisting of 1 million records shown in FIG. 1 is used as a specific example of a large amount of data.
  • Whether or not large amounts of data can be searched for or tabulated efficiently depends on the format in which the large amount of data is stored.
  • typical known storage techniques include the so-called “record-sequential” and “field-sequential” storage techniques shown in FIGS. 2A and 2B, respectively.
  • FIGS. 2A and 2B represent the data storage format on a storage device, e.g. a hard disk.
  • in the record-sequential storage technique of FIG. 2A, the set of sex, age and occupation field values for each record number is stored on disk record by record, in order of increasing logical address.
  • in the field-sequential storage technique of FIG. 2B, by contrast, the field values are stored in record-number order, grouped by field, in the direction of increasing logical addresses: the field values of the sex field for record numbers “0” through “999999” are arranged in order first, followed by the field values of the age field in record-number order, and then those of the occupation field in record-number order.
  • in either technique, the field values of all fields for all record numbers are stored as-is in a two-dimensional data structure (with the record number as one dimension and the field as the other).
  • such a data structure in particular shall be referred to as a “data table.”
  • searching for and tabulating stored data is performed by accessing such a data table.
  • for example, the value “Male” may be converted to “0” and the value “Female” to “1,” and the codes “0” or “1” stored as the field values instead of “Male” or “Female.” Even in this case, the converted codes are still stored in a data table as field values.
  • with the record-sequential storage technique, the data table easily becomes enormous and cannot easily be separated (physically) into individual fields. For example, when extracting records in which the sex is “Male,” the age and occupation information is unnecessary, so efficiency would improve if a table containing only the sex field could be separated out.
  • with the field-sequential storage technique shown in FIG. 2B, separation into individual fields is simple, but when large amounts of data are handled the data table still becomes enormous, so actually expanding it into memory or another fast storage device for tabulating or searching is difficult.
  • the object of the present invention is to provide a method of searching for, tabulating and sorting table-format data and an apparatus for implementing said method by providing a data control mechanism that both has the functions of the conventional data table and solves the aforementioned problems with the data structure based on the data table.
  • the method and apparatus for searching for and tabulating table-format data according to the present invention propose a novel data control mechanism that is usable on an ordinary computer system.
  • the data control mechanism according to the present invention comprises a value control table and an array of pointers to the value control table, as a general rule.
  • FIG. 3 is a diagram used to explain the principle of the present invention, showing a value control table 10 and an array of pointers to the value control table 20.
  • a value control table 10 is a table made by assigning, for each field of the table-format data, an (integral) field value number to each field value belonging to that field; it contains the field values arranged in order of their field value numbers (reference number 11), along with a category number (reference number 12) related to each field value.
  • An array of pointers to the value control table 20 is an array that contains, for one column (namely, one field) of the table-format data, pointers to field value numbers, i.e. into the value control table 10, arranged in order of the record numbers of the table-format data.
  • the data control mechanism according to the present invention, which includes a value control table generated for a certain field of the table-format data and an array of pointers to that value control table, is referred to in particular as an “information block” in the following explanation.
  • the information blocks according to the present invention are characterized in that the data are completely separated by column in the table format, namely by field.
  • with the present invention, large amounts of data are separated by field, so only the data for the fields required for searching or tabulating need be loaded into memory or another high-speed storage device. This reduces data access time and speeds up searching and tabulating, and even extremely large amounts of data can be handled without degrading performance.
  • the field values are stored in the value control table, and the record numbers that indicate where each value occurs are associated with it through the array of pointers to the value control table, so the field values need not be arranged in record-number order. The field values can therefore be sorted into an order suited to searching and tabulating, which allows a fast determination of whether a field value matching the target value is present in the data. Furthermore, because field value numbers are assigned to the field values, even field values consisting of long data or text strings can be handled as integers.
  • every field value number in the value control table 10 corresponds to a different field value, so extracting the records that contain a specific value requires comparing that value with no more field values than there are field value numbers. This greatly reduces the number of comparison operations and thus speeds up searching and tabulating.
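The value control table and the array of pointers to it can be sketched in Python as follows. This is a minimal illustration, not the patent's construction procedure: it assumes one field is held as a plain list in record-number order, assigns field value numbers in sorted order of the distinct values, and uses hypothetical function names and sample data. It shows the two properties described above: a record's field value is recovered with two look-ups, and a search compares the target only against the distinct field values before switching to integer comparisons.

```python
from bisect import bisect_left


def build_information_block(column):
    """Build a value control table and an array of pointers to it for one field.

    column: the field values of one field, in record-number order.
    Returns (values, pointers): values[k] is the field value whose field value
    number is k (distinct values, sorted here by assumption), and pointers[r]
    is the field value number of record r.
    """
    values = sorted(set(column))                          # one row per distinct value
    pointers = [bisect_left(values, v) for v in column]  # record -> field value number
    return values, pointers


def field_value(values, pointers, record):
    # Two array look-ups recover the field value of any record.
    return values[pointers[record]]


def find_records(values, pointers, target):
    # The target is compared once per distinct field value ...
    hits = [k for k, v in enumerate(values) if v == target]
    if not hits:
        return []
    k = hits[0]
    # ... after which each record is tested with an integer comparison.
    return [r for r, p in enumerate(pointers) if p == k]


ages = [18, 16, 19, 16, 25]          # hypothetical "age" column for records 0..4
values, pointers = build_information_block(ages)
print(values)                        # [16, 18, 19, 25]
print(pointers)                      # [1, 0, 2, 0, 3]
print(field_value(values, pointers, 2))     # 19
print(find_records(values, pointers, 16))   # [1, 3]
```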
  • the category number 12 can be used as this storage location.
  • FIG. 4 shows the information block according to the present invention which comprises a value control table 10 including an array of field values 11 containing the field values, an array of category numbers 12 containing the category numbers, and an array of counts 14 containing the counts.
  • the array of counts 14 contains, for each field value, the number of times that value occurs among all the data of a certain field, in other words, the number of records that have that field value.
  • FIG. 5 shows an information block including a value control table 10 , array of pointers to the value control table 20 and an array of pointers to records 30 .
  • the array of pointers to records 30 is an array containing, for each field value number (namely, each field value), pointers to the records that have that field value (corresponding to their record numbers).
  • for each field value, the number of pointers contained in the array of pointers to records 30 matches the corresponding entry in the array of counts 14 in the value control table 10.
  • the value control table 10 may also be provided with an array of start positions 13 that specifies, for each field value, the starting address of its group of pointers within the array of pointers to records 30.
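The counts, start positions and array of pointers to records can be derived from the array of pointers to the value control table by counting and prefix sums. The following Python sketch assumes field value numbers 0 to N-1 are already assigned; `build_record_index` is a hypothetical name, and this is one plausible way to populate the structures of FIGS. 4 and 5, not necessarily the procedure of FIGS. 29 to 35.

```python
def build_record_index(pointers, num_values):
    """Derive the array of counts, the array of start positions and the array
    of pointers to records from an array of pointers to the value control table.

    pointers[r] is the field value number of record r (0 .. num_values - 1).
    """
    counts = [0] * num_values
    for k in pointers:                    # records having each field value
        counts[k] += 1
    starts = [0] * num_values
    for k in range(1, num_values):        # each group begins where the previous ends
        starts[k] = starts[k - 1] + counts[k - 1]
    record_ptrs = [0] * len(pointers)
    next_slot = starts[:]                 # next free location in each group
    for record, k in enumerate(pointers):
        record_ptrs[next_slot[k]] = record
        next_slot[k] += 1
    return counts, starts, record_ptrs


pointers = [1, 0, 2, 0, 3]                # hypothetical field value numbers, records 0..4
counts, starts, record_ptrs = build_record_index(pointers, 4)
print(counts)                             # [2, 1, 1, 1]
print(starts)                             # [0, 2, 3, 4]
print(record_ptrs)                        # [1, 3, 0, 2, 4]
# Records with field value number 0:
print(record_ptrs[starts[0]:starts[0] + counts[0]])   # [1, 3]
```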
  • in the following, the “individual field information” refers to the aforementioned “information block,” the “field value number-specifying information array” to the “array of pointers to the value control table,” and the “record identification information array” to the “array of pointers to records.”
  • the method of extracting from the table-format data the field value corresponding to a specific field and a specific record comprises the steps of:
  • a value control table containing field values which are located in order of field value number each corresponding to the field value belonging to the specific field, and a field value number-specifying information array containing information that specifies the field value numbers in the order of the records,
  • category numbers are stored in the value control table corresponding to the field value number, and the category numbers are accessed at the time of obtaining the field value corresponding to the field value number.
  • a single-search method of searching through said table-format data for field values that match a specific search condition comprises the steps of:
  • individual field information that includes: a value control table containing, in order of field value number, the field values belonging to the field associated with the search condition; a field value number-specifying information array containing information that specifies said field value numbers in record order; and a record identification information array that stores, in an area exclusive to each field value number, one or more pieces of record identification information related to that field value number, where said value control table includes, for each field value number, record identification information-specifying information that indicates the area of said record identification information array in which said one or more pieces of record identification information related to that field value number are stored,
  • the multiple-field search method according to the present invention comprises the steps of:
  • this method comprises the steps of:
  • the method of tabulating the table-format data by each field value comprises the steps of:
  • where $n$ is an integer equal to 1 or greater: for each of the $n$ fields used in tabulation, keeping in a storage device individual field information including a value control table containing the field values of that field in correspondence with field value numbers, each of which uniquely identifies a field value, is common to the various fields and has a stipulated order from an initial value; and a field value number-specifying information array containing information that specifies the field value numbers in record order,
  • where, for an integer $i$ with $1 \le i \le n$, $N_i$ is the total number of field value numbers in the $i$-th individual field information, $k_i$ is an integer with $0 \le k_i \le N_i - 1$, $M$ is an integer equal to 1 or greater, and $m$ is an integer with $1 \le m \le M$: initializing the elements $P_m(k_1, k_2, \ldots, k_i, \ldots, k_n)$ of $M$ $n$-dimensional data spaces, each of size $N_1 \times N_2 \times \cdots \times N_i \times \cdots \times N_n$,
  • where table-format data is represented as an array of records, each including a plurality of fields containing field values, the method of tabulating the table-format data by category of field value comprises the steps of:
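A small Python sketch of this tabulation, assuming the processing of each element is simple accumulation: each record's field value numbers (k_1, ..., k_n) select one element in each of the M data spaces. The spaces are stored here as dictionaries keyed by (k_1, ..., k_n) rather than dense arrays, and the function name, the second data space (a sum of ages) and the sample data are illustrative assumptions.

```python
from itertools import product


def cross_tabulate(pointer_arrays, sizes, value_fns):
    """Accumulate records into M data spaces, each n-dimensional.

    pointer_arrays: n arrays of pointers to value control tables; pointer_arrays[i][r]
                    is the field value number of record r in field i.
    sizes:          [N_1, ..., N_n], the number of field value numbers per field.
    value_fns:      M functions; value_fns[m](r) is the amount added to P_m.
    Returns M dicts mapping (k_1, ..., k_n) to the accumulated value.
    """
    # Initialize every element P_m(k_1, ..., k_n) of every data space to 0.
    cells = list(product(*(range(size) for size in sizes)))
    spaces = [dict.fromkeys(cells, 0) for _ in value_fns]
    for record in range(len(pointer_arrays[0])):
        # The record's field value numbers identify one element per space.
        key = tuple(ptrs[record] for ptrs in pointer_arrays)
        for space, fn in zip(spaces, value_fns):
            space[key] += fn(record)   # process P_m (here: accumulate)
    return spaces


# Hypothetical data for records 0..4: n = 2 fields, M = 2 data spaces
sex_ptrs = [1, 0, 1, 1, 0]       # N_1 = 2 field value numbers
occ_ptrs = [2, 0, 2, 1, 0]       # N_2 = 3 field value numbers
ages = [18, 16, 19, 16, 25]
counts, age_sums = cross_tabulate(
    [sex_ptrs, occ_ptrs], [2, 3], [lambda r: 1, lambda r: ages[r]])
print(counts[(1, 2)], age_sums[(1, 2)])   # 2 37
print(counts[(0, 1)])                     # 0
```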
  • where $n$ is an integer equal to 1 or greater: for each of the $n$ fields used in tabulation, keeping in a storage device individual field information including a value control table containing the field values of that field and their category numbers in correspondence with field value numbers, each of which uniquely identifies a field value, is common to the various fields and has a stipulated order from an initial value; and a field value number-specifying information array containing information that specifies the field value numbers in record order,
  • where, for an integer $i$ with $1 \le i \le n$, $N_i$ is the total number of either the field value numbers or the category numbers in the $i$-th individual field information, $k_i$ is an integer with $0 \le k_i \le N_i - 1$, $M$ is an integer equal to 1 or greater, and $m$ is an integer with $1 \le m \le M$: initializing the elements $P_m(k_1, k_2, \ldots, k_i, \ldots, k_n)$ of $M$ $n$-dimensional data spaces, each of size $N_1 \times N_2 \times \cdots \times N_i \times \cdots \times N_n$,
  • the step of processing the value of the identified element $P_m$ comprises, for at least one of the $M$ elements $P_m$,
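When tabulating by category, the essential change is that each field value number is first mapped to the category number stored in its value control table. A minimal Python sketch, assuming counting as the processing step; the age bands, the sample data and the function name are hypothetical.

```python
from collections import Counter


def tabulate_by_category(pointer_arrays, category_tables):
    """Count records per combination of category numbers.

    pointer_arrays[i][r]  : field value number of record r in field i
    category_tables[i][k] : category number held in field i's value control
                            table for field value number k
    Counter starts every cell at 0, standing in for explicit initialization.
    """
    cells = Counter()
    for record in range(len(pointer_arrays[0])):
        key = tuple(cats[ptrs[record]]
                    for ptrs, cats in zip(pointer_arrays, category_tables))
        cells[key] += 1
    return cells


# Hypothetical "age" block: field values [16, 18, 19, 25, 41] grouped into
# assumed bands 0 (under 20), 1 (20-39) and 2 (40 and over).
age_categories = [0, 0, 0, 1, 2]
age_ptrs = [1, 0, 2, 0, 3, 4, 4]
# Hypothetical "sex" block where each field value is its own category.
sex_categories = [0, 1]
sex_ptrs = [1, 1, 0, 0, 1, 0, 1]
table = tabulate_by_category([age_ptrs, sex_ptrs], [age_categories, sex_categories])
print(table[(0, 1)], table[(0, 0)], table[(2, 1)])   # 2 2 1
```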
  • the information that specifies the field value number may be the field value number itself.
  • the information that specifies the field value number may be a binary value in which one bit is allocated to each field value number, and whether that bit is set indicates whether the field value number applies.
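As a rough illustration of the bit-per-field-value-number representation, the following Python sketch uses an integer as a bit string, bit k standing for field value number k. It assumes a record may refer to several field value numbers at once; the helper names and sample data are hypothetical. Testing a record against a set of field value numbers then reduces to one bitwise AND.

```python
def to_bits(field_value_numbers):
    """Encode field value numbers as a bit string: bit k set <=> number k applies."""
    bits = 0
    for k in field_value_numbers:
        bits |= 1 << k
    return bits


def has_value(bits, k):
    # True when the bit allocated to field value number k is set.
    return ((bits >> k) & 1) == 1


# Hypothetical records: record 0 refers to field value numbers 0 and 3,
# record 1 to 2 only, record 2 to 1 only.
record_bits = [to_bits([0, 3]), to_bits([2]), to_bits([1])]
print(bin(record_bits[0]))            # 0b1001
search = to_bits([2, 3])              # search for field value number 2 or 3
matching = [r for r, b in enumerate(record_bits) if b & search]
print(matching)                       # [0, 1]
```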
  • the apparatus for searching for and tabulating the table-format data comprises:
  • a storage device for keeping, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies the field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies the field value numbers in the order of the records,
  • table-format data is represented as an array of records including a plurality of fields containing field values for each field
  • the storage medium according to the present invention, on which a program for searching for and tabulating table-format data is recorded, is characterized in that the recorded program comprises:
  • a step of keeping in a storage device for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies the field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies the field value numbers in the order of the records,
  • the present invention also proposes a sorting method whereby an array of record identification information (e.g. record numbers) that specifies records, each including a plurality of fields containing field values, is rearranged according to a specific field.
  • an array of pointers to the value control table is formed in which, for each record, the record identification information is associated with the field value number corresponding to the record's field value in a certain field.
  • the storage locations of the record identification information after reordering are defined.
  • The record identification information is extracted from the array in sequence, and the field value number corresponding to each extracted item is determined. The extracted record identification information is stored in the storage location given by the record identification information-specifying information for that field value number, and that storage location is then updated so that the next record identification information can be stored.
  • a preferred embodiment of the sorting method comprises the step of keeping in a storage device individual field information including a value control table containing, in order of field value number, the field values of a field associated with the search conditions, and a field value number-specifying information array containing information that specifies the field value numbers in record order. The value control table further includes, for each field value number, record identification information-specifying information indicating the area of the record identification information array where the one or more pieces of record identification information with that field value number are stored, and record identification information is stored at the storage locations given by this record identification information-specifying information.
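The storage-location update described above is a distribution (counting) sort. A minimal Python sketch follows, assuming field value numbers were assigned in ascending order of field value, so that sorting by field value number also sorts by field value; the function name and data are hypothetical. Because each storage location is advanced after every store, records sharing a field value keep their original relative order, and the input may be any subset of the records.

```python
def sort_records_by_field(record_ids, pointers, num_values):
    """Rearrange record_ids by the field value number of each record.

    pointers[r]: field value number of record r (array of pointers to the
    value control table); num_values: number of field value numbers.
    """
    # How many of the given records carry each field value number.
    counts = [0] * num_values
    for r in record_ids:
        counts[pointers[r]] += 1
    # Storage location where each field value number's group begins.
    location = [0] * num_values
    for k in range(1, num_values):
        location[k] = location[k - 1] + counts[k - 1]
    result = [None] * len(record_ids)
    for r in record_ids:              # extract the record ids in sequence
        k = pointers[r]               # determine the field value number
        result[location[k]] = r       # store at the current storage location
        location[k] += 1              # update it for the next record id
    return result


pointers = [1, 0, 2, 0, 3]            # hypothetical; number order = value order
print(sort_records_by_field([4, 2, 1, 0], pointers, 4))     # [1, 0, 2, 4]
print(sort_records_by_field([0, 1, 2, 3, 4], pointers, 4))  # [1, 3, 0, 2, 4]
```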
  • the objects of the present invention may be achieved by an apparatus for implementing the aforementioned methods, a computer-readable storage medium containing a program according to this method, or a computer-loadable program product according to the method in question.
  • FIG. 1 is an explanatory diagram illustrating typical table-format data.
  • FIGS. 2A and 2B are explanatory diagrams illustrating table-format data storage techniques in the prior art.
  • FIG. 3 is an explanatory diagram illustrating the principle of the present invention.
  • FIG. 4 is an explanatory diagram illustrating an information block according to the present invention.
  • FIG. 5 is an explanatory diagram illustrating an information block according to the present invention.
  • FIG. 6 is an explanatory diagram illustrating an information block regarding “sex” used in an embodiment of the present invention.
  • FIG. 7 is an explanatory diagram illustrating an information block regarding “age” used in an embodiment of the present invention.
  • FIG. 8 is an explanatory diagram illustrating an information block regarding “occupation” used in an embodiment of the present invention.
  • FIG. 9 is a flowchart of the operation of the method of searching within a single field according to Embodiment 1 of the present invention.
  • FIG. 10 is an explanatory diagram illustrating an information block according to Embodiment 1 of the present invention.
  • FIG. 11 is an explanatory diagram illustrating an information block according to Embodiment 1 of the present invention.
  • FIG. 12 is a flowchart of the operation of the method of searching upon an AND of multiple fields according to Embodiment 2 of the present invention.
  • FIG. 13 is an explanatory diagram illustrating an information block according to Embodiment 2 of the present invention.
  • FIG. 14 is an explanatory diagram illustrating an information block according to Embodiment 2 of the present invention.
  • FIG. 15 is an explanatory diagram illustrating the method of multiple-field Boolean operation searching using bit flags according to Embodiment 3 of the present invention.
  • FIG. 16 is an explanatory diagram illustrating the method of multiple-field Boolean operation searching using bit flags according to Embodiment 3 of the present invention.
  • FIG. 17 is a flowchart of the operation of the method of tabulating according to Embodiment 5 of the present invention.
  • FIG. 18 is a conceptual explanatory diagram of Embodiment 6 of the present invention.
  • FIG. 19 is a flowchart of the operation of Embodiment 6 of the present invention.
  • FIG. 20 is a flowchart of the operation of cross-tabulating according to Embodiment 6 of the present invention.
  • FIG. 21 is an explanatory diagram illustrating an information block according to Embodiment 8 of the present invention.
  • FIG. 22 is a flowchart of the operation of cross-tabulating according to Embodiment 9 of the present invention.
  • FIGS. 23A and 23B are conceptual explanatory diagrams of a cross-tabulation table.
  • FIG. 24 is an explanatory diagram illustrating multi-answer type fields.
  • FIG. 25 is an explanatory diagram illustrating an information block of a type compatible with multi-answer type fields according to Embodiment 10 of the present invention.
  • FIG. 26 is an explanatory diagram illustrating the method of handling special values according to Embodiment 11 of the present invention.
  • FIG. 27 is a flowchart of the operation of the method of searching upon multiple fields according to Embodiment 12 of the present invention.
  • FIG. 28 is a structural diagram of a searching and tabulating system for table-format data based on one embodiment of the present invention.
  • FIG. 29 is an explanatory diagram illustrating the method of constructing an information block.
  • FIG. 30 is an explanatory diagram illustrating the preparation for data population and initialization.
  • FIG. 31 is an explanatory diagram illustrating the first pass of data population.
  • FIG. 32 is an explanatory diagram illustrating the second pass of data population.
  • FIG. 33 is an explanatory diagram illustrating the third pass of data population.
  • FIG. 34 is an explanatory diagram illustrating the third pass of data population.
  • FIG. 35 is an explanatory diagram illustrating the third pass of data population.
  • FIG. 36 is an explanatory diagram illustrating the addition of data to an information block.
  • FIG. 37 is a diagram illustrating the structure of an information block according to another embodiment of the present invention.
  • FIG. 38 is an explanatory diagram illustrating the initial state of sorting according to Embodiment 13 of the present invention.
  • FIG. 39 is an explanatory diagram illustrating the first step of sorting according to Embodiment 13 of the present invention.
  • FIG. 40 is an explanatory diagram illustrating the second step of sorting according to Embodiment 13 of the present invention.
  • FIG. 41 is an explanatory diagram illustrating the final state of sorting according to Embodiment 13 of the present invention.
  • FIG. 42 is an explanatory diagram illustrating sorting on a partial set.
  • FIG. 43 is an explanatory diagram illustrating the post-processing of sorting on a partial set.
  • FIG. 44 is an explanatory diagram illustrating the 1 million records of data used in the searching and tabulating tests.
  • FIG. 45 is an explanatory diagram illustrating the results of measurement of the searching and tabulating tests on 1 million records of data.
  • FIGS. 46A and 46B are flowcharts illustrating the OR search process on multiple fields as a variation of Embodiment 2 of the present invention.
  • FIG. 47 is a flowchart illustrating the searching process according to Embodiment 3 of the present invention.
  • FIG. 48 is a flowchart illustrating the tabulating process according to Embodiment 4 of the present invention.
  • FIG. 49 is a flowchart illustrating the tabulating process according to Embodiment 7 of the present invention.
  • the following uses the table-format data illustrated in FIG. 1 as example data to describe in detail, in various embodiments, the search method, tabulating method and sorting method according to the present invention.
  • the data illustrated in the example of FIG. 1 includes the fields of “sex,” “age” and “occupation,” so, as shown in FIGS. 6 to 8, the information blocks obtained are an information block regarding “sex,” an information block regarding “age” and an information block regarding “occupation.”
  • the following description assumes that these information blocks have already been obtained. One technique for constructing the information blocks is described later, but the present invention is in no way limited by the method of constructing them.
  • the apparatus for searching for and tabulating table-format data is provided with the structure shown in FIG. 28.
  • the apparatus for searching for and tabulating table-format data is implemented by means of a computer system such as an ordinary personal computer.
  • This computer system includes a CPU 100 that executes programs to control the entire system and its individual constituent components, ROM (Read Only Memory) 110 that stores programs and the like, RAM (Random Access Memory) 120 that stores working data and the like, a hard disk storage device 130, a display device 140, and a keyboard, mouse or other input device 150.
  • the CPU 100, ROM 110, RAM 120, and the like are connected to each other via a bus 160.
  • Other components that may also be connected to the bus include a CD-ROM drive (not shown) for accessing CD-ROM discs and an interface (not shown) for connecting to an external network or external terminals (not shown).
  • the program that performs the searching and tabulating (and, depending on the case, sorting) of table-format data may be contained on a CD-ROM (not shown) and read by the CD-ROM drive (not shown), or stored in advance in ROM 110.
  • the program may also be stored in a specific area of the hard disk storage device 130.
  • the aforementioned program may also be supplied from outside via the network, external terminals or interface (none of these are shown).
  • the information block generation program, which generates the information blocks, may likewise be contained on CD-ROM, stored in ROM 110, or stored on the hard disk storage device 130. Alternatively, these programs may be supplied from outside via the network, external terminals or interface (none of these are shown). In this embodiment, the data (information blocks) generated by the information block generation program are stored in RAM 120 or in a specific area of the hard disk storage device 130.
  • FIG. 9 is a flowchart of the operation of the method of searching within a single field. This is implemented by the CPU 100 executing the search program acquired by the aforementioned procedure and stored in a stipulated area.
  • Step 100: From among the information blocks for the table-format data, select the information block regarding “age” shown in FIG. 7 as the specific information block.
  • Step 102: Set “1” in the category number of those rows of the specific information block's value control table whose field value matches “16” or “19,” the search condition, and set “0” in the category number of the other rows.
  • in this example, “1” is set in the category numbers of the rows with field value numbers “0” and “3.”
  • Step 104: Acquire the start positions and counts of the rows whose category number is “1” as pointer extraction information.
  • the field value number of “0” has a corresponding start position of “0” and count of “45898.”
  • the field value number of “3” has a corresponding start position of “238137” and count of “189653.”
  • Step 106: From the array of pointers to records, extract the number of pointers given by each count, beginning at the corresponding start position; the extracted pointers are the record numbers of the records matching the search condition.
  • the pointers to records for field value number “0” occupy the 45898 locations of the array of pointers to records beginning at start position “0” (the beginning of the array), while those for field value number “3” occupy the 189653 locations beginning at location 238137.
  • since the “age” of the record with the last record number, “999999,” is “16,” the last of the pointers stored in the array of pointers to records for field value number “0” (i.e. an “age” of “16”) is “999999,” as shown in FIG. 11.
  • Step 108: Create an array of the extracted record numbers as a result set and save it.
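Steps 102 to 108 can be sketched in Python as follows, using a small hypothetical “age” information block in place of the 1-million-record block of FIG. 7. The function name, the search condition passed as a predicate and the sample arrays are assumptions for illustration.

```python
def single_field_search(values, starts, counts, record_ptrs, condition):
    """Single-field search following Steps 102 to 108 on one information block."""
    # Step 102: category number 1 for rows whose field value meets the condition.
    category = [1 if condition(v) else 0 for v in values]
    result = []
    for k, flag in enumerate(category):
        if flag == 1:
            # Step 104: start position and count are the pointer extraction info.
            s, c = starts[k], counts[k]
            # Step 106: take c pointers to records beginning at s.
            result.extend(record_ptrs[s:s + c])
    # Step 108: the extracted record numbers form the result set.
    return result


# Hypothetical "age" block for records 0..6 (ages 18, 16, 19, 25, 19, 41, 16)
values = [16, 18, 19, 25, 41]         # value control table, field values
counts = [2, 1, 2, 1, 1]              # array of counts
starts = [0, 2, 3, 5, 6]              # array of start positions
record_ptrs = [1, 6, 0, 2, 4, 3, 5]   # array of pointers to records
result_set = single_field_search(values, starts, counts, record_ptrs,
                                 lambda v: v in (16, 19))
print(result_set)                     # [1, 6, 2, 4]
```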
  • FIG. 12 is a flowchart of the operation of the method of searching upon an AND of multiple fields.
  • Step 120 a result set of records wherein the “age” is “16” or “19” is obtained by means of the processing according to Embodiment 1 (Step 120 ). Therefore, the processing of this Step 120 corresponds roughly to that shown in FIG. 9 .
  • Step 122 the information block regarding “occupation” which is the second field shown in FIG. 8 is selected as the second specific information block.
  • Step 124 set “1” in the category number of those rows in which the field value within the value control table of the specific information block matches “Student” which is the aforementioned search condition, and set “0” in the category number of other rows (Step 124 ).
  • “1” is set in the category number of those rows corresponding to a field value number of “0,” and “0” is set in other rows.
  • Step 126 sequentially extract from the result set for the first search condition those record numbers that represent pointers to records. For example, as shown in FIG. 14, the record number “999999” is extracted.
  • Step 128 extract from the array of pointers to the value control table the field value numbers corresponding to the record number obtained with respect to the aforementioned first search condition. For example, as shown in FIG. 14, the field value number of “0” corresponding to the record number of “999999” is extracted.
  • Step 130 a decision is made as to whether or not “1” is set in the category number corresponding to the field value number extracted with respect to the second specific information block (Step 130 ). For example, as shown in FIG. 14, one can see that “1” is set in the category number corresponding to the field value number of “0.”
  • Step 132 Add to the final result set the pointers to records (for example, record numbers) corresponding to the locations within the array of pointers to the value control table that store a field value number whose category number is set to “1” (Step 132). For example, as shown in FIG. 14, the record number “999999” is added to the final result set.
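Steps 124 to 132 can be sketched as below. This is a hypothetical Python illustration: `occ_values` and `occ_ptr` stand in for the value control table and the array of pointers to the value control table of the "occupation" block, and the toy records are invented for the example.

```python
def and_search(first_result, values2, vct_ptr2, wanted2):
    """AND a first result set with a condition on a second field (Steps 124-132)."""
    # Step 124: category numbers in the second block's value control table
    category2 = [1 if v in wanted2 else 0 for v in values2]
    final = []
    for rec in first_result:        # Step 126: next record number from the first result set
        fvn = vct_ptr2[rec]         # Step 128: its field value number in the second field
        if category2[fvn] == 1:     # Step 130: does that field value satisfy condition 2?
            final.append(rec)       # Step 132: add the record number to the final result set
    return final


# Toy "occupation" block: field value numbers 0-3 for records 0-5
occ_values = ["Student", "Programmer", "Teacher", "Other"]
occ_ptr = [0, 2, 0, 0, 3, 1]
print(and_search([1, 3, 0, 5], occ_values, occ_ptr, {"Student"}))  # [3, 0]
```

Only the records already in the first result set are visited, and each visit is a direct array lookup rather than a comparison of field values.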
  • FIG. 46A is a flowchart illustrating one example of the processing of an OR search process on multiple fields. This process is also implemented by the CPU 100 executing a program stored in a stipulated area. As shown in FIG. 46A, first, after the result set is obtained with respect to the first search condition (Step 4601), an information block for the second search condition is selected (Step 4602). Next, regarding this information block, a category number is set with respect to the second search condition (Step 4603).
  • Step 4604 While skipping record numbers contained in the result set from the first search condition, the array of pointers to the value control table is scanned sequentially with respect to the second specific information block (Step 4604 ).
  • the record numbers whose category number was set to “1” under the second search condition are extracted, and a decision is made as to whether or not each such record number is found within the result set according to the first search condition (Steps 4611 - 4615). If the number is not found within the result set according to the first search condition, that number is added to the result set (Step 4614).
  • a second result set is generated by combining the record numbers stored in the result set from the first search condition and the record numbers belonging to the field value numbers for which the category number is set with respect to the second information block (Step 4615 ), and this can be provided as output.
  • Alternatively, the process shown in FIG. 46B may be executed.
  • a second result set is obtained based on the second search condition regarding the second specific information block (Steps 4621 - 4624), an OR of the first result set and second result set (Step 4625) is performed using a bitmap (Step 4626), and a new result set is created based on this (Step 4627).
  • Steps 4621 and 4622 of FIG. 46B correspond to Steps 4602 and 4603 of FIG. 46A.
  • Step 4625 corresponds to Step 4601 of FIG. 46A.
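An OR search in the spirit of FIG. 46A might look like this. The Python `set` used to skip records already in the first result set is an assumption of this sketch (the text does not specify how membership is tested), and the function name and toy data are hypothetical.

```python
def or_search(first_result, values2, vct_ptr2, wanted2):
    """OR a first result set with a condition on a second field (FIG. 46A style)."""
    category2 = [1 if v in wanted2 else 0 for v in values2]  # Step 4603
    in_first = set(first_result)   # fast membership test for the first result set
    combined = list(first_result)
    for rec, fvn in enumerate(vct_ptr2):   # Step 4604: sequential scan of the second block
        if rec in in_first:
            continue                       # skip records already in the first result set
        if category2[fvn] == 1:
            combined.append(rec)           # Step 4614: record satisfies only condition 2
    return combined


occ_values = ["Student", "Programmer", "Teacher", "Other"]
occ_ptr = [0, 2, 0, 0, 3, 1]   # toy occupations of records 0-5
print(or_search([1, 3, 0, 5], occ_values, occ_ptr, {"Student"}))  # [1, 3, 0, 5, 2]
```

The output keeps the first result set in its original order and appends the newly found records; sorting it afterwards is optional.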
  • FIGS. 15 and 16 are explanatory diagrams illustrating the method of multiple-field Boolean operation searching using bit flags according to Embodiment 3 of the present invention, illustrating the case of performing a search under the same search conditions as the search according to the aforementioned Embodiment 2 of the present invention.
  • Multiple-field Boolean operation searching using bit flags is defined to mean a search wherein the search conditions are expressed by a Boolean operation among search conditions for each field.
  • it is more advantageous to construct a result set obtained by a search on a single field in the form of bit flags rather than as an array of record numbers. Namely, in accordance with the process illustrated in FIG.
  • the result set is generated by allocating one bit each to all of the records, and a bit value of “1” or “0” expresses whether or not each record matches the search conditions.
  • an information block containing field values pertaining to the search condition is selected (Step 4701 ), and then the category number is set to “1” on rows that match the search conditions (Step 4702 ).
  • the corresponding category number is accessed for each record and the bit value to be stored in the result set is determined (Steps 4703 - 4707 ).
  • the size of the result set for each field corresponds to the number of records in the table-format data, so the size of the result set is identical for each field, and as a result, it is simple to perform Boolean operations, e.g., AND, OR and XOR, on elements in the result set.
  • the result set A shown in FIG. 15 and the result set B shown in FIG. 16 are joined under AND conditions to obtain the desired search result set in bit flag format.
  • the search result set in bit flag format thus obtained can be converted to a result set in the format of an array of pointers to records, and thus combined with the aforementioned method of searching on multiple fields according to Embodiment 2 of the present invention.
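A minimal sketch of bit-flag result sets, using an arbitrary-precision Python integer as the bitmap (bit i set means record i matches). This representation, the function names and the toy data are assumptions for illustration; a production version would more likely use a fixed-size bit array.

```python
def bitflag_result(values, vct_ptr, wanted):
    """Bit-flag result set for one field: bit `rec` is 1 if record `rec` matches."""
    category = [1 if v in wanted else 0 for v in values]   # Steps 4701-4702
    bits = 0
    for rec, fvn in enumerate(vct_ptr):                     # Steps 4703-4707
        if category[fvn]:
            bits |= 1 << rec
    return bits


def bits_to_records(bits, n_records):
    """Convert a bit-flag result set back to an array of record numbers."""
    return [rec for rec in range(n_records) if (bits >> rec) & 1]


age_ptr = [3, 0, 2, 0, 1, 3]   # ages 19, 16, 18, 16, 17, 19 (values 16, 17, 18, 19)
occ_ptr = [0, 2, 0, 0, 3, 1]   # 0 = Student, 1 = Programmer, 2 = Teacher, 3 = Other
a = bitflag_result([16, 17, 18, 19], age_ptr, {16, 19})
b = bitflag_result(["Student", "Programmer", "Teacher", "Other"], occ_ptr, {"Student"})
print(bits_to_records(a & b, 6))  # AND: [0, 3]
print(bits_to_records(a | b, 6))  # OR:  [0, 1, 2, 3, 5]
```

Since every bit-flag result set has one bit per record, AND, OR and XOR reduce to single bitwise operators on equally sized bitmaps.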
  • the tabulating method according to Embodiment 4 of the present invention comprises counting the number of records that have a specific field value in a specific field.
  • In Embodiment 4 of the present invention, we shall consider the case of counting the number of records that have the field value of “Male” or that have the field value of “Female” in the “sex” field. As illustrated in FIG.
  • the information block regarding “sex” contains a count of the records that contain the field value of “Male” (its value being “632564”) and a count of the records that contain the field value of “Female” (its value being “367436”), so a simple tabulation of the number of records can be obtained immediately by accessing the array of counts within the information block.
  • FIG. 17 is a flowchart illustrating the operation of Embodiment 5 of the present invention. In the same manner as in other embodiments, this process is also implemented by the CPU 100 executing a program stored in a stipulated area.
  • the information block regarding “sex” as shown in FIG. 6 is selected as the first specific information block (Step 140), and the field value number “0” corresponding to the field value of “Male” is detected from within the value control table of the specific information block (Step 142).
  • the count corresponding to the field value number of “0” is “632564,” so the total number of males is determined to be 632564. Also, the start position corresponding to the field value number of “0” is “0,” so the pointers to records wherein the sex is male are determined to be stored in the 632564 locations starting from the beginning; a list of the pointers to these records, namely an array of record numbers, is thus kept as the result set (Step 146).
  • the information block regarding “age” illustrated in FIG. 7 is selected as the second specific information block (Step 148 ) and from the array of pointers to the value control table of the second specific information block, the field value number corresponding to the record specified in the result set regarding the first specific information block is extracted (Step 150 ), and the field value related to the extracted field value number, namely the “age” is extracted (Step 152 ). Finally, find the total age by sequentially adding the extracted “age” values (Step 154 ), and repeat steps 150 , 152 and 154 until all of the specified records in the aforementioned result set are processed (Step 156 ). The total age thus obtained is divided by the count to find the average age (Step 158 ).
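Steps 140 to 158 can be sketched as follows. This is a hypothetical Python illustration: the argument names and the toy six-record data (males are records 0, 2 and 4) are assumptions, and returning `None` for an empty category is a choice of this sketch.

```python
def average_age_of(target, sex_values, sex_starts, sex_counts, sex_record_ptrs,
                   age_values, age_ptr):
    """Average age of the records whose sex equals `target` (Steps 140-158)."""
    fvn = sex_values.index(target)                   # Step 142: field value number of target
    start, count = sex_starts[fvn], sex_counts[fvn]  # count read directly from the table
    if count == 0:
        return None                                  # avoid dividing by zero
    result = sex_record_ptrs[start:start + count]    # Step 146: result set
    total = 0
    for rec in result:                               # Steps 150-156
        total += age_values[age_ptr[rec]]            # field value number -> age, then add
    return total / count                             # Step 158


sex_values = ["Male", "Female"]
sex_starts, sex_counts = [0, 3], [3, 3]
sex_record_ptrs = [0, 2, 4, 1, 3, 5]
age_values, age_ptr = [16, 17, 18, 19], [3, 0, 2, 0, 1, 3]
print(average_age_of("Male", sex_values, sex_starts, sex_counts,
                     sex_record_ptrs, age_values, age_ptr))  # 18.0
```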
  • FIG. 18 is a conceptual explanatory diagram of Embodiment 6 of the present invention
  • FIG. 19 is a flowchart of the operation of Embodiment 6 of the present invention.
  • the tabulation is performed by first selecting the information block regarding “occupation” as the first information block (Step 170 ), and using the search condition of “occupation is student” to create from among all records a result set containing the records wherein the “occupation is student” (Step 172 ).
  • Step 174 Select the information block regarding “sex” as the second information block and also select the information block regarding “age” as the third information block (Step 174), and sequentially extract pointers to records from the beginning of the result set (Step 176).
  • the array of pointers to the value control table of the second information block is accessed to get the sex corresponding to the extracted pointers to records, and also, the array of pointers to the value control table of the third information block is accessed to extract the age corresponding to the extracted pointers to records (Step 178 ).
  • for each extracted record, the count for that record's sex is incremented by 1 and its age is added to the age total for that sex, so that the number of records and the total age are accumulated separately for males and females (Step 180).
  • Step 182 A check is made as to whether or not all pointers to records of the result set have been processed (Step 182), and if so, the average ages of male and female students are calculated by dividing the total age for each sex by the corresponding count (Step 184).
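Steps 176 to 184 can be sketched as follows; the per-sex count and age-total lists, the function name and the toy data are hypothetical. The `students` list plays the role of the result set created in Step 172.

```python
def average_age_by_sex(result, sex_values, sex_ptr, age_values, age_ptr):
    """Average age per sex over the records in `result` (Steps 176-184)."""
    count = [0] * len(sex_values)
    total = [0] * len(sex_values)
    for rec in result:                          # Step 176: next pointer to a record
        q = sex_ptr[rec]                        # Step 178: sex of this record
        count[q] += 1                           # Step 180: per-sex count ...
        total[q] += age_values[age_ptr[rec]]    # ... and per-sex total age
    return {sex_values[q]: total[q] / count[q]  # Step 184: skip sexes with no records
            for q in range(len(sex_values)) if count[q]}


students = [0, 2, 3]                 # toy result set for "occupation is student"
sex_ptr = [0, 1, 0, 1, 0, 1]         # 0 = Male, 1 = Female
age_values, age_ptr = [16, 17, 18, 19], [3, 0, 2, 0, 1, 3]
print(average_age_by_sex(students, ["Male", "Female"], sex_ptr, age_values, age_ptr))
```

With these toy records the male students are 19 and 18 and the single female student is 16, so the averages are 18.5 and 16.0.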
  • Embodiment 7 of the present invention is implemented.
  • the processing program shown in FIG. 20 is also read and executed by the CPU 100 .
  • the value control tables and the field value number specifying information arrays (namely, the arrays of pointers to the value control table) of the two fields to be cross-tabulated, that is, of the first and second information blocks, are kept in a storage device (Step 190).
  • the memory device may be implemented in the form of, for example, memory, virtual storage, memory-mapped file or the like.
  • the field value numbers q1 and q2 are extracted sequentially from the beginning of the array of pointers to the value control table, and these are used to identify a single element P(q1, q2) in the two-dimensional array (Step 194), and then the value of the identified element P(q1, q2) is incremented by 1 (Step 196).
  • tabulation is performed on the entire set of records in the table-format data of FIG. 1, but it is also possible to perform the same type of tabulation on a partial set of records, for example, tabulating a count of 16 year olds by sex/occupation.
  • a single-field search is first performed using the age of 16 as the search condition, and then the identifying information for records that match an age of 16 is acquired and kept.
  • Step 4904 a number that indicates the storage position of pointers in the result set (hereinafter referred to as the “storage position number” depending on the case) is initialized (Step 4904 ).
  • for the record indicated at the current storage position of the result set, the CPU 100 extracts the field value numbers q1 and q2 from the arrays of pointers to the value control table of the first and second information blocks and identifies the element P(q1, q2) within the two-dimensional array (Step 4905), and next, P(q1, q2) is incremented (Step 4906).
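The count cross-tabulation of Embodiment 7, including the variant restricted to a result set, can be sketched as below. The nested-list two-dimensional array, the optional `records` argument and the toy data are assumptions of this sketch.

```python
def cross_tab_counts(n1, ptr1, n2, ptr2, records=None):
    """Count records for each (q1, q2) pair of field value numbers."""
    P = [[0] * n2 for _ in range(n1)]        # initialized n1 x n2 two-dimensional array
    targets = range(len(ptr1)) if records is None else records
    for rec in targets:
        q1, q2 = ptr1[rec], ptr2[rec]        # field value numbers from both blocks
        P[q1][q2] += 1                       # increment element P(q1, q2)
    return P


sex_ptr = [0, 1, 0, 1, 0, 1]   # 0 = Male, 1 = Female
occ_ptr = [0, 2, 0, 0, 3, 1]   # 0 = Student, 1 = Programmer, 2 = Teacher, 3 = Other
print(cross_tab_counts(2, sex_ptr, 4, occ_ptr))          # all records
print(cross_tab_counts(2, sex_ptr, 4, occ_ptr, [1, 3]))  # only the toy 16-year-olds
```

Passing a result set as `records` gives the partial tabulation (for example, 16-year-olds by sex/occupation) without any change to the inner loop.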
  • In Embodiment 8 of the present invention, cross-tabulation is implemented for the situation wherein the field values of a field are divided into several categories, by counting records for each category of field value. For example, referring to the information block regarding “occupation” shown in FIG. 8, one sees that the four field values of “student,” “programmer,” “teacher” and “other” are registered for “occupation.” As categories based on these field values, one can envision a recategorization into the three types of “income earner,” “non-income earner” and “unknown.” In this example, a new category of “presence of income” is created, and a cross-tabulation of counts by sex/presence of income is produced.
  • the information block regarding “occupation” shown in FIG. 21 includes a value control table wherein category numbers are applied to each field value number based on the “presence of income” in particular.
  • students are assigned a category number of “1” (non-income earner), while programmers and teachers are assigned a category number of “0” (income earner), and “other” is assigned a category number of “2” (unknown).
  • the cross-tabulation in Embodiment 8 of the present invention has roughly the same process sequence as the cross-tabulation in Embodiment 7, but it differs in that the coordinates specifying the element of the two-dimensional array that stores the tabulation data are the field value number of the first information block regarding sex and the category number of the second information block regarding occupation.
  • the respective field value numbers stored in each array of pointers to the value control table are extracted sequentially, and the coordinates of the element P in the two-dimensional array is identified based on either the field value number itself extracted from the array of pointers to the value control table or the category number stored in the value control table corresponding to the field value number.
  • the information block on “sex” is used as the first information block
  • the information block on “occupation” is used as the second information block (see Step 190 of FIG. 20 ).
  • Step 191 Since the information block on “sex” contains two field value numbers and the information block on “occupation” contains three category numbers, an initialized 2×3 (2 row by 3 column) two-dimensional array is generated.
  • the field value number q1 of the first block and the category number q2 of the second block are extracted and used to identify a single element P(q1, q2), and then the value of the element P is incremented (see Steps 194 and 196).
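Embodiment 8 changes only how the second coordinate is obtained: the field value number is mapped through the category-number column. A hypothetical Python sketch, using the “presence of income” categories described above (0 = income earner, 1 = non-income earner, 2 = unknown); the field value number order and the toy records are invented for the example and need not match FIG. 21.

```python
def cross_tab_by_category(n1, ptr1, category2, n_categories, ptr2):
    """Count records per (field value number of field 1, category number of field 2)."""
    P = [[0] * n_categories for _ in range(n1)]
    for rec in range(len(ptr1)):
        q1 = ptr1[rec]               # field value number in the first block (sex)
        q2 = category2[ptr2[rec]]    # category number of the record's occupation
        P[q1][q2] += 1
    return P


sex_ptr = [0, 1, 0, 1, 0, 1]      # 0 = Male, 1 = Female
occ_ptr = [0, 2, 0, 0, 3, 1]      # 0 = Student, 1 = Programmer, 2 = Teacher, 3 = Other
income_category = [1, 0, 0, 2]    # Student -> 1, Programmer -> 0, Teacher -> 0, Other -> 2
print(cross_tab_by_category(2, sex_ptr, income_category, 3, occ_ptr))
```

Recategorizing only requires rewriting the small category-number column; the per-record arrays are left untouched.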
  • the cross-tabulation according to Embodiments 7 and 8 described above specifically tabulates counts, but note that the present invention may also be extended to cross-tabulation wherein an average age is found by multiple fields (e.g., by sex/by occupation).
  • cross-tabulation of the aforementioned type is performed.
  • In Embodiment 9 of the present invention, two two-dimensional arrays are used for tabulation: in the first two-dimensional array, the counts by sex/by occupation are tabulated in the same manner as in the aforementioned Embodiment 7, and in the second two-dimensional array, the total ages by sex/by occupation are calculated.
  • a first, second and third information block for the three fields of sex, occupation and age are loaded into the storage device (Step 200 ).
  • an initialized 2×4 (2 row by 4 column) two-dimensional array for storing tabulation data is created (Step 202).
  • the field value numbers q1, q2 and q3 are extracted sequentially to identify the coordinates (q1, q2) of an element of the two-dimensional arrays (Step 204), and then the value of the element P1(q1, q2) of the first two-dimensional array thus identified is incremented by 1 (Step 206).
  • the field value corresponding to the field value number q3 (namely, the age) is acquired (Step 208), and the acquired age is added to the identified element P2(q1, q2) of the second two-dimensional array (Step 210).
  • Step 212 A check is made as to whether or not all subject records have been processed (Step 212); if not, control returns to Step 204, and if so, the operation P2(q1, q2)/P1(q1, q2) is performed on the corresponding elements of two-dimensional arrays P1 and P2 (Step 214). Thereby, the average age by sex/by occupation is obtained and a cross-tabulation table of averages is created.
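Embodiment 9 can be sketched with two nested-list arrays, P1 for counts and P2 for age totals. Returning `None` for empty cells (where the division would be undefined), the function name and the toy data are assumptions of this sketch.

```python
def cross_tab_average(n1, ptr1, n2, ptr2, age_values, ptr3):
    """Average age for each (q1, q2) cell using a count array and a total array."""
    P1 = [[0] * n2 for _ in range(n1)]   # first array: counts
    P2 = [[0] * n2 for _ in range(n1)]   # second array: total ages
    for rec in range(len(ptr1)):
        q1, q2, q3 = ptr1[rec], ptr2[rec], ptr3[rec]   # Step 204
        P1[q1][q2] += 1                                # Step 206
        P2[q1][q2] += age_values[q3]                   # Steps 208-210
    # Final division P2 / P1, cell by cell; empty cells have no average
    return [[P2[i][j] / P1[i][j] if P1[i][j] else None for j in range(n2)]
            for i in range(n1)]


sex_ptr = [0, 1, 0, 1, 0, 1]   # 0 = Male, 1 = Female
occ_ptr = [0, 2, 0, 0, 3, 1]   # 0 = Student, 1 = Programmer, 2 = Teacher, 3 = Other
age_ptr = [3, 0, 2, 0, 1, 3]   # into age values 16, 17, 18, 19
print(cross_tab_average(2, sex_ptr, 4, occ_ptr, [16, 17, 18, 19], age_ptr))
```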
  • FIG. 23A is a conceptual explanatory diagram of a cross-tabulation table obtained in the aforementioned Embodiment 7 of the present invention.
  • counts for all combinations of sex/occupation are tabulated.
  • As shown in FIG. 23B, among the by-sex/by-occupation categories, there may be cases wherein one wishes to know in particular the count of only those persons whose sex is female and whose occupation is student.
  • the count in this case is obtained by finding the size of the result set from a search of an AND of the multiple fields of “female” AND “student.”
  • a cross-tabulation table of the average age is found for all combinations of sex/occupation, but it is also possible to find in particular the average age of only those persons having a sex of female and occupation of student.
  • the count is found from the size of the result set from a search of an AND of the multiple fields of “female” AND “student,” and the total of ages is found by adding the ages belonging to records specified by the identifying information for records contained in the result set, and by calculating the fraction (total of ages)/(count), it is possible to find the desired value (e.g., the average age) regarding a specific cell in the cross-tabulation table for average age.
  • FIG. 24 is a diagram illustrating multi-answer type fields
  • FIG. 25 is an explanatory diagram of an information block of a type compatible with multi-answer type fields according to Embodiment 10 of the present invention.
  • “Multi-answer” refers to the situation wherein, for example, when answers are obtained to the question “What kinds of writing implements are now on the table?” then multiple answers such as “pencil, eraser” or “paper, pencil” are obtained from the same person. To wit, in the case of multi-answer, it is possible to specify multiple field values for a single field of a single record.
  • FIG. 24 shows a list of the responses to the aforementioned question obtained from 1 million people, given as is.
  • the array of pointers to the value control table of this information block differs from the array of field value numbers described above: rather than holding a single field value number, each pointer in the array allocates 1 bit to each field value number. Therefore, whether or not a record specifies a given field value number can be indicated by turning the corresponding bit on or off (namely, as a binary number). Thereby, it is possible to specify multiple field values contained in a single field in a single record. For example, in FIG.
  • the pointers (bit pointers) within the array of pointers are 4-bit in size, and when the highest bit is on (namely, “1”), this means that the response of “paper” is included, when the second bit is on the response of “ruler” is included, and when the third bit is on this means the response of “eraser” is included. Moreover, when the lowest bit is on, this means the response of “pencil” is included.
  • the pointer corresponding to record number “0” has the value “3.” This can be considered to be “2¹ + 2⁰.” Therefore, this can be understood as the responses of “pencil” and “eraser” being included for this record number.
  • the pointers corresponding to record number “1” and record number “2” have the values “4” and “10,” respectively, and these can be considered to be “2²” and “2³ + 2¹.” Therefore, one can know that the response for record number “1” includes only “ruler,” while the responses for record number “2” include “eraser” and “paper.”
  • each bit in the pointer value is given meaning so a plurality of field value numbers can be indicated. Therefore, even in the case in which a record has a plurality of field values, this can be expressed by means of the pointer value.
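The bit pointers of FIG. 25 can be decoded and searched with ordinary bit operations. A hypothetical Python sketch, using the bit assignment described above and the three pointer values from the example; the function names are illustrative.

```python
# Bit order from the text: lowest bit = pencil, then eraser, ruler; highest bit = paper
ANSWERS = ["pencil", "eraser", "ruler", "paper"]


def decode(bit_pointer):
    """List the answers whose bits are on in a multi-answer pointer."""
    return [ANSWERS[b] for b in range(len(ANSWERS)) if (bit_pointer >> b) & 1]


def encode(answers):
    """Build a multi-answer pointer from a collection of answers."""
    return sum(1 << ANSWERS.index(a) for a in set(answers))


def records_with(answer, bit_pointers):
    """Record numbers whose pointer has the bit for `answer` turned on."""
    mask = 1 << ANSWERS.index(answer)
    return [rec for rec, p in enumerate(bit_pointers) if p & mask]


pointers = [3, 4, 10]                      # records 0, 1 and 2 from the example
print(decode(3))                           # ['pencil', 'eraser']
print(decode(10))                          # ['eraser', 'paper']
print(records_with("eraser", pointers))    # [0, 2]
```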
  • the present invention has an advantage in that it can be easily adapted to a multi-answer situation by simply modifying the constitution of one portion of the information block.
  • an information block thus modified can be used to replace the information blocks adopted in the various aforementioned embodiments of the present invention.
  • FIG. 26 is an explanatory diagram illustrating the method of handling blanks, error values and other special values that occur during tabulation processing, according to Embodiment 11 of the present invention.
  • cross-tabulation is executed by taking blanks to be one category.
  • there may be cases in which blanks, log(−1) or other mathematical errors appear.
  • special values (blanks, errors, etc.)
  • this approach has the advantage that such special values are registered in the value control table as field values, so the registered special values can be used as is as categories for searches or tabulation.
  • In Embodiment 12 of the present invention, we shall describe delay evaluation using the flowchart, shown in FIG. 27, of the operation of the method of searching upon multiple fields according to Embodiment 12.
  • In this embodiment, in the same manner as in Embodiment 2 of the present invention, we shall consider the case of obtaining a set of records that satisfy both the first search condition of the “age” being “16” or “19” and the second search condition of the “occupation” being “Student.”
  • In Embodiment 2, the category numbers are set in advance for all rows of the value control table (Step 124 of FIG. 12), but in Embodiment 12, a category number is set only when its field value number is actually accessed via the result set from the search on the first search condition.
  • Step 220 a result set of records wherein the “age” is “16” or “19” is obtained according to Embodiment 1 of the present invention.
  • the second specific information block, which is the information block regarding “occupation,” the second field shown in FIG. 8, is selected (Step 222), and the value of all category numbers in the value control table of the second specific information block is initialized to “−1,” for example (Step 224).
  • Step 226 record numbers that represent pointers to records are extracted sequentially.
  • the record number “999999” is extracted, for example.
  • Step 228 extract from the array of pointers to the value control table the field value number corresponding to the record number obtained under the aforementioned first search condition.
  • the field value number of “0” corresponding to the record number of “999999” is extracted, for example.
  • Step 230 a check is made as to whether the value of the category number corresponding to the field value number extracted with respect to the second specific information block is “−1” or not.
  • If the category number is “−1,” this means that the category number has not yet been set for that field value number, so a decision is made as to whether or not the field value corresponding to this field value number matches the aforementioned second search condition (Step 232); if it matches, the category number is set to “1” (Step 234), and if not, it is set to “0” (Step 236).
  • Step 238 a decision is made as to whether or not the value of the category number corresponding to the field value number extracted above is set to “1” (Step 238 ). If the value of the category number is set to “1,” then add to the final result set a pointer to the record, e.g. the record number, corresponding to the location within the array of pointers to the value control table at which is stored a pointer which indicates the field value number in which the category number is set to “1” (Step 240 ). In this example, as shown in FIG. 14, record number “999999” for example, is added to the final result set. If the category number is “0” then the final result set is not updated.
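The delay evaluation of Steps 224 to 240 amounts to memoizing the per-field-value check. A hypothetical Python sketch: the `matches` predicate, the evaluation counter (added only to show how many checks were made) and the toy data are assumptions.

```python
def and_search_lazy(first_result, values2, vct_ptr2, matches):
    """AND search that evaluates condition 2 at most once per accessed field value."""
    category2 = [-1] * len(values2)      # Step 224: -1 means "not yet evaluated"
    final, evaluations = [], 0
    for rec in first_result:              # Step 226: next record number
        fvn = vct_ptr2[rec]               # Step 228: its field value number
        if category2[fvn] == -1:          # Step 230: not evaluated yet?
            evaluations += 1
            category2[fvn] = 1 if matches(values2[fvn]) else 0   # Steps 232-236
        if category2[fvn] == 1:           # Step 238
            final.append(rec)             # Step 240
    return final, evaluations


occ_values = ["Student", "Programmer", "Teacher", "Other"]
occ_ptr = [0, 2, 0, 0, 3, 1]
print(and_search_lazy([1, 3, 0, 5], occ_values, occ_ptr, lambda v: v == "Student"))
# ([3, 0], 3): "Other" is never checked because no record in the first result set has it
```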
  • the delay evaluation as shown in the aforementioned Embodiment 12 is effective in the following types of cases. For example, consider the case in which a customer database of 1 million people exists and one wishes to implement a telephone survey, and thus extract a sample of 100 people. For example, when the people are narrowed down to those who satisfy stipulated conditions (sex, age, occupation, etc.) one can come up with 10,000 people, and then in order to ensure randomness, a search is performed based on the numbers (e.g., “12”) at the end of the telephone number.
  • In Embodiment 12, the elements of the “category number array” are first filled with “−1” so that only the aforementioned set of 10,000 people is evaluated. Namely, for each record in the 10,000-person result set, the corresponding element of the category number array is accessed, and only if that element is “−1” is the telephone number accessed and the result of the check stored as the element of the “category number array.” Thereby, the number of checks can be kept down to 10,000. In this manner, Embodiment 12 can greatly reduce the number of processing steps compared with an ordinary AND search.
  • data that has a structure like that of a telephone number consisting of the “country code+area code+central office code+number” can be divided and registered in multiple information blocks, and this has an advantage in that searching and tabulation regarding a country code, area code or other partial data can be performed easily.
  • the apparatus for implementing searching and tabulating according to embodiments of the present invention is implemented by means of an ordinary computer system shown in FIG. 28, for example a personal computer including a CPU 100, ROM 110, RAM 120, a hard disk 130, a display 140 or other output device, and a keyboard/mouse or other input device 150, connected to each other via a bus 160.
  • the program for constructing the information block for implementing the aforementioned embodiment may also be recorded on CD-ROM, ROM 110 or the hard disk storage device 130 , or may be supplied from outside via a network (not shown).
  • Step 300 Data Preparation
  • Data of the format shown in FIG. 2B is prepared. Next, this data is divided by field. In FIG. 2B, it can be divided into the fields of “sex,” “age” and “occupation.”
  • Step 311 Generation of the Information Block for the “Sex” Field
  • Step 312 Generation of the Value Control Table
  • the field values (“female” and “male”) within the array of field values 11 are sorted according to a stipulated basis. Naturally, at the time of this sort, the array of counts 14 must also be reordered with the sorting of the array of field values 11 .
  • each start position in the array of start positions 13 of the value control table is found as the cumulative total of the counts that precede it in the array of counts 14, starting from the first count. Naturally, the value of the first start position is “0.”
  • the array of category numbers 12 is used later as a work area at the time of creating the array of pointers to records.
  • Step 313 Creation of the Array of Pointers to the Value Control Table
  • Step 314 Creation of the Array of Pointers to Records
  • the size of the storage area is the total of the counts in the aforementioned array of counts 14 .
  • with the category numbers first set to the corresponding start positions, scan the array of pointers to the value control table 20 from the beginning row to the ending row, extracting one pointer to the value control table at a time. Extract the Jth value of the array of pointers to the value control table 20; assuming its value is “K,” extract the category number corresponding to the (K+1)th row of the value control table; assuming its value is “L,” store “J-1” in the (L+1)th element of the array of pointers to records 30, and increment by 1 the category number corresponding to the (K+1)th row of the value control table.
  • Step 310 Information blocks for the “age” field and “occupation” field can be created in the same manner (Step 320 and Step 330 ), and thus information blocks for the entire table-format data are obtained.
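Steps 312 to 314 can be sketched as follows. This is a minimal sketch assuming an in-memory Python list as the source column; it assigns field value numbers after sorting the distinct values (a simplification of collecting values and counts first and then reordering both), and it uses a separate `work` list in place of the category-number column as the work area, which the text notes is permissible. Record numbers are 0-based, so the “J-1” of the text is simply the index from `enumerate`.

```python
def build_info_block(column):
    """Build (values, counts, starts, vct_ptr, record_ptrs) for one field."""
    values = sorted(set(column))                       # Step 312: sorted distinct field values
    number_of = {v: i for i, v in enumerate(values)}   # field value -> field value number
    vct_ptr = [number_of[v] for v in column]           # Step 313: array of pointers to the VCT
    counts = [0] * len(values)
    for q in vct_ptr:
        counts[q] += 1                                 # number of records per field value
    starts, running = [], 0
    for c in counts:                                   # start position = sum of preceding counts
        starts.append(running)
        running += c
    work = list(starts)                                # work area, initialized to start positions
    record_ptrs = [0] * len(column)                    # size = total of the counts
    for rec, q in enumerate(vct_ptr):                  # Step 314
        record_ptrs[work[q]] = rec                     # store the record number at the next free slot
        work[q] += 1
    return values, counts, starts, vct_ptr, record_ptrs


print(build_info_block([19, 16, 18, 16, 17, 19]))
# ([16, 17, 18, 19], [2, 1, 1, 2], [0, 2, 3, 4], [3, 0, 2, 0, 1, 3], [1, 3, 4, 2, 0, 5])
```

The final loop is a counting sort: afterwards the record numbers of each field value occupy the contiguous run from its start position for `count` entries, which is the property the start-position/count lookups in the searches above rely on.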
  • FIGS. 30 through 35 are explanatory diagrams for the procedure of creating the information block regarding “occupation” in the table-format data shown in FIG. 1 .
  • FIG. 30 is a diagram illustrating populating with new data in the case in which categories are already defined and the types of attribute values are known in advance.
  • the value control table is created according to known category definitions. Since the start positions and counts are unknown, these are initialized to “0.” In addition, storage areas are allocated for the array of pointers to the value control table and array of pointers to records and these are similarly initialized.
  • FIG. 31 shows the pass in which the array of pointers to the value control table and the counts in the value control table are completed.
  • the data to be populated is taken one item at a time starting from the beginning, and its value is examined to determine which item (namely, which field value number) in the value control table it matches; that field value number is then stored in the array of pointers to the value control table, and the corresponding count in the value control table is incremented by 1.
  • the example of FIG. 31 shows the state after the processing of the second item of data to be populated is complete.
  • FIG. 32 shows the second pass for completing the value control table.
  • the start positions are found by accumulating the counts. Moreover, the value of each start position is copied to the corresponding category number. In the figure, the setting of the category numbers is complete.
  • FIGS. 33-35 show the third pass of data population.
  • one value (pointer) at a time is taken from the beginning of the array of pointers to the value control table, and the offset in the array of pointers to the value control table, namely the record number, is stored at the position in the array of pointers to records specified by the category number within the value control table referenced by that value.
  • FIGS. 33, 34 and 35 respectively show the processing of the first, second and last pieces of data of the array of pointers to the value control table of the information block regarding “occupation.”
  • the field of the category number is used as a work area, but any array that is an array of integers with a number of elements equal to or greater than the number of rows in the value control table, namely the total number of field value numbers, can also be used as the work area.
  • the population with new data in the case that categories are not defined in advance is implemented by scanning the data to be populated and acquiring a list of values to be registered in the value control table and then, performing the aforementioned process of population with new data in the case that the categories are defined.
  • FIG. 36 is an explanatory diagram for this addition of a record.
  • the field value number 0 which indicates “Student” is added to the end of the array of pointers to the value control table, and then the count of students in the value control table is increased by “1.”
  • space must also be allocated within the array of pointers to records for storing the record number of the new record (1000000), namely its offset in the array of pointers to the value control table.
  • the value at the end of the array of pointers to records corresponding to “Student” is extracted and “1000000,” which is the expansion address, is stored.
  • FIG. 37 illustrates the structure of an information block according to another embodiment of the present invention.
  • the array of start positions contains addresses indicating where the area for each field value begins within the array of pointers to records. For example, “0” is stored as the start position for the field value “Student,” whereas for the field value “Programmer” the value “n” (where n > 455214) is allocated as the start position.
  • FIG. 38 illustrates the initial state of sorting records on the “occupation” field.
  • the raw data shown in this figure is the array of record numbers to be sorted.
  • an array of pointers to records obtained for fields other than “occupation,” or a result set from a search can be used as the raw data.
  • the record numbers of the raw data are arranged in the order “0, 1, 2, . . . , 9” but one must note that the order of record numbers prior to the sort will generally be random.
  • the field values in the “occupation” field corresponding to each record number are arranged in the order “Teacher, Programmer, Student, . . . , Other.”
  • the information block regarding “occupation” is created by the information block construction method explained with reference to FIGS. 29-35.
  • the value control table and the array of pointers to the value control table of an information block regarding “occupation” prepared in advance are used.
  • the start positions set at the time of construction of the information block are used as is.
  • the start positions are copied to the corresponding end positions.
  • the area to contain the end position may be, for example, the area allocated for the count (the count array).
  • the array of pointers to the value control table may be prepared in advance in record number order, for example.
  • the record numbers of the raw data are in ascending order, so the correspondence between the record numbers of the raw data and the array of pointers to the value control table is simple.
  • the array of pointers to records is the array for storing the sorted result set, so an area of the same size as the data to be sorted is allocated for it.
  • the aforementioned end positions are used as the write addresses for storing the sorted results in the array of pointers to records.
  • FIG. 39 is an explanatory diagram illustrating the first step of sorting according to Embodiment 13 of the present invention.
  • the field value of the “occupation” field of the record with record number “0” is “Teacher.”
  • the field value number “2” which specifies a field value of “Teacher” is stored in the array of pointers to the value control table corresponding to record number “0.”
  • the value “5” of the end position corresponding to field value number “2” is extracted and used as an address: record number “0” is set in the 5th position of the array of pointers to records, where the sorted result set is stored.
  • the end position corresponding to field value number “2” is then incremented by 1, from “5” to “6.”
  • FIG. 40 is an explanatory diagram illustrating the second step of sorting according to Embodiment 13 of the present invention.
  • the field value of the “occupation” field of the record with record number “1” is “Programmer.”
  • the field value number “1” which specifies a field value of “Programmer” is stored in the array of pointers to the value control table corresponding to record number “1.”
  • the value “3” of the end position corresponding to field value number “1” is extracted and used as an address: record number “1” is set in the 3rd position of the array of pointers to records, where the sorted result set is stored. The end position for field value number “1” is then incremented from “3” to “4.”
  • FIG. 41 shows the final state of the sort thus obtained.
  • the sort according to Embodiment 13 of the present invention thus arranges the records in order of their “occupation” field value numbers, giving the record-number order “2, 4, 6, 1, 7, 0, 3, 5, 8, 9.”
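The sort of FIGS. 38-41 behaves like a counting sort keyed on field value numbers. A minimal Python sketch follows, assuming the whole set is sorted (so every output slot is filled) and that `pointers_to_vct` and `start_positions` come from an information block built as in FIGS. 29-35; the function name and argument layout are hypothetical.

```python
from typing import List, Sequence


def sort_whole_set(
    raw_record_numbers: Sequence[int],
    pointers_to_vct: Sequence[int],
    start_positions: Sequence[int],
) -> List[int]:
    """Ascending sort of record numbers by field value number (sketch)."""
    # The start positions are copied to the end positions, which then serve
    # as the next write address for each field value number.
    end_positions = list(start_positions)
    # Array of pointers to records that receives the sorted result set.
    sorted_records = [-1] * len(pointers_to_vct)
    for record_number in raw_record_numbers:
        q = pointers_to_vct[record_number]   # field value number of the record
        sorted_records[end_positions[q]] = record_number
        end_positions[q] += 1                # e.g. "5" becomes "6" in FIG. 39
    return sorted_records
```

Each record is read once and written once, with no comparisons between field values, which is consistent with the linear-time behavior described for this sort.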
  • FIG. 42 is a diagram illustrating the state of completion of the aforementioned sorting on a partial set.
  • the raw data given consists of a record with a record number of “0” in which the field value of the “occupation” field is “Teacher,” and a record with a record number of “1” in which the field value is “Programmer.”
  • sort results as shown by the sorted result set in this figure are obtained.
  • the result set is contained in the array of pointers to records. Therefore, an area of the same size as the entire set is allocated to store the sort results from a partial set.
  • FIG. 43 shows the post-processing for this sorting on a partial set.
  • This post-processing, namely the compression of the result set, consists of taking the difference between the start position and the end position of each field value in the value control table to obtain the count and storage position of that field value in the sort results, and then packing the sort results together based on the extracted counts and storage positions.
  • the raw data is considered to have storage position numbers assigned in order, starting from the beginning.
  • when the entire set is sorted, the storage position numbers match the record numbers.
  • when a partial set is sorted, the storage position number generally differs from the record number.
  • the storage position number is initialized (Step 5001).
  • for a given storage position number, the corresponding pointer within the array of pointers to the value control table is accessed (Step 5002), and the value of the end position for the field value number specified by that pointer is identified (Step 5003).
  • in Step 5004, the corresponding record number is stored at the position within the array of pointers to records given by that end position.
  • in Step 5005, the value of the end position identified in Step 5003 is incremented.
  • Steps 5002 through 5005 are performed for all of the raw data (see Steps 5006 and 5007), which yields an array of pointers to records containing the stipulated record numbers.
  • FIG. 39 corresponds to Steps 5002 through 5005 for storage position number “0,” and FIG. 40 corresponds to Steps 5002 through 5005 for storage position number “1.”
  • finally, when a partial set has been sorted, it is sufficient to compress the result set by means of the sort post-processing (Step 5008).
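For a partial set, the same loop (Steps 5002-5005) writes into an array as large as the whole set, and the compression of Step 5008 then gathers each field value's run, whose length is the difference between its end and start positions. The following is a hedged Python sketch; the function name and the use of `-1` to mark unused slots are assumptions, not details from the patent.

```python
from typing import List, Sequence


def sort_partial_set(
    raw_record_numbers: Sequence[int],
    pointers_to_vct: Sequence[int],
    start_positions: Sequence[int],
) -> List[int]:
    """Sort a subset of records, then compress the result (sketch)."""
    end_positions = list(start_positions)
    # The work area is as large as the entire set; -1 marks unused slots.
    work = [-1] * len(pointers_to_vct)
    # Steps 5001, 5006, 5007: visit the raw data by storage position number.
    for record_number in raw_record_numbers:
        q = pointers_to_vct[record_number]   # Step 5002
        address = end_positions[q]           # Step 5003
        work[address] = record_number        # Step 5004
        end_positions[q] = address + 1       # Step 5005
    # Step 5008: each field value filled work[start:end], so its count is
    # end - start; concatenating these runs compresses the result set.
    result: List[int] = []
    for start, end in zip(start_positions, end_positions):
        result.extend(work[start:end])
    return result
```

With the two-record raw data of FIG. 42 (record 0 "Teacher", record 1 "Programmer"), the compressed result lists record 1 before record 0.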
  • the sort according to the aforementioned Embodiment 13 of the present invention is a so-called “ascending order” sort, namely a sort wherein the sort results are arranged in order of increasing field value numbers of the sorted field values.
  • the sort results may also be arranged in “descending order,” wherein they are arranged in order of decreasing field value numbers of the sorted field values.
  • the “descending order” sort is implemented by modifying the start position used in the case of an “ascending order” sort.
  • the start positions for an “ascending order” sort are as follows:
  • the sort according to Embodiment 13 of the present invention has the following advantages.
  • a high-speed sort is achieved.
  • the novel sort according to the present invention achieved a sort time of 145 ms for 1 million items.
  • by comparison, the time required to sort 1 million integers by a conventional method was 1530 ms.
  • the sorting speed does not drop even as the data size increases.
  • the processing time of the novel sort is O(n), where n is the data size.
  • for a conventional sort, by contrast, the processing time is O(n·log(n)), for example.
  • sorts on multiple fields can be divided into sorts on each field.
  • for example, in the raw data, the records whose field value is “Student” appear in the order record number “2,” record number “4,” record number “6.”
  • this order of record numbers (record number “2,” then “4,” then “6”) is preserved in the final sorted result set.
  • in other words, records with equal sort keys keep their relative order from before the sort: the order of records in the sorted result reflects the prior order wherever the purpose of the sort allows.
  • therefore, a sort on multiple fields can be achieved by sorting on each individual field in turn, starting from the lowest-priority field.
  • in many conventional sorts, by contrast, the order prior to sorting is known not to be reflected in the order of the sort results.
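Because the sort keeps ties in their prior order, a multi-field sort can be built from single-field sorts applied from the lowest-priority field to the highest. The Python sketch below illustrates this under assumptions: the function names are hypothetical, each field is represented by an array of field value numbers indexed by record number, and the second field ("age group") in the usage note is invented purely for illustration.

```python
from typing import List, Sequence, Tuple


def stable_sort_on_field(
    record_numbers: Sequence[int], value_numbers: Sequence[int], n_values: int
) -> List[int]:
    """Counting sort by field value number; ties keep their input order."""
    counts = [0] * n_values
    for r in record_numbers:
        counts[value_numbers[r]] += 1
    next_slot: List[int] = []
    total = 0
    for c in counts:
        next_slot.append(total)
        total += c
    out = [0] * len(record_numbers)
    for r in record_numbers:          # scanning in input order keeps ties stable
        q = value_numbers[r]
        out[next_slot[q]] = r
        next_slot[q] += 1
    return out


def sort_on_fields(
    record_numbers: Sequence[int], fields: Sequence[Tuple[Sequence[int], int]]
) -> List[int]:
    """fields: (value_numbers, n_values) pairs, highest priority first."""
    order = list(record_numbers)
    for value_numbers, n_values in reversed(fields):  # lowest priority first
        order = stable_sort_on_field(order, value_numbers, n_values)
    return order
```

For example, with occupation numbers `[2, 1, 0, 2, 0]` and a hypothetical age group `[1, 0, 1, 0, 0]`, `sort_on_fields(range(5), [(occupation, 3), (age_group, 2)])` orders records first by occupation and, within each occupation, by age group.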
  • the value control table contains a value list of the field values.
  • the field value column contains a list including the values “16,” “17,” “18,” . . . .
  • the value control table includes category numbers set for each field value number.
  • in the value list, find the smallest value that does not satisfy the search condition (in this example, “100”). Then set “0” as the category number for all values in the value list before that smallest value, namely “100,” and set “1” as the category number for “100” and all values after it. Once the smallest value has been found, the category numbers are therefore set without any further comparison operations, and the field values or field value numbers satisfying the search conditions can be obtained.
  • the smallest value can be found with only a small number of comparison operations. For example, if the value list contains N distinct values, roughly log₂(N) comparisons are needed to find it.
  • the contents of the value list and the search conditions above are merely a specific example; according to the present invention, determining whether the stipulated search conditions are met can be speeded up for various value lists and various combinations of search conditions.
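The boundary search can be sketched with Python's standard `bisect` module. This assumes that the value list is sorted in ascending order (as the “16,” “17,” “18,” ... example suggests) and that the search condition is "value < limit"; the function name and the 0/1 convention follow the example above but are otherwise hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def categorize_below(value_list: Sequence[int], limit: int) -> Tuple[int, List[int]]:
    """Assumes value_list is sorted ascending and the condition is value < limit."""
    # Binary search: index of the smallest value that fails the condition,
    # found with about log2(N) comparisons.
    boundary = bisect_left(value_list, limit)
    # No further comparisons: values before the boundary get category 0,
    # the boundary value and everything after it get category 1.
    categories = [0] * boundary + [1] * (len(value_list) - boundary)
    return boundary, categories
```

The field value numbers `0` through `boundary - 1` are then exactly the ones satisfying the condition.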
  • FIG. 44 is a table showing the data used in the tests.
  • the data consisted of one million numbers in the range “000000” to “999999,” expressed as table-format data divided into three fields: the 10,000's unit, the 100's unit and the 1's unit. In each field, each field value from “00” to “99” appears 10,000 times.
  • FIG. 45 is a list of the test results showing the time required to search for/tabulate 1 million records, measured depending on the result set type.
  • the result set takes one of two types, namely the aforementioned bit flag type and the array of pointers type.
  • the times in the test results are given in milliseconds (ms; 1/1000 second).
  • the search performed in the aforementioned test is an AND search over multiple fields, connecting the three fields “×10,000,” “×100” and “×1” with an AND condition.
  • the search was a cascade of the fields “×10,000,” “×100” and “×1,” in this order.
  • the search was a cascade of the fields “ ⁇ 10,000,” “ ⁇ 100” and “ ⁇ 1” in this order.
  • the intermediate and final result sets from the search take the form of a bit flag or array of pointers as described above.
  • the measured times are given as the average of five measurements.
  • the tabulation in these tests consists of counting the number of times each value (00 through 99) of the “×100” and “×1” fields appears in the result sets obtained from the search tests.
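The test setup can be mirrored on a small scale in Python. The sketch splits each number into the three two-digit fields, runs a cascaded AND search that narrows a bit-flag result set field by field, and tabulates value frequencies over the hits. The specific search conditions used in the tests are not stated, so the predicates in the usage note are purely hypothetical, as are all function names.

```python
from typing import Callable, List, Sequence, Tuple


def split_digits(n: int) -> Tuple[int, int, int]:
    """Split 0..999999 into the x10,000, x100 and x1 fields (two digits each)."""
    return n // 10_000, (n // 100) % 100, n % 100


def build_fields(numbers: Sequence[int]) -> List[List[int]]:
    x10000: List[int] = []
    x100: List[int] = []
    x1: List[int] = []
    for n in numbers:
        a, b, c = split_digits(n)
        x10000.append(a)
        x100.append(b)
        x1.append(c)
    return [x10000, x100, x1]


def cascade_and(fields: Sequence[Sequence[int]],
                predicates: Sequence[Callable[[int], bool]]) -> List[bool]:
    """Bit-flag result set: each field narrows the flags left by the previous one."""
    flags = [True] * len(fields[0])
    for column, accept in zip(fields, predicates):
        for r, hit in enumerate(flags):
            if hit and not accept(column[r]):
                flags[r] = False
    return flags


def tabulate(column: Sequence[int], flags: Sequence[bool]) -> List[int]:
    """Count how often each value 00..99 appears among the flagged records."""
    counts = [0] * 100
    for r, hit in enumerate(flags):
        if hit:
            counts[column[r]] += 1
    return counts
```

An array-of-pointers result set can be derived from the flags with `[r for r, hit in enumerate(flags) if hit]`. For example, `fields = build_fields(range(20000))` followed by `cascade_and(fields, [lambda v: v == 1, lambda v: v < 50, lambda v: v % 2 == 0])` keeps 2,500 records.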
  • the constitution of the searching and tabulating system for table-format data is in no way limited to the examples described in the aforementioned embodiments, but rather the various constituent elements of the searching and tabulating system may be implemented in software (program), stored on a disk device or the like and, if necessary, the searching and tabulating system can be installed on a computer to perform the searching and tabulating of table-format data.
  • the program thus implemented may be stored on a floppy disk or CD-ROM or other portable storage medium, and can be used in a general purpose fashion in a situation in which such a system is used.
  • a two-dimensional array was generated in order to perform a tabulation on two fields, but this is not a limitation, as it is possible to generate a three-dimensional or higher-dimensional array in order to perform a tabulation on three or more fields, and it goes without saying that these can be used to perform the aforementioned tabulation.
  • the field value numbers q1, q2, q3 in each of the three information blocks are extracted and used to identify one element P(q1, q2, q3) in the three-dimensional array.
  • in Embodiment 9 as well, tabulation on three or more fields can of course be performed in the same manner as in Embodiment 7 and Embodiment 8.
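A three-field tabulation can be sketched as follows. It is assumed here that `q1`, `q2` and `q3` are the arrays of pointers to the value control table of the three information blocks (field value numbers indexed by record number) and that `sizes` gives the number of field value numbers in each block; the function name and the optional record-number filter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def tabulate_3d(
    q1: Sequence[int], q2: Sequence[int], q3: Sequence[int],
    sizes: Tuple[int, int, int],
    record_numbers: Optional[Sequence[int]] = None,
) -> List[List[List[int]]]:
    """Count records per (q1, q2, q3) combination of field value numbers."""
    n1, n2, n3 = sizes
    P = [[[0] * n3 for _ in range(n2)] for _ in range(n1)]
    if record_numbers is None:          # default: tabulate every record
        record_numbers = range(len(q1))
    for r in record_numbers:
        P[q1[r]][q2[r]][q3[r]] += 1     # one element P(q1, q2, q3) per record
    return P
```

Passing a search result as `record_numbers` restricts the tabulation to that result set; adding further index arrays extends the same idea to more dimensions.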
  • the present invention is in no way limited to this; it may instead be constituted such that a board computer used exclusively for data processing is connected to a personal computer or other ordinary computer system, and this board computer executes the aforementioned processing. Therefore, in this specification, the word “means” does not necessarily mean a physical means; it includes the case in which the functions of the various means are implemented by software and the case in which some or all of the functions are implemented by hardware.
  • the functions of a single means may be implemented by two or more physical means, or the functions of two or more means may be implemented by one physical means. As described above, the present invention makes it possible to process large amounts of data expressed in table format without using the conventional data tables, which required long access times, so the speed of tabulating and searching can be greatly increased.
  • the present invention is particularly suited for use in systems that handle large amounts of data, for example, databases and data warehouses. More specifically, it is suited to large-scale scientific and technical computation, control systems for plants, power supply and the like, delivery planning and resource allocation, order management, and the management of clerical work such as securities trading.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
US09/762,584 1998-08-11 1999-08-09 Method and apparatus for retrieving accumulating and sorting table formatted data Ceased US6643644B1 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
JP22727898 1998-08-11
JP10/227278 1998-08-11
JP33813398 1998-11-27
JP10/338133 1998-11-27
PCT/JP1999/004300 WO2000010103A1 (fr) 1998-08-11 1999-08-09 Procede et dispositif de recuperation, de stockage et de triage de donnees formatees en tableaux

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/142,045 Reissue USRE41901E1 (en) 1998-08-11 1999-08-09 Method and apparatus for retrieving accumulating and sorting table formatted data

Publications (1)

Publication Number Publication Date
US6643644B1 true US6643644B1 (en) 2003-11-04

Family

ID=26527589

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/142,045 Expired - Lifetime USRE41901E1 (en) 1998-08-11 1999-08-09 Method and apparatus for retrieving accumulating and sorting table formatted data
US09/762,584 Ceased US6643644B1 (en) 1998-08-11 1999-08-09 Method and apparatus for retrieving accumulating and sorting table formatted data

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/142,045 Expired - Lifetime USRE41901E1 (en) 1998-08-11 1999-08-09 Method and apparatus for retrieving accumulating and sorting table formatted data

Country Status (7)

Country Link
US (2) USRE41901E1 (fr)
EP (1) EP1136918A4 (fr)
JP (1) JP3581831B2 (fr)
KR (1) KR100688121B1 (fr)
CN (1) CN1194319C (fr)
CA (1) CA2340008C (fr)
WO (1) WO2000010103A1 (fr)

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4563558B2 (ja) 2000-07-31 2010-10-13 株式会社ターボデータラボラトリー データのコンパイル方法、および、コンパイル方法を記憶した記憶媒体
GB0100331D0 (en) * 2001-01-06 2001-02-14 Secr Defence Method of querying a structure of compressed data
JP3861044B2 (ja) * 2002-10-24 2006-12-20 株式会社ターボデータラボラトリー 連鎖したジョインテーブルのツリー構造への変換方法、および、変換プログラム
JP4136594B2 (ja) 2002-10-25 2008-08-20 株式会社ターボデータラボラトリー データ処理方法およびデータ処理プログラム
JP4511464B2 (ja) * 2003-04-16 2010-07-28 株式会社ターボデータラボラトリー 情報処理システムおよび情報処理方法
WO2005041066A1 (fr) * 2003-10-24 2005-05-06 Shinji Furusho Systeme de traitement d'informations du type memoire distribuee
WO2005041067A1 (fr) * 2003-10-27 2005-05-06 Shinji Furusho Systeme de traitement d'informations du type memoire distribuee
JP2005135221A (ja) * 2003-10-31 2005-05-26 Turbo Data Laboratory:Kk 表形式データの結合方法、結合装置およびプログラム
US20080281843A1 (en) * 2003-12-25 2008-11-13 Turbo Data Laboratories, Inc. Distributed Memory Type Information Processing System
WO2005073880A1 (fr) * 2004-01-29 2005-08-11 Shinji Furusho Système de traitement des informations de type mémoire distribuée
JP3935889B2 (ja) * 2004-02-27 2007-06-27 シャープ株式会社 データ処理装置、データ処理方法、データ処理プログラム、およびデータ処理プログラムを記録した記録媒体
WO2005106713A1 (fr) * 2004-04-28 2005-11-10 Shinji Furusho Procédé de traitement d'informations et système de traitement d'informations
JP2007034878A (ja) * 2005-07-29 2007-02-08 Turbo Data Laboratory:Kk 情報処理方法、情報処理装置および情報処理プログラム
WO2007020849A1 (fr) 2005-08-15 2007-02-22 Turbo Data Laboratories Inc. Système multiprocesseur de type mémoire partagée et méthode de traitement d’information de celui-ci
JP4005108B2 (ja) * 2006-09-26 2007-11-07 富士通株式会社 表示制御プログラムおよび記録媒体
JP2008250546A (ja) * 2007-03-29 2008-10-16 Fujitsu Broad Solution & Consulting Inc データ検索方法、プログラム及び装置
JP5008720B2 (ja) * 2007-04-19 2012-08-22 株式会社ターボデータラボラトリー メモリ間接参照をメモリ直接参照に変換する方法及び装置
JP2009003605A (ja) * 2007-06-20 2009-01-08 Fujitsu Broad Solution & Consulting Inc データベース管理装置,データベースシステム及びデータベース管理プログラム
JP5392253B2 (ja) 2008-05-30 2014-01-22 日本電気株式会社 データベースシステム、データベース管理方法、データベース構造およびコンピュータプログラム
JP5392254B2 (ja) 2008-05-30 2014-01-22 日本電気株式会社 データベースシステム、データベース管理方法、データベース構造およびコンピュータプログラム
WO2010013320A1 (fr) * 2008-07-30 2010-02-04 株式会社ターボデータラボラトリー Procédé d'exploitation de données de forme tabulaire, multiprocesseur à mémoire distribuée et programme
CN101355560B (zh) * 2008-09-12 2011-12-14 深圳市联软科技有限公司 一种数据传输方法及系统
US8438173B2 (en) * 2009-01-09 2013-05-07 Microsoft Corporation Indexing and querying data stores using concatenated terms
JP5352310B2 (ja) * 2009-03-30 2013-11-27 株式会社日立製作所 バッチ処理実行システム及びその方法
CN101853263B (zh) * 2009-04-03 2012-09-19 鸿富锦精密工业(深圳)有限公司 资料结构化处理系统及方法
JP5499825B2 (ja) * 2010-03-29 2014-05-21 日本電気株式会社 データベース管理方法、データベースシステム、プログラム及びデータベースのデータ構造
CN101894133B (zh) * 2010-06-08 2011-12-07 用友软件股份有限公司 用于批量修改表单数据的方法和装置
EP2909747B1 (fr) * 2012-10-22 2019-11-27 Ab Initio Technology LLC Caractérisation de sources de données dans un système de stockage de données
KR102129643B1 (ko) 2012-10-22 2020-07-02 아브 이니티오 테크놀로지 엘엘시 소스 추적으로 데이터 프로파일링
EP3594821B1 (fr) 2014-03-07 2023-08-16 AB Initio Technology LLC Gestion d'opérations de profilage de données associées à un type de données
CN107066564B (zh) * 2017-03-31 2020-10-16 武汉斗鱼网络科技有限公司 一种基于安卓列表的数据处理方法及装置
CN107402978A (zh) * 2017-07-04 2017-11-28 第四范式(北京)技术有限公司 拼接数据记录的方法及装置
US11068540B2 (en) 2018-01-25 2021-07-20 Ab Initio Technology Llc Techniques for integrating validation results in data profiling and related systems and methods
CN110264331B (zh) * 2019-04-22 2023-01-17 创新先进技术有限公司 资金数据的分析方法、装置及设备

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5583962A (en) 1978-12-19 1980-06-24 Sharp Corp Data retrieving system
JPS63298626A (ja) 1987-05-29 1988-12-06 Matsushita Electric Ind Co Ltd デ−タベ−ス管理方法
JPH01219927A (ja) 1988-02-29 1989-09-01 Hitachi Ltd データベースの情報検索方式
JPH04128972A (ja) 1990-09-20 1992-04-30 Fujitsu Ltd ジョイン処理方式
JPH06282578A (ja) 1993-03-26 1994-10-07 Fujitsu Ltd 情報の抽出方法
US6513041B2 (en) * 1998-07-08 2003-01-28 Required Technologies, Inc. Value-instance-connectivity computer-implemented database
US6523027B1 (en) * 1999-07-30 2003-02-18 Accenture Llp Interfacing servers in a Java based e-commerce architecture
US6550057B1 (en) * 1999-08-31 2003-04-15 Accenture Llp Piecemeal retrieval in an information services patterns environment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03252858A (ja) * 1990-03-02 1991-11-12 Nippon Telegr & Teleph Corp <Ntt> 日本語長文検出装置
US5204958A (en) * 1991-06-27 1993-04-20 Digital Equipment Corporation System and method for efficiently indexing and storing a large database with high data insertion frequency
JPH07319924A (ja) * 1994-05-24 1995-12-08 Matsushita Electric Ind Co Ltd 手書き電子文書のインデックス付けおよび探索方法

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
"A Database Operation Processor: DBE", S. Matsuda et al., vol. 33, No. 12, Dec. 1992, pp. 1424-1430.
"Introduction of Multi-Dimensional Database", S. Tanaka et al., pp. 1-7, 110-1 (Oct. 24, 1996).
C&C Information Technology Lab., vol. 88, No. 127, Jul. 22, 1988, pp. 25-32, "A Retrieval Interface Based on Montage Method for Statistical Summary Databases", Jun' ichi Ueno.
Campbell et al., Video decimator design using a systolic array, Circuits and Systems, 1993, ISCAS '93, 1993 IEEE International Symposium on, May 3-6, 1993, pp. 1726-1729, vol. 3.* *
Elsherbeni et al., Visualization of two and three dimensional antenna patterns, Southeastcon '94, 'Creative Technology Transfer-A Global Affair'., Proceedings of the 1994 IEEE, Apr. 10-13, 1994, pp. 270-272.* *
Hirayama, A framework for forms processing using an enhanced-line-shared-adjacent format, Document Analysis and Recognition, 1999, ICDAR '99, Proceedings of the Fifth International Conference on, Sep. 20-22, 1999, pp. 103-106.* *
Inst. of Electronics, Information and Communication Engineers, Sybase IQ, vol. 97, No. 415, Dec. 2, 1997, pp. 51-56, "The Approach to the Data Warehouse by the Original Data Structure", Masayuki Unoki.
Journal of IPSJ, vol. 38, No. 9, Sep. 1997, pp. 745-750, "Data Warehouse and Multi-Dimensional Database", S. Tanaka.
The Institute of Electronics, Information and Communication Engineers, Technical Report of IEICE, A197-49, DE97-82 (Dec. 1997), pp. 25-30, "Storing a Large Time Sequenced Data within Disks", H. Sakai et al.

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050108184A1 (en) * 2001-11-09 2005-05-19 Turbo Data Laboratories, Inc Data joining/displaying method
US7184996B2 (en) * 2001-11-09 2007-02-27 Turbo Data Laboratories, Inc. Method for concatenating table-format data
US20050027790A1 (en) * 2003-07-28 2005-02-03 Alan Dziejma System and method for an intelligent blotter engine
US8549031B2 (en) * 2003-12-03 2013-10-01 Trizetto Corporation Range definition method and system
US20050125446A1 (en) * 2003-12-03 2005-06-09 Roy Schoenberg Range definition method and system
US7774377B2 (en) * 2003-12-03 2010-08-10 The Trizetto Group, Inc. Range definition method and system
US20100281050A1 (en) * 2003-12-03 2010-11-04 Roy Schoenberg Range Definition Method and System
US20050165819A1 (en) * 2004-01-14 2005-07-28 Yoshimitsu Kudoh Document tabulation method and apparatus and medium for storing computer program therefor
US20060136408A1 (en) * 2004-11-15 2006-06-22 Charles Weir Searching for and providing objects using byte-by-byte comparison
US8176038B2 (en) 2004-11-15 2012-05-08 Zi Corporation Of Canada, Inc. Organizing pointers to objects
US8161020B2 (en) 2004-11-15 2012-04-17 Zi Corporation Of Canada, Inc. Searching for and providing objects using byte-by-byte comparison
US20110161363A1 (en) * 2004-11-15 2011-06-30 Zi Corporation Of Canada, Inc. Organizing pointers to objects
US7890492B2 (en) * 2004-11-15 2011-02-15 Zi Corporation Of Canada, Inc. Organizing pointers to objects in an array to improve the speed of object retrieval
US20060173807A1 (en) * 2004-11-15 2006-08-03 Charles Weir Organizing pointers to objects
US7925617B2 (en) 2005-04-08 2011-04-12 Oracle International Corporation Efficiency in processing queries directed to static data sets
US7725468B2 (en) * 2005-04-08 2010-05-25 Oracle International Corporation Improving efficiency in processing queries directed to static data sets
US20060230020A1 (en) * 2005-04-08 2006-10-12 Oracle International Corporation Improving Efficiency in processing queries directed to static data sets
US20100312802A1 (en) * 2005-05-24 2010-12-09 Turbo Data Laboratories, Inc. Shared-memory multiprocessor system and method for processing information
EP1901183A4 (fr) * 2005-05-24 2010-01-13 Turbo Data Lab Inc Systeme multiprocesseur et son procede de traitement d informations
US20080215584A1 (en) * 2005-05-24 2008-09-04 Shinji Furusho Shared-Memory Multiprocessor System and Method for Processing Information
US8065337B2 (en) 2005-05-24 2011-11-22 Turbo Data Laboratories, Inc. Shared-memory multiprocessor system and method for processing information
EP1901183A1 (fr) * 2005-05-24 2008-03-19 Turbo Data Laboratories, Inc. Systeme multiprocesseur et son procede de traitement d informations
US7801903B2 (en) 2005-05-24 2010-09-21 Turbo Data Laboratories, Inc. Shared-memory multiprocessor system and method for processing information
US20070106666A1 (en) * 2005-11-10 2007-05-10 Beckerle Michael J Computing frequency distribution for many fields in one pass in parallel
US7565349B2 (en) * 2005-11-10 2009-07-21 International Business Machines Corporation Method for computing frequency distribution for many fields in one pass in parallel
US20070282808A1 (en) * 2006-06-05 2007-12-06 Fujitsu Limited Search processing method and apparatus
US20100180057A1 (en) * 2009-01-09 2010-07-15 Yahoo! Inc. Data Structure For Implementing Priority Queues
US20100179963A1 (en) * 2009-01-13 2010-07-15 John Conner Method and computer program product for geophysical and geologic data identification, geodetic classification, and organization
US8402058B2 (en) * 2009-01-13 2013-03-19 Ensoco, Inc. Method and computer program product for geophysical and geologic data identification, geodetic classification, organization, updating, and extracting spatially referenced data records
US20140136548A1 (en) * 2009-01-22 2014-05-15 American Express Travel Related Services Company, Inc. Method and system for ranking multiple data sources
US9330146B2 (en) * 2009-01-22 2016-05-03 American Express Travel Related Services Company, Inc. Method and system for ranking multiple data sources
US9679020B2 (en) 2009-01-22 2017-06-13 American Express Travel Related Services Company, Inc. Assigning a regulated data source ranking for data fields
US20120203794A1 (en) * 2011-02-09 2012-08-09 Roulei Zhang Efficiently delivering event messages using compiled indexing and paginated reporting
US8738583B2 (en) * 2011-02-09 2014-05-27 Cisco Technology, Inc. Efficiently delivering event messages using compiled indexing and paginated reporting
US11972865B1 (en) * 2012-07-25 2024-04-30 Azad Alamgir Kabir High probability differential diagnoses generator and smart electronic medical record
WO2015007175A1 (fr) * 2013-07-18 2015-01-22 International Business Machines Corporation Analyse de sujet de données tabulaires
US20170075983A1 (en) * 2013-07-18 2017-03-16 International Business Machines Corporation Subject-matter analysis of tabular data
US9607039B2 (en) 2013-07-18 2017-03-28 International Business Machines Corporation Subject-matter analysis of tabular data
US10229154B2 (en) * 2013-07-18 2019-03-12 International Business Machines Corporation Subject-matter analysis of tabular data
US9892107B2 (en) 2013-07-31 2018-02-13 International Business Machines Corporation Associating mentioned items between documents
US20150278268A1 (en) * 2014-03-25 2015-10-01 Mohamad El-Ali Data encoding and corresponding data structure
US9870382B2 (en) * 2014-03-25 2018-01-16 Sap Se Data encoding and corresponding data structure

Also Published As

Publication number Publication date
CN1317117A (zh) 2001-10-10
CA2340008A1 (fr) 2000-02-24
JP3581831B2 (ja) 2004-10-27
USRE41901E1 (en) 2010-10-26
CN1194319C (zh) 2005-03-23
KR100688121B1 (ko) 2007-03-09
WO2000010103A1 (fr) 2000-02-24
EP1136918A1 (fr) 2001-09-26
EP1136918A4 (fr) 2006-03-29
KR20010085359A (ko) 2001-09-07
CA2340008C (fr) 2008-09-23

Similar Documents

Publication Publication Date Title
US6643644B1 (en) Method and apparatus for retrieving accumulating and sorting table formatted data
US5450580A (en) Data base retrieval system utilizing stored vicinity feature valves
US5710915A (en) Method for accelerating access to a database clustered partitioning
US5237678A (en) System for storing and manipulating information in an information base
US5394487A (en) Forms recognition management system and method
US5864857A (en) Method for processing multi-dimensional data
US6484168B1 (en) System for information discovery
US8171029B2 (en) Automatic generation of ontologies using word affinities
US8193954B2 (en) Computer product, information processing apparatus, and information search apparatus
US6785684B2 (en) Apparatus and method for determining clustering factor in a database using block level sampling
WO1992006440A1 (fr) System and method for information retrieval
WO2008154029A1 (fr) Data classification and hierarchical clustering
US6965898B2 (en) Information retrieval system, an information retrieval method, a program for executing information retrieval, and a storage medium wherein a program for executing information retrieval is stored
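JP3143532B2 (ja) Image retrieval apparatus and method
KR20020009583A (ko) System and method for extracting index key data fields
CN111191430B (zh) Automatic table creation method, apparatus, computer device, and storage medium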
Skiena et al. Sorting and searching
JP3151730B2 (ja) Database retrieval system
US10990575B2 (en) Reorganization of databases by sectioning
Jaro UNIMATCH: a computer system for generalized record linkage under conditions of uncertainty
JP6237193B2 (ja) Matrix compression device, control method, and program
US7996366B1 (en) Method and system for identifying stale directories
JP3288063B2 (ja) Variable-length data storage and reference system
JP2003108576A (ja) Database management device and database management method
US7356604B1 (en) Method and apparatus for comparing scores in a vector space retrieval process

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
AS Assignment

Owner name: FURUSHO, MR. SHINJI, JAPAN

Free format text: LICENSE AGREEMENT;ASSIGNORS:TURBO DATA LABORATORIES;FURUSHO, MR. SHINJI;REEL/FRAME:017275/0514

Effective date: 20050527

Owner name: TURBO DATA LABORATORIES, JAPAN

Free format text: LICENSE AGREEMENT;ASSIGNORS:TURBO DATA LABORATORIES;FURUSHO, MR. SHINJI;REEL/FRAME:017275/0514

Effective date: 20050527

Owner name: ASSIST SYSTEMS LABORATORY CO., LTD., JAPAN

Free format text: LICENSE AGREEMENT;ASSIGNORS:TURBO DATA LABORATORIES;FURUSHO, MR. SHINJI;REEL/FRAME:017275/0514

Effective date: 20050527

AS Assignment

Owner name: TURBO DATA LABORATORIES, INC., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FURUSHO, SHINJI;REEL/FRAME:018239/0662

Effective date: 20060706

FPAY Fee payment

Year of fee payment: 4

RF Reissue application filed

Effective date: 20080619