WO2014034383A1

WO2014034383A1 - Information processing device, record location information specification method, and information processing program

Info

Publication number: WO2014034383A1
Application number: PCT/JP2013/071127
Authority: WO
Inventors: 古庄　晋二
Original assignee: 株式会社ターボデータラボラトリー
Priority date: 2012-08-29
Filing date: 2013-08-05
Publication date: 2014-03-06
Also published as: JP2015207026A

Abstract

Provided is a technology that allows a large database to be managed inexpensively, without restrictions on the usage environment, in an environment that is easy to use. An information processing device is provided that comprises a location information specification unit. The unit specifies the location information of a desired record in a database storing a plurality of records, each having a unique record number, by using an index that returns the record number for a designated value and the record number corresponding to a given rank after sorting by a prescribed item. The size of the index is proportional to the size of the original database.

Description

Information processing apparatus, record position information specifying method, and information processing program

The present invention relates to database management technology, and particularly to management technology for large-scale data stored in a distributed manner.

“Search” that accumulates data, retrieves necessary data from it, and presents it is the basic role of the database management device. An index is essential for speeding up this search. Examples of the existing index include B-Tree and hash (for example, see Non-Patent Document 1).

In recent years, the amount of data has increased rapidly, and the database has inevitably become larger. In addition, a large database often collects data in various locations. For example, POS data generated at each store, observation data acquired at observatories and meteorological stations in various locations, and the like.

Non-Patent Document 1: Douglas Comer, "The Ubiquitous B-Tree", Computing Surveys, June 1979, Vol. 11, No. 2, pp. 121-137

The conventional index cannot handle large-scale data or data obtained in a distributed manner.

First, processing speed, which becomes more pressing as scale increases, is insufficient. Suppose, for example, that a system using a conventional index takes about 1 second to search 1 million rows of data. One second is acceptable. When the data grows to 100 million rows, however, the search takes 100 seconds even if the same per-row processing speed is maintained, which is unusable. In addition, B-Tree, the most frequently used index to date, has a complicated operating mechanism, achieves poor cache hit rates, and is difficult to speed up for large-scale data. For this reason, when the data scale becomes large, a dedicated system or the like must be built to cope.

Also, existing technology does not allow a database to be serverless or decentralized. As described above, the data of a large-scale database is often generated and collected in many separate places. In a conventional search system, however, the data is first gathered on a server, and processing such as search is performed only afterwards. This is because a conventional index cannot give a unique record number to the data in the database. A unique record number serves as an index that can be used even across databases with different schemas. Because conventional indexes lack this property, it is difficult to manage data in a distributed manner. Consequently, search processing is performed on the server where the data is accumulated, using only the server's CPU, and search delays appear early as the number of simultaneous accesses increases.

This server-side processing increases costs and restricts the usage environment. Normally, one server can manage only one million rows of data. When the data to be handled reaches 100 million rows, 100 servers are therefore required; the introduction and management costs become enormous, and a facility for installing and managing the servers is needed. This is even more true when a dedicated system is built, as mentioned above. The capacity of the index itself also becomes a problem. For example, B-Tree requires a storage area of O(n log n), where n is the number of data items stored in the database. An increase in index capacity also leads to a decrease in performance.

Therefore, an index for a large-scale database should have the property that its required storage capacity does not grow rapidly as the database grows. For example, if the number of data items stored in the database is n, the index size is preferably O(n). It is also desirable that data be handled without a server, so that the data acquired at each location is managed in a distributed manner as it is and can be freely accessed via the network. Neither can be realized with current indexes.

The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technology that can manage a large-scale database at low cost without restrictions on the use environment and provide an easy-to-use environment.

The present invention provides an information processing apparatus including a position information specifying unit that specifies position information of a desired record in a database storing a plurality of records, each having a unique record number. The unit uses an index that returns the record number for a specified value and the record number corresponding to a rank after sorting by a predetermined item. The size of this index is proportional to the size of the original database.

Large-scale databases can be managed at low cost without restrictions on the usage environment, and an easy-to-use database management environment can be provided.

Block diagram of the database system of the first embodiment.
(a)-(d) Explanatory drawings for describing the database of the first embodiment.
(a)-(d) Explanatory drawings for describing the database of the first embodiment.
Explanatory drawing for describing the virtual integrated data and the virtual integrated sort data of the first embodiment.
Functional block diagram of the information processing apparatus of the first embodiment.
(a)-(c) Explanatory drawings for describing the index file for each data item of the first embodiment.
(a) and (b) Explanatory drawings for describing the index file for each table of the first embodiment.
Flowchart of the first search process of the first embodiment.
Flowchart of the second search process of the first embodiment.
Flowchart of the position information specifying process of the first embodiment.
Explanatory drawing for describing the browsing process of the first embodiment.
(a)-(c) Explanatory drawings for describing the index file for each data item of the second embodiment.
(a) and (b) Explanatory drawings for describing the index file for each table of the second embodiment.

In an embodiment of the present invention, there is provided an information processing apparatus including a position information specifying unit that specifies position information of a desired record in a database storing a plurality of records, each having a unique record number. The unit uses an index that returns the record number for a specified value and the record number corresponding to a rank after sorting by a predetermined item. The size of this index is proportional to the size of the original database.

Specifically, there is provided an information processing apparatus that manages a database consisting of records storing item values for predetermined data items. The apparatus comprises an index file for each data item that can be a search target, and a position information specifying unit that uses the index file to specify position information of a desired record. Each record is uniquely given a record number in advance, and the position information specifying unit specifies the record number as the position information. The index file for each data item allows the record number to be acquired from an item value of the data item, and also from a rank in the sort database, that is, the database sorted with the data item as the key item.

In this case, there may be a plurality of databases to be managed, with a database ID uniquely assigned to each database in advance and the index file generated for each database. The sort database is then the virtual integrated database, in which the plurality of databases are virtually integrated, sorted using the data item as the key item, and the position information specifying unit may further specify, as the position information, the database ID of the database to which the desired record belongs.

The index file for each data item includes a value list that stores the unique item values belonging to the data item in a predetermined order, a cumulative number list that stores, in the storage order of the value list, the cumulative number of records in the database for each item value, and a sort list that stores the record numbers in the order obtained by sorting the database in the predetermined order using the data item as the key item.
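The three lists can be illustrated with a minimal sketch. The function name `build_index`, the sample column, and in particular the exact layout of the cumulative number list (a count of records with smaller values, plus one trailing entry holding the total) are assumptions made for illustration; the text above does not fix these details.

```python
from collections import Counter
from itertools import accumulate


def build_index(column):
    """Build (value_list, cumulative_list, sort_list) for one data item.

    column[r] is the item value of the record whose record number is r.
    """
    counts = Counter(column)
    # Unique item values in ascending order.
    value_list = sorted(counts)
    # Assumed layout: cumulative_list[j] is the number of records whose value
    # is smaller than value_list[j]; the last entry is the total record count.
    cumulative_list = [0] + list(accumulate(counts[v] for v in value_list))
    # Record numbers in sorted order. sorted() is stable, so records with
    # equal values keep their record-number order.
    sort_list = sorted(range(len(column)), key=lambda r: column[r])
    return value_list, cumulative_list, sort_list


# Hypothetical sample column, not the data of the patent's figures.
gender = ["female", "male", "female", "male", "female"]
vl, cl, sl = build_index(gender)
print(vl)  # ['female', 'male']
print(cl)  # [0, 3, 5]
print(sl)  # [0, 2, 4, 1, 3]
```

Under these assumptions no list holds more than n + 1 entries for n records, which is consistent with the stated goal of an index whose size is proportional to the size of the database.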

In addition, the index file for each data item includes a sort list that stores the record numbers in the order obtained by sorting the database in a predetermined order using the data item as the key item, and an original data list that stores the item values of the data item in their initial arrangement order in the database.

In addition, the position information specifying unit may include a first search unit that uses the index file for each data item and specifies the position information of the specified item value of the data item.
Further, the position information specifying unit may include a second search unit for specifying the position information of a specified position in the sort database using an index file for each data item.
The position information specifying unit may further include a record number calculation unit that calculates, for each database and for each item value of each data item, the number of records smaller than the item value and the number of records equal to the item value.
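The per-database calculation of "records smaller than" and "records equal to" an item value might be sketched as follows. The sketch assumes an index made of an ascending value list and a cumulative number list with one trailing total entry; that layout, the function name `count_less_and_equal`, and the sample indexes are hypothetical.

```python
from bisect import bisect_left


def count_less_and_equal(value_list, cumulative_list, target):
    """Return (number of records smaller than target, number equal to target)."""
    j = bisect_left(value_list, target)
    # Assumed layout: cumulative_list[j] counts the records whose value is
    # smaller than value_list[j]; the final entry is the total record count.
    less = cumulative_list[j]
    if j < len(value_list) and value_list[j] == target:
        equal = cumulative_list[j + 1] - cumulative_list[j]
    else:
        equal = 0
    return less, equal


# Hypothetical indexes of two databases, keyed by database ID.
indexes = {
    0: (["Ann", "Bob", "Cid"], [0, 2, 3, 5]),
    1: (["Bob", "Dee"], [0, 1, 4]),
}
per_db = {db: count_less_and_equal(vl, cl, "Bob") for db, (vl, cl) in indexes.items()}
print(per_db)  # {0: (2, 1), 1: (0, 1)}
```

Summing the two counts over all databases gives the range of ranks that a value would occupy after the databases are integrated and sorted, without moving any record.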

Further, a record extracting unit that extracts the desired record from the database according to the position information specified by the position information specifying unit may be further provided.

There is also provided a record position information specifying method for specifying position information of a desired record in a database that consists of records storing item values for each predetermined data item, with a record number uniquely assigned to each record. The method includes a position information specifying step that specifies the position information by specifying the record number of the desired record using an index file generated for each data item. The index file allows the record number to be acquired from an item value of the data item, and also from a rank in a sort database obtained by sorting the database using the data item as the key item.

The position information specifying step may include a first search step that receives the designation of a data item and an item value of the desired record and specifies the record number of the desired record using the index file of the data item.
Further, the position information specifying step may include a second search step that receives, as the designation of the desired record, a data item and a rank in the sort database, and specifies the record number of the desired record in the sort database using the index file of the data item.
In the second search step, the position information specifying step may include a record number calculating step that calculates, for each database and for each item value of the data item, the number of records smaller than the item value and the number of records equal to the item value.

There may be a plurality of databases, with a database ID uniquely assigned to each database in advance and the index file generated for each database. The sort database is then a virtual integrated sort database obtained by sorting, with the data item as the key item, a virtual integrated database in which the plurality of databases are virtually integrated. In the position information specifying step, the database ID of the database to which the desired record belongs may be further specified as the position information.

Further, there is provided a record position information specifying method performed in an information processing apparatus that includes a position information specifying unit for specifying position information of a desired record. The method specifies position information of a record having a target value, which is a predetermined item value of a target item, which is a predetermined data item, in a database stored in a storage device. The database consists of records storing item values for each predetermined data item, and a record number is uniquely assigned to each record in advance. The storage device further stores an index file for each data item that can be a search target. The index file includes a value list that stores the unique item values belonging to the data item in a predetermined order, a cumulative number list that stores, in the storage order of the value list, the cumulative number of records in the database for each item value, and a sort list that stores the record numbers in the order obtained by sorting the database in the predetermined order using the data item as the key item. The method comprises a presence/absence determination step that accesses the value list of the target item and determines whether the target item of the database has the target value, and a record number specifying step that, when the value is determined to be present, uses the cumulative number list and the sort list to specify the record number of the record having the target value as the position information.
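The two steps of this method, presence determination on the value list followed by record number lookup through the cumulative number list and the sort list, might look like the sketch below. The cumulative list layout (exclusive counts plus a trailing total), the function name, and the sample lists are assumptions for illustration only.

```python
from bisect import bisect_left


def find_record_numbers(value_list, cumulative_list, sort_list, target):
    """Return the record numbers of all records whose item value is target."""
    # Presence/absence determination: binary search on the sorted value list.
    j = bisect_left(value_list, target)
    if j == len(value_list) or value_list[j] != target:
        return []  # the target value does not occur in this data item
    # Assumed layout: the records with value_list[j] occupy the ranks
    # cumulative_list[j] .. cumulative_list[j + 1] - 1 of the sort list.
    start, end = cumulative_list[j], cumulative_list[j + 1]
    return sort_list[start:end]


# Hypothetical index for a five-record <Gender> column.
value_list = ["female", "male"]
cumulative_list = [0, 3, 5]
sort_list = [0, 2, 4, 1, 3]

print(find_record_numbers(value_list, cumulative_list, sort_list, "male"))   # [1, 3]
print(find_record_numbers(value_list, cumulative_list, sort_list, "other"))  # []
```

The lookup touches only the index lists, so the database itself is not read until the matching records are actually extracted.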

Further, there is provided a record position information specifying method performed in an information processing apparatus that includes a position information specifying unit for specifying position information of a desired record. The method specifies position information of a record having a target value, which is a predetermined item value of a target item, which is a predetermined data item, in a database stored in a storage device. The database consists of records storing item values for each predetermined data item, and a record number is uniquely assigned to each record in advance. The storage device further stores an index file for each data item that can be a search target. The index file includes a sort list that stores the record numbers in the order obtained by sorting the database in a predetermined order using the data item as the key item, and an original data list that stores the values of the data item in their initial arrangement order in the database. The method comprises a presence/absence rank determination step that accesses the original data list of the target item and determines whether the target item of the database has the target value and, if so, its rank in the original data list, and a record number specifying step that, when the value is determined to be present, specifies that rank in the original data list as the record number of the target value and thus as the position information.
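A minimal sketch of this variant follows. Because the original data list keeps the initial arrangement, the position of a value in it is the record number itself. The text only says that the original data list is accessed; using the sort list to binary-search it, as done here, is an assumption, as are the function name and the sample data.

```python
def find_record_numbers(original, sort_list, target):
    """Return record numbers whose value equals target, in sorted order."""
    # original[sort_list[k]] is non-decreasing in k, so a lower-bound binary
    # search over k finds the first rank whose value is not below target.
    lo, hi = 0, len(sort_list)
    while lo < hi:
        mid = (lo + hi) // 2
        if original[sort_list[mid]] < target:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # The rank in the original data list is the record number.
    while lo < len(sort_list) and original[sort_list[lo]] == target:
        hits.append(sort_list[lo])
        lo += 1
    return hits


# Hypothetical <Age> column in its initial arrangement order.
age = [2, 1, 3, 1, 2]
sort_list = sorted(range(len(age)), key=lambda r: age[r])

print(sort_list)                               # [1, 3, 0, 4, 2]
print(find_record_numbers(age, sort_list, 2))  # [0, 4]
print(find_record_numbers(age, sort_list, 5))  # []
```

Compared with the value list variant, this index stores every item value once per record, so it may suit data items with few repeated values.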

Furthermore, there is provided a record position information specifying method performed in an information processing apparatus that includes a position information specifying unit for specifying position information of a desired record. The method operates on a plurality of databases stored in a storage device, each consisting of records storing item values for each predetermined data item, with a record number uniquely assigned to each record in advance. It specifies position information of the record at a target position, which is a virtual position in a virtual integrated sort database obtained by virtually integrating the plurality of databases and sorting with a predetermined data item as the key item. The storage device further stores, for each database, an index file for each data item that can be a search target. The index file includes a value list that stores the unique item values belonging to the data item in a predetermined order, a cumulative number list that stores, in the storage order of the value list, the cumulative number of records in the database for each item value, and a sort list that stores the record numbers in the order obtained by sorting the database in the predetermined order using the data item as the key item. The method comprises a search value determination step that uses the value list, the cumulative number list, and the sort list of the key item to determine the search value whose storage range in the virtual integrated sort database includes the target position, and a position information specifying step that uses the value list, the cumulative number list, and the sort list of the key item to specify, as the position information, the table to which the record at the target position among the records having the determined search value belongs and its rank in that table.
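One way to picture the two steps, search value determination and position information specification, is the sketch below. It rests on several assumptions that the text does not state: the cumulative number list holds exclusive counts plus a trailing total, records with equal key values are ordered by table ID and then by each table's sort list, and the candidate values are obtained by simply merging all value lists. The names `less_equal` and `locate` and the sample tables are hypothetical.

```python
from bisect import bisect_left


def less_equal(index, value):
    """(records smaller than value, records equal to value) in one table."""
    value_list, cumulative_list, _ = index
    j = bisect_left(value_list, value)
    less = cumulative_list[j]
    present = j < len(value_list) and value_list[j] == value
    return less, (cumulative_list[j + 1] - less if present else 0)


def locate(indexes, vrec):
    """Map a virtual row to (table ID, record number)."""
    table_ids = sorted(indexes)
    total = sum(indexes[t][1][-1] for t in table_ids)
    if not 0 <= vrec < total:
        raise IndexError("vrec is outside the virtual integrated sort DB")
    candidates = sorted({v for t in table_ids for v in indexes[t][0]})
    # Search value determination: the largest candidate value for which the
    # number of smaller records, summed over all tables, does not exceed vrec.
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if sum(less_equal(indexes[t], candidates[mid])[0] for t in table_ids) <= vrec:
            lo = mid
        else:
            hi = mid - 1
    value = candidates[lo]
    # Position specification: walk the tables in table-ID order.
    offset = vrec - sum(less_equal(indexes[t], value)[0] for t in table_ids)
    for t in table_ids:
        less, equal = less_equal(indexes[t], value)
        if offset < equal:
            return t, indexes[t][2][less + offset]
        offset -= equal
    raise AssertionError("unreachable when the indexes are consistent")


# Hypothetical tables: table 0 holds ["C", "A", "B"], table 1 ["B", "D", "A", "B"].
indexes = {
    0: (["A", "B", "C"], [0, 1, 2, 3], [1, 2, 0]),
    1: (["A", "B", "D"], [0, 1, 3, 4], [2, 0, 3, 1]),
}
print([locate(indexes, v) for v in range(7)])
# [(0, 1), (1, 2), (0, 2), (1, 0), (1, 3), (0, 0), (1, 1)]
```

No merged or sorted copy of the tables is ever built; only the per-table index lists are consulted, which is what allows the integration to remain virtual.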

There is also provided a record position information specifying method performed in an information processing apparatus that includes a position information specifying unit for specifying position information of a desired record. The method operates on a plurality of databases stored in a storage device, each consisting of records storing item values for each predetermined data item, with a record number assigned to each record. It specifies position information of the record at a target position, which is a virtual position in a virtual integrated sort database obtained by virtually integrating the plurality of databases and sorting with a predetermined data item as the key item. The storage device further stores, for each database, an index file for each data item that can be a search target. The index file includes a sort list that stores the record numbers in the order obtained by sorting the database in a predetermined order using the data item as the key item, and an original data list that stores the values of the data item in their initial arrangement order in the database. The method comprises a search value determining step that uses the sort list and the original data list of the key item to determine the search value whose storage range in the virtual integrated sort database includes the target position, and a position information specifying step that uses the sort list and the original data list of the key item to specify, as the position information, the table to which the record at the target position among the records having the determined search value belongs and its rank in that table.

There is also provided a record extraction method for extracting a desired record from a database that consists of records storing item values for each predetermined data item, with a record number uniquely assigned to each record. The method includes a record extraction step of extracting the desired record in accordance with the position information specified by the record position information specifying method described above.

Further, there is provided an information processing program that causes a computer to function as position information specifying means for specifying position information of a desired record, using index files, in a plurality of databases each consisting of records that store item values for each predetermined data item and that are each assigned a unique record number in advance. The index files are generated from each of the databases for each data item, and allow the record number to be acquired from an item value of the data item and from a rank in a sort database obtained by sorting with the data item as the key item. The information processing program may be provided recorded on a computer-readable storage medium.

In addition, there is provided a database system comprising a first information processing apparatus that manages a database consisting of records storing item values for each predetermined data item, and a second information processing apparatus that specifies position information of a desired record, the two being connected via a network. The first information processing apparatus includes an index file for each data item that can be a search target, and each record is uniquely assigned a record number in advance. The index file for each data item allows the record number to be acquired from an item value of the data item, and also from a rank in the sort database in which the data item is sorted as the key item. The second information processing apparatus specifies the record number as the position information.

In this database system, there may be a plurality of databases to be managed, with a database ID uniquely assigned to each database in advance and the index file generated for each database. The sort database is then a virtual integrated database, obtained by virtually integrating the plurality of databases, sorted using the data item as the key item, and the second information processing apparatus may be configured to further specify, as the position information, the database ID of the database to which the desired record belongs. In this case, at least one of the plurality of databases to be managed may be stored on a different first information processing apparatus connected to the network.

<< First Embodiment >>
Hereinafter, embodiments to which the present invention is applied will be described with reference to the drawings. First, the system configuration of this embodiment will be described.

FIG. 1 is a diagram for explaining an outline of a database system 100 according to an embodiment of the present invention and functional blocks of an information processing apparatus provided in the database system 100. As shown in the figure, in this embodiment, a plurality of information processing apparatuses 110-0, 110-1, and 110-2 are connected via a network 120. Hereinafter, when the individual information processing apparatuses need not be distinguished, they are referred to collectively as the information processing apparatus 110. Here, as an example, the case where three information processing apparatuses 110 are connected to the network 120 is shown, but the number of connected information processing apparatuses 110 is not limited to this.

Each information processing apparatus 110 holds a database, described later, and functions as a data management apparatus that manages the database it holds. As a data management apparatus, it also provides, for example, a database browsing function and a search function. Each information processing apparatus 110 includes a CPU 111, a memory 112, and a storage device 113. It also includes a network interface (NWIF) 114 that enables data transmission and reception between the information processing apparatuses 110 via the network 120. Each information processing apparatus 110 is connected to an input device 115 and a display device 116, which are its user interfaces. An external storage device 117 may also be connected.

In the present embodiment, the information processing apparatuses 110-0, 110-1, and 110-2 store the databases 200-0, 200-1, and 200-2, respectively. When the databases need not be distinguished, they are referred to collectively as the database 200. The database 200 is stored in the storage device 113 or the external storage device 117 of each information processing apparatus 110.

Further, in the present embodiment, the information processing apparatuses 110-0, 110-1, and 110-2 hold index files 300-0, 300-1, and 300-2 of the databases 200-0, 200-1, and 200-2, respectively. When the index files need not be distinguished, they are referred to collectively as the index file 300. The index file 300 is stored in the storage device 113 or the memory 112 of each information processing apparatus 110. The index file 300 is created at arbitrary time intervals, for example every time a predetermined amount of data is collected.

Next, the database 200 stored in each information processing apparatus 110 will be described. The database of the present embodiment may be structured tabular data, semi-structured data, or unstructured data.

An example of structured tabular data 201 is shown in FIG. The structured tabular data 201 is an array of one or more records (rows) 213 including item values 212 corresponding to one or more data items (columns) 211 as shown in FIG.

Each record 213 is given a record number (RecNo.) 214. This record number is information indicating the position where the record is stored in the tabular data 201. This record number is given to the tabular data 201 at a predetermined timing. The predetermined timing is, for example, the time when the tabular data 201 is created. In the database 200 of this embodiment, each record can be accessed by designating a record number.

Generally, the records in the tabular data 201 are not always arranged in the order of the record numbers (RecNo.) 214. For example, when the tabular data 201 at the time of creation (referred to as the original tabular data 201) is sorted so that the item values 212 of a predetermined data item 211, used as the key item, are in ascending order, the order of records in the sorted tabular data 201s differs from the order of records in the original tabular data 201. FIG. 2B shows such an example: the result of sorting the tabular data 201 in ascending order using the data item 211 <Name> as the key item. In this specification, information indicating the order of records in the database 200 in each such state is referred to as a record order number (rank) 215. In the original tabular data 201, the record order number 215 matches the record number (RecNo.) 214.
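The distinction between the record number, which stays fixed, and the record order number, which depends on the current arrangement, can be sketched as follows. The table contents are hypothetical and are not the data of FIG. 2.

```python
# Hypothetical original table: one tuple (Gender, Name, Age) per record.
# The record number is the position in this original arrangement.
table = [
    ("female", "Dana", 2),   # RecNo. 0
    ("male", "Alex", 1),     # RecNo. 1
    ("male", "Cory", 3),     # RecNo. 2
    ("female", "Blake", 1),  # RecNo. 3
]

# Sort in ascending order with <Name> (column 1) as the key item.
# order[rank] is the record number found at that record order number.
order = sorted(range(len(table)), key=lambda recno: table[recno][1])

for rank, recno in enumerate(order):
    print(rank, recno, table[recno][1])
# 0 1 Alex
# 1 3 Blake
# 2 2 Cory
# 3 0 Dana
```

Only the list of record numbers is reordered here; the records themselves stay where they are, so a record can still be reached by its record number after sorting.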

FIG. 2A exemplifies five records 213 consisting of three data items 211: <Gender>, <Name>, and <Age>. Here, for example, in the record 213 whose record number 214 is 0, the item value 212 of the data item 211 <Gender> is "female", the item value 212 of the data item 211 <Name> is "Jemi", and the item value 212 of the data item 211 <Age> is "2". However, in the present embodiment, the number of data items 211 and the number of records 213 are not limited to these.

The item value 212 may be either numeric data or text data, provided that an order can be uniquely defined over the values. For example, numerical data such as 2, 1, ... is stored as the item values 212 of the data item 211 <Age>, and text data such as Jemi, Griza, ... is stored as the item values 212 of the data item 211 <Name>.

As shown in FIG. 2C and FIG. 2D, a data item 211 of the tabular data 201 of this embodiment may be a repeated item that can store a plurality of item values 212 in each record 213. Here, the case where the data item 211 <Name> is a repeated item is illustrated. Note that the order of the plurality of item values 212 stored in a repeated item does not matter. That is, the tabular data 201 of FIG. 2C and the tabular data 201 of FIG. 2D are considered logically the same.

An example of the semi-structured data 202 is shown in FIG. The semi-structured data 202 basically has the same configuration as the tabular data 201. That is, it is an array of one or more records including item values 212 corresponding to one or more data items 211. However, in the semi-structured data 202, the data item 211 includes a data item 211 that is guaranteed to have a value and a data item 211 that is not guaranteed.

In the example of FIG. 3A, <ID> is a data item 211 that is guaranteed to have a value, and other <name>, <address>, <gender>, <age>, <food> Is a data item 211 that is not guaranteed.

An example of the unstructured data 203 is shown in FIG. The unstructured data 203 also basically has the same configuration as the tabular data 201. That is, it is an array of one or more records 213 including item values 212 corresponding to one or more data items 211. However, in the unstructured data 203, there is no data item for which data is guaranteed to exist.

In this embodiment, the semi-structured data 202 and the unstructured data 203 are handled by mapping them to a structure similar to that of the tabular data 201, as shown in FIGS. 3C and 3D, respectively. The handling of an item without a value (NULL item) is determined in advance. Hereinafter, in the present embodiment, a NULL item is treated as the minimum value of each data item 211.
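Treating a NULL item as the minimum value of its data item can be expressed as a sort key, as in the sketch below. Representing NULL as Python's `None`, the helper name `null_first_key`, and the sample column are assumptions for illustration.

```python
def null_first_key(value):
    """Sort key that places NULL (None) before every real item value."""
    # (False,) compares smaller than any (True, value) tuple, and two NULL
    # keys compare equal, so None is never compared with a real value.
    return (False,) if value is None else (True, value)


# Hypothetical <Age> column with two NULL items.
ages = [3, None, 1, None, 2]
order = sorted(range(len(ages)), key=lambda recno: null_first_key(ages[recno]))

print(order)                     # [1, 3, 2, 4, 0]
print([ages[r] for r in order])  # [None, None, 1, 2, 3]
```

With such a key, records mapped from semi-structured or unstructured data can be sorted and indexed in the same way as fully populated tabular data.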

Hereinafter, in the present embodiment, a case where structured tabular data 201 is registered as the database 200, including a case where a NULL item is included, will be described as an example. The processing is the same for other types of data.

In this embodiment, it is assumed that the tabular data 201 is managed in a distributed manner. Hereinafter, in this specification, the tabular data 201 held by each information processing apparatus 110 is referred to as a table. Each table is uniquely assigned an identification number i in advance. In the present embodiment, the tabular data 201-0, 201-1, and 201-2 are referred to as Table 0, Table 1, and Table 2, with identification numbers 0, 1, and 2, respectively. A plurality of tables may be provided in one information processing apparatus 110. The identification number i of each table is called a table ID.

The information processing apparatus 110 according to the present embodiment specifies position information of a desired record from a table group that is distributed and managed. A database obtained by virtually integrating a table group that is distributed and managed in the order of table IDs is referred to as a virtual integrated database (virtual integrated DB). A database in which the virtual integrated DB is sorted using predetermined data items as key items is referred to as a virtual integrated sort database (virtual integrated sort DB). The record order number of the virtual integrated sort DB is called a virtual row (Vrec).

FIG. 4 is a diagram for explaining the virtual integrated DB and the virtual integrated sort DB. Here, a case where the search target table group consists of table 0 (Table 0) and table 1 (Table 1) is illustrated. As shown in this figure, the virtual integrated DB 500 is a table in which table 0 and table 1 are virtually integrated in the order of table IDs. The virtual integrated sort DB 510 is obtained by sorting the virtual integrated DB 500 using a predetermined data item (here, <Name>) as the key item. Here, the item 501 indicates a table ID and a record number.
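The relationship between the two virtual databases can be made concrete with a small sketch that materializes them. This is for illustration only, since the point of the embodiment is that the integration stays virtual; the sample tables are hypothetical, and ordering equal key values by table ID and then record number (a stable sort) is an assumption.

```python
# Hypothetical <Name> columns of two tables, keyed by table ID.
tables = {
    0: ["Cory", "Alex", "Blake"],
    1: ["Blake", "Drew", "Alex"],
}

# Virtual integrated DB: the tables concatenated in table-ID order.
# Each row carries (table ID, record number, key item value).
virtual_db = [
    (table_id, recno, name)
    for table_id in sorted(tables)
    for recno, name in enumerate(tables[table_id])
]

# Virtual integrated sort DB: sorted with <Name> as the key item.
# sorted() is stable, so equal names stay in table-ID order.
virtual_sort_db = sorted(virtual_db, key=lambda row: row[2])

for vrec, (table_id, recno, name) in enumerate(virtual_sort_db):
    print(vrec, table_id, recno, name)
# 0 0 1 Alex
# 1 1 2 Alex
# 2 0 2 Blake
# 3 1 0 Blake
# 4 0 0 Cory
# 5 1 1 Drew
```

The first column printed is the virtual row (Vrec); the pair that follows it is the position information (table ID and record number) that a lookup by virtual row would return.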

In this example, the table 0 is the tabular data 201 shown in FIG. 2A, and is structured tabular data having five records. On the other hand, Table 1 is unstructured data with six records and NULL items.

When the user designates a data item 211 and a predetermined item value 212, the information processing apparatus 110 of the present embodiment searches the table group, identifies the records 213 whose data item 211 has the designated item value 212, and returns their position information. The position information consists of the table ID of the table (affiliation table) to which the matching record 213 belongs and the record number within it. Also, when the user designates a data item 211, to be used as the key item in generating the virtual integrated sort DB 510, and a virtual row (Vrec), the position information of the record 213 at that virtual row (Vrec) is returned.

The functions of the information processing apparatus 110 that realize this will be described below. FIG. 5 shows a functional block diagram of the information processing apparatus 110 that realizes the above functions. As shown in the figure, the information processing apparatus 110 of this embodiment includes an index creation unit 410 and a position information specifying unit 420. Each of these functions is realized by the CPU 111 included in the information processing apparatus 110 loading a program stored in advance in the storage device 113 into the memory 112 and executing it. Details of each unit will be described below.

The index creation unit 410 creates the index file 300 from the tabular data 201 at an arbitrary time interval.

Here, the index file 300 created by the index creation unit 410 of this embodiment will be described. The index file 300 according to the present embodiment is one or more lists in array format, each including one or more elements, provided to speed up the process of specifying the position of a desired record 213 in the tabular data 201 managed on each information processing apparatus 110.

FIG. 6 is a diagram for explaining the index file 300 of the present embodiment. The index creation unit 410 according to the present embodiment creates the following index files 300 for all tables that are distributed and managed. Here, the index file 300 created from the tabular data 201 shown in FIG. 2A will be described as an example.

The index file 300 is generated for each data item 211 of the tabular data 201. The data item 211 for which the index file 300 is created is called the item of interest. FIG. 6A shows an example in which the item of interest is <Gender>, FIG. 6B shows an example in which the item of interest is <Name>, and FIG. 6C shows an example in which the item of interest is <Age>. As shown in these drawings, the index file 300 according to the present embodiment includes a value list (VL) 310, an accumulation number list (CAGR) 320, and a sort list (SOS) 330. Each list is composed of elements and ranks (Ord), where a rank is the sequence number indicating the position of an element. Each element can be extracted from its list by specifying the rank (Ord). Further, the element of rank j, counted from 0, in a list ABC is denoted as ABC [j].

The VL 310 is a list in which the unique item values 212 appearing in the item of interest are sorted in a predetermined order (for example, ascending or descending order) and stored as elements. Specifically, the VL 310 is generated by sorting the tabular data 201 in a predetermined order using the item of interest as a key, and suppressing duplicate values in the result (sorted tabular data 201s).

The SOS 330 stores, as elements, the record numbers 214 in the order in which they are arranged when the tabular data 201 is sorted using the item of interest as a key. Sorting is performed in the same order as for the VL 310. By providing the SOS 330, the record number 214 corresponding to a sorted item value 212 can be freely extracted.

The CAGR 320 stores, as elements, the accumulated number of records of each item value 212. The number of records is accumulated in the order of the VL 310. The CAGR 320 is also the list that associates the VL 310 with the SOS 330: from the CAGR 320, the storage range in the SOS 330 of each element of the VL 310 can be known. That is, when j is larger than 0, the records of the element VL [j] of the VL 310 are stored in the section [CAGR [j-1], CAGR [j]) of the SOS 330, that is, at ranks CAGR [j-1] to CAGR [j] -1. Note that the records of the element VL [0] of the VL 310 are stored at the ranks of the section [0, CAGR [0]) of the SOS 330. Hereinafter, in the present specification, when a section or a range is described, a closed end is indicated by [ ] and an open end is indicated by ( ).

For example, in the example of FIG. 6B, the element “Grizza” of VL rank 1 will be described. The element of rank 0 of CAGR 320 is “1”, and the element of rank 1 of CAGR 320 is “3”. Therefore, “Grizza” is stored in the range of the rank [1, 3) of the SOS 330, that is, the range of the rank [1, 2].
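The construction of the three lists and the use of the CAGR to find a storage range can be sketched as follows. This is an illustrative sketch only: `build_index` and `storage_range` are hypothetical helper names, the sample column is invented so that "Grizza" falls at VL rank 1 with the same CAGR values as in the example above, ascending order is assumed, and record numbers are assumed to be 0-based.

```python
def build_index(column):
    """Build VL, CAGR and SOS for one item of interest (ascending order)."""
    # SOS: record numbers in the order obtained by sorting on the item value.
    sos = sorted(range(len(column)), key=lambda rec_no: column[rec_no])
    vl, cagr = [], []
    for position, rec_no in enumerate(sos):
        value = column[rec_no]
        if not vl or vl[-1] != value:
            vl.append(value)        # a new unique value starts here
            cagr.append(0)
        cagr[-1] = position + 1     # cumulative record count up to this value
    return vl, cagr, sos


def storage_range(cagr, j):
    """Half-open range [start, end) of SOS ranks that hold the records of VL[j]."""
    start = cagr[j - 1] if j > 0 else 0
    return start, cagr[j]


names = ["Jemi", "Grizza", "Abby", "Grizza", "Tom"]   # hypothetical <Name> column
vl, cagr, sos = build_index(names)
start, end = storage_range(cagr, vl.index("Grizza"))
grizza_records = sos[start:end]     # record numbers whose <Name> is "Grizza"
```

With this sample column, the range for "Grizza" is [1, 3), matching the ranks [1, 2] given in the example.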

Also, each list of the index file 300 is created for each table. FIGS. 7A and 7B show an example of the index file 300 when the item of interest is <Name>. FIG. 7A shows the index file 300 of table 0, and FIG. 7B shows the index file 300 of table 1.

Next, the position information specifying unit 420 will be described. The position information specifying unit 420 searches the table group using the index file 300 in accordance with an instruction from the user, and specifies the position information of a predetermined record. To realize this, the position information specifying unit 420 according to the present embodiment includes: a first search unit 421 that, in response to the designation of a data item 211 and a predetermined item value 212, searches for records having that item value 212 in the data item 211 and specifies their position information; a second search unit 422 that, in response to the designation of a data item 211 as the sort key item and a virtual row (Vrec), searches for the record of that virtual row (Vrec) and specifies its position information; and a record number calculation unit 423 that calculates the specified numbers of records.

The record number calculation unit 423 of the present embodiment prepares two functions represented by the following expressions (1) and (2), and, when the first search unit 421 and the second search unit 422 search for position information, calculates the numbers of records given by the following expressions (3) to (6). The calculation is performed using the VL 310, CAGR 320, and SOS 330 of the designated data item 211. Hereinafter, the lists of table i are referred to as VL (i), CAGR (i), and SOS (i), respectively.

CLTP (i) [j] obtained by Expression (1) is the number of records belonging to values smaller than the item value of rank j of VL (i).

CEQP (i) [j] obtained by Expression (2) is the number of records belonging to a value equal to the item value of rank j of VL (i).

CLTV (i) <x> obtained by Expression (3) is the number of records belonging to values smaller than a predetermined item value x in table i. In Expression (3), Case 1 is the case where the item value x exists in VL (i), and j is the rank of the item value x in VL (i). Case 2 is the case where the item value x does not exist in VL (i) but a value smaller than x exists among the item values of VL (i), and j is the rank of the largest such item value. Case 3 is the case where the item value x does not exist in VL (i) and no value smaller than x exists among the item values of VL (i).

CEQV (i) <x> obtained by Expression (4) is the number of records belonging to a value equal to the predetermined item value x in table i. In Expression (4), Case 1 is the case where the item value x exists in VL (i), and j is the rank of the item value x in VL (i). Case 2 is the case where the item value x does not exist in VL (i).

CALTV <x> obtained by Expression (5) is the number of records belonging to a value smaller than a predetermined item value x in the virtual integrated DB 500 and the virtual integrated sort DB 510.

CAEQV <x> obtained by Expression (6) is the number of records belonging to a value equal to a predetermined item value x in the virtual integrated DB 500 and the virtual integrated sort DB 510.
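Expressions (1) to (6) themselves are not reproduced in this text, so the following is only a sketch reconstructed from the prose definitions above, not the published formulas. The function names mirror the symbols in the text; the `INDEXES` data (one `(VL, CAGR, SOS)` tuple per table, in table-ID order) is hypothetical sample data, and ascending order is assumed.

```python
from bisect import bisect_left


def cltp(cagr, j):
    """CLTP (i) [j]: records with a value smaller than VL (i) [j]."""
    return cagr[j - 1] if j > 0 else 0


def ceqp(cagr, j):
    """CEQP (i) [j]: records with a value equal to VL (i) [j]."""
    return cagr[j] - cltp(cagr, j)


def cltv(vl, cagr, x):
    """CLTV (i) <x>: records in one table with a value smaller than x."""
    j = bisect_left(vl, x)                  # first rank whose value is >= x
    if j < len(vl) and vl[j] == x:          # Case 1: x is VL[j]
        return cltp(cagr, j)
    if j > 0:                               # Case 2: VL[j-1] is the largest value below x
        return cltp(cagr, j - 1) + ceqp(cagr, j - 1)
    return 0                                # Case 3: nothing in VL is below x


def ceqv(vl, cagr, x):
    """CEQV (i) <x>: records in one table with a value equal to x."""
    j = bisect_left(vl, x)
    if j < len(vl) and vl[j] == x:          # Case 1
        return ceqp(cagr, j)
    return 0                                # Case 2


def caltv(indexes, x):
    """CALTV <x>: records smaller than x over all tables."""
    return sum(cltv(vl, cagr, x) for vl, cagr, _sos in indexes)


def caeqv(indexes, x):
    """CAEQV <x>: records equal to x over all tables."""
    return sum(ceqv(vl, cagr, x) for vl, cagr, _sos in indexes)


# Hypothetical index files for <Name>: (VL, CAGR, SOS) of table 0 and table 1.
INDEXES = [
    (["Abby", "Grizza", "Jemi", "Tom"], [1, 3, 4, 5], [2, 1, 3, 0, 4]),
    (["Abby", "Bea", "Grizza", "Jemi", "Sillarub"], [1, 2, 3, 4, 6], [5, 3, 0, 4, 1, 2]),
]
less_than_grizza = caltv(INDEXES, "Grizza")
equal_to_grizza = caeqv(INDEXES, "Grizza")
```

Only the VL and CAGR of each table are consulted, so these counts can be obtained without touching the tables themselves.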

Next, processing of the first search unit 421 of the present embodiment will be described. As described above, when the data item 211 and the item value are given by the user, the first search unit 421 returns the position information in the distribution management target table. That is, the table ID and record number of the record having the value are specified from the value.

Specifically, for each table i in the order of table ID, VL (i) in the index file 300 whose item of interest is the data item 211 is searched to determine whether the specified item value is present and, if it is, to identify its rank. The search of VL (i) is performed using a bisection method or the like. When the specified item value exists in VL (i), the record numbers are specified by the above method using CAGR (i) and SOS (i).

FIG. 8 is a processing flow example of the first search process by the first search unit 421 of the present embodiment. Here, the number of tables to be searched is M (M is an integer of 1 or more). It is assumed that the table group to be searched is determined in advance. At this time, the search result is stored in the first search result storage area in the storage device 113.

As shown in this figure, when a search target data item 211 (Target Item: TI) and an item value 212 (Target Value: TV) are given by the user, first, a table ID to be searched is initialized (i = 0) and the first search result storage area is initialized (step S1101). Then, the index file 300 of the data item TI of the table i is accessed.

First, VL (i) is accessed and the item value TV is searched for (step S1102). Here, the search is performed using a bisection method or the like. If the item value TV exists in VL (i), its rank is extracted, CAGR (i) is accessed, and the storage range of the item value TV in SOS (i) is specified by the above-described method (step S1103). According to the obtained storage range, SOS (i) is accessed, and the record numbers 214 of the item value TV are obtained (step S1104). The obtained record numbers 214 are additionally stored in the first search result storage area in association with the table ID of the table being searched (step S1105).

Thereafter, until all tables have been processed, the index file 300 of the next table is accessed and the processing from step S1102 is repeated (steps S1106 and S1107).

On the other hand, if the item value TV does not exist in VL (i) in step S1102, the process proceeds to step S1106 as it is and the process is repeated.

When all the tables have been processed, a set of table ID and record number stored in the first search result storage area is output as position information (step S1108).
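The flow of steps S1101 to S1108 can be sketched as follows. This is a minimal sketch under stated assumptions: `first_search` is a hypothetical function name, the `INDEXES` list (one `(VL, CAGR, SOS)` tuple per table, in table-ID order) is invented sample data, and bisection via the standard `bisect` module stands in for "a bisection method or the like".

```python
from bisect import bisect_left


def first_search(indexes, target_value):
    """Return (table ID, record number) pairs of records equal to target_value."""
    results = []                                          # S1101: initialize result area
    for table_id, (vl, cagr, sos) in enumerate(indexes):  # S1106/S1107: next table
        j = bisect_left(vl, target_value)                 # S1102: bisection on VL(i)
        if j == len(vl) or vl[j] != target_value:
            continue                                      # value absent in this table
        start = cagr[j - 1] if j > 0 else 0               # S1103: storage range in SOS(i)
        end = cagr[j]
        for rec_no in sos[start:end]:                     # S1104: read record numbers
            results.append((table_id, rec_no))            # S1105: store with table ID
    return results                                        # S1108: output position info


# Hypothetical index files for <Name>: (VL, CAGR, SOS) of table 0 and table 1.
INDEXES = [
    (["Abby", "Grizza", "Jemi", "Tom"], [1, 3, 4, 5], [2, 1, 3, 0, 4]),
    (["Abby", "Bea", "Grizza", "Jemi", "Sillarub"], [1, 2, 3, 4, 6], [5, 3, 0, 4, 1, 2]),
]
sillarub_positions = first_search(INDEXES, "Sillarub")
```

Each table costs one bisection on its VL plus one contiguous read of its SOS, which is why no scan of the table data is needed.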

The first search process by the first search unit 421 described above will be described with reference to FIG. For example, it is assumed that <Name> is designated as the data item 211 and “Sillarub” is designated as the item value. First, VL (0) of table 0 is accessed to determine whether or not “Sillarub” exists. Since table 0 does not have this item value, the process moves to table 1. In table 1, VL (1) is accessed in the same way, and 4 is obtained as the rank. CAGR (1) is then accessed, and [4, 5] is obtained as the storage range. Then, SOS (1) is accessed, and record numbers 1 and 2 are obtained. Finally, record numbers 1 and 2 of table 1 are output as the search result.

Next, processing of the second search unit 422 of this embodiment will be described. As described above, when the key item and the virtual row (Vrec) of the virtual integrated sort DB 510 are designated by the user, the second search unit 422 returns the position information of the corresponding record. That is, the table ID and the record number 214 of the record of the designated virtual row TP of the virtual integrated sort DB 510 are specified.

Specifically, the VL 310 of each table is accessed in the order of table ID, and a value at a predetermined position (for example, near the center) is extracted as a temporary search value (provisional search value). The virtual row of this temporary search value (temporary virtual row) is then obtained. The obtained temporary virtual row is compared with the designated virtual row, and the search is repeated until they match. Then, the position information is calculated for the matching temporary search value.

Note that the temporary virtual row of the temporary search value is calculated by the record number calculation unit 423 using Expression (5) and Expression (6). In other words, the range of the temporary virtual row (rank) is [CALTV <temporary search value>, CALTV <temporary search value> + CAEQV <temporary search value>), that is, from CALTV <temporary search value> to CALTV <temporary search value> + CAEQV <temporary search value> -1.

FIG. 9 is a processing flow example of the second search process by the second search unit 422 of the present embodiment. Here, the number of tables to be searched is M (M is an integer of 1 or more). At this time, an area for storing the search result in the storage device 113 is set as a second search result storage area. Further, an area that holds the value extracted as the temporary search value is set as a temporary search value storage area.

When TP is given as a designated virtual row by the user, first, the table number to be searched and the second search result storage area are initialized (step S1201). Then, the index file 300 of the key item TI when creating the virtual integrated sort DB 510 in the table i is accessed.

First, VL (i) is accessed, and the provisional search value vp is determined according to a predetermined rule (step S1202). Here, for example, the median is extracted as described above. At this time, the rank of the temporary search value vp in the VL (i) is j. Further, the determined provisional search value vp and rank j are additionally registered in the provisional search value storage area (step S1203). Then, the record number calculation unit 423 is caused to calculate the range of the virtual row (temporary virtual row) of the temporary search value vp (step S1204).

The designated virtual row TP is compared with the range of the temporary virtual row (step S1205). If the designated virtual row TP is within the range of the temporary virtual row, the temporary search value vp is determined to be the value V_TP of the virtual row TP (step S1209). Then, for the value V_TP, the position information specifying process of specifying the table ID and the record number of the virtual row TP is performed (step S1210), and the process ends.

On the other hand, if the designated virtual row TP is outside the range of the temporary virtual row, it is determined whether a new temporary search value can be determined in table i according to a predetermined rule (step S1206). Here, for example, when the designated virtual row TP is smaller than the minimum value of the temporary virtual row, the new temporary search value is chosen in VL (i) between the current temporary search value vp and the largest of the temporary search values already stored in the temporary search value storage area that are smaller than vp. Conversely, when the designated virtual row TP is larger than the maximum value of the temporary virtual row, the new temporary search value is chosen in VL (i) between the current temporary search value vp and the smallest of the stored temporary search values that are larger than vp.

If it can be determined, a new temporary search value vp is determined (step S1207), the process proceeds to step S1203, and the process is repeated.

On the other hand, when the new temporary search value vp cannot be determined within the above range, the process moves to the next table (step S1208), and the process is repeated from step S1202.
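The search for the value V_TP of the designated virtual row (steps S1201 to S1209) can be sketched as follows. This is one possible reading, not the flow of FIG. 9 verbatim: the `lo`/`hi` bounds play the role of the temporary search value storage area, the midpoint is used as the "predetermined rule", virtual rows are assumed to be 0-based, and `find_value_at_vrow`, `count_less`, `count_equal`, and the `INDEXES` sample data are hypothetical.

```python
from bisect import bisect_left


def count_less(vl, cagr, x):
    """CLTV (i) <x>: records in one table whose value is smaller than x."""
    j = bisect_left(vl, x)
    return cagr[j - 1] if j > 0 else 0


def count_equal(vl, cagr, x):
    """CEQV (i) <x>: records in one table whose value equals x."""
    j = bisect_left(vl, x)
    if j < len(vl) and vl[j] == x:
        return cagr[j] - (cagr[j - 1] if j > 0 else 0)
    return 0


def find_value_at_vrow(indexes, tp):
    """Return the item value V_TP found at virtual row tp, or None if out of range."""
    for vl, _cagr, _sos in indexes:              # S1208: tables in table-ID order
        lo, hi = 0, len(vl) - 1
        while lo <= hi:                          # S1206: a new candidate still exists
            j = (lo + hi) // 2                   # S1202/S1207: pick the middle rank
            vp = vl[j]                           # temporary search value
            # S1204: range of virtual rows occupied by vp over all tables
            first = sum(count_less(v, c, vp) for v, c, _s in indexes)
            last = first + sum(count_equal(v, c, vp) for v, c, _s in indexes) - 1
            if tp < first:                       # S1205: TP lies before the range
                hi = j - 1
            elif tp > last:                      # S1205: TP lies after the range
                lo = j + 1
            else:
                return vp                        # S1209: vp is V_TP
    return None


# Hypothetical index files for <Name>: (VL, CAGR, SOS) of table 0 and table 1.
INDEXES = [
    (["Abby", "Grizza", "Jemi", "Tom"], [1, 3, 4, 5], [2, 1, 3, 0, 4]),
    (["Abby", "Bea", "Grizza", "Jemi", "Sillarub"], [1, 2, 3, 4, 6], [5, 3, 0, 4, 1, 2]),
]
v_tp = find_value_at_vrow(INDEXES, 5)
```

Moving on to the next table is needed because the value at the designated row may not appear in the VL of the current table; with the sample data, virtual row 2 holds a value that exists only in table 1.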

Next, the flow of the position information specifying process by the second search unit 422 of the present embodiment will be described. Here, it is determined whether or not the record corresponding to the virtual row TP belongs to the table in order of the table ID, and if it belongs, the record number is determined. For the determination and determination, the calculation result by the record number calculation unit 423 is used. FIG. 10 is a processing flow example of the position information specifying process of the present embodiment by the second search unit 422.

First, affiliation table determination processing is performed to determine the table ID of the table to which the record belongs. Here, in the order of table ID (step S1301), the total number AC (i) <V_TP> of records having a value equal to the value V_TP in the tables with table IDs up to and including i is calculated (step S1302). AC (i) <V_TP> is calculated by the following equation (7).

Then, the rank POS (i) <V_TP> (calculated virtual row) in the virtual integrated sort DB 510 of the record having the largest rank among the records of table i having a value equal to the item value V_TP is determined. This POS (i) <V_TP> is obtained by the following formula (8), in which AC (i) <V_TP> is added to the total number of records CALTV <V_TP> having a value smaller than the item value V_TP (step S1303).

Thereafter, the calculated virtual row POS (i) <V_TP> is compared with the designated virtual row TP (step S1304). As a result, when POS (i) <V_TP> is greater than or equal to the virtual row TP, the affiliation table of the record corresponding to the virtual row TP is determined to be table i (step S1305).

In step S1304, if the calculated virtual row is smaller than the designated virtual row TP, the process moves to the next table (step S1310), returns to step S1302, and repeats the process.

On the other hand, when the affiliation table i is determined, a record number calculation process for calculating the record number (RecNo.) in table i of the record corresponding to the virtual row TP is performed using the following formulas.

In the record number calculation process, first, the position in the virtual integrated sort DB 510 immediately before the records of table i that belong to a value equal to the item value V_TP is calculated (step S1306). This is POS (i−1) <V_TP>. When i = 0, CALTV <V_TP> is used.

Then, the record order AA, among the records in table i belonging to a value equal to the item value V_TP, of the record corresponding to the virtual row TP is calculated (step S1307). This is obtained by subtracting POS (i−1) <V_TP> (or CALTV <V_TP>) from the virtual row TP and then subtracting 1.

Then, the rank Ord in SOS (i) is calculated (step S1308). The value obtained by adding the record order AA to the number of records CLTV (i) <V_TP> belonging to values smaller than the item value V_TP in table i indicates the position (rank Ord) in SOS (i). That is, when BB = CLTV (i) <V_TP> + AA, the position (rank Ord) in SOS (i) of the record corresponding to the virtual row TP is BB.

Then, the element of SOS (i) [BB] is determined as the record number (RecNo.) (Step S1309), and the process is terminated.
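The position information specifying process (steps S1301 to S1309) can be sketched as follows. Formulas (7) and (8) are not reproduced in this text, so the arithmetic here is an assumption: virtual rows and ranks are taken to be 0-based, and POS (i) is computed as CALTV + AC (i) - 1 so that the values agree with the worked example in this description (POS (0) = 4, POS (1) = 5, AA = 0). The names `locate`, `count_less`, `count_equal`, and the `INDEXES` sample data are hypothetical.

```python
from bisect import bisect_left


def count_less(vl, cagr, x):
    """CLTV (i) <x>: records in one table whose value is smaller than x."""
    j = bisect_left(vl, x)
    return cagr[j - 1] if j > 0 else 0


def count_equal(vl, cagr, x):
    """CEQV (i) <x>: records in one table whose value equals x."""
    j = bisect_left(vl, x)
    if j < len(vl) and vl[j] == x:
        return cagr[j] - (cagr[j - 1] if j > 0 else 0)
    return 0


def locate(indexes, tp, v_tp):
    """Return (table ID, record number) of virtual row tp, whose value is v_tp."""
    less_total = sum(count_less(vl, cagr, v_tp) for vl, cagr, _s in indexes)  # CALTV
    ac_prev = 0                                    # AC(i-1); zero before table 0
    for table_id, (vl, cagr, sos) in enumerate(indexes):    # S1301/S1310
        ac = ac_prev + count_equal(vl, cagr, v_tp)          # S1302: AC(i)
        pos = less_total + ac - 1                           # S1303: POS(i), 0-based
        if pos >= tp:                                       # S1304/S1305: table found
            prev_pos = less_total + ac_prev - 1             # S1306: row just before
            aa = tp - prev_pos - 1                          # S1307: order within table
            bb = count_less(vl, cagr, v_tp) + aa            # S1308: rank in SOS(i)
            return table_id, sos[bb]                        # S1309: record number
        ac_prev = ac
    raise ValueError("virtual row does not hold the given value")


# Hypothetical index files for <Name>: (VL, CAGR, SOS) of table 0 and table 1.
INDEXES = [
    (["Abby", "Grizza", "Jemi", "Tom"], [1, 3, 4, 5], [2, 1, 3, 0, 4]),
    (["Abby", "Bea", "Grizza", "Jemi", "Sillarub"], [1, 2, 3, 4, 6], [5, 3, 0, 4, 1, 2]),
]
position = locate(INDEXES, 5, "Grizza")
```

Under this 0-based reading, the row immediately before the first matching record of table 0 is CALTV - 1, which the sketch computes through `prev_pos` with `ac_prev` equal to zero.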

Hereinafter, the second search process of the present embodiment will be described using a specific example with reference to FIGS. 4 and 7. It is assumed that <Name> is specified as the key item and 5 is specified as the virtual row (Vrec) TP.

The second search unit 422 accesses the index file 300 in which the item of interest shown in FIG. First, VL (0) of table 0 is accessed, and, for example, “Jemi”, which has rank 2, is extracted as the temporary search value vp. Then, the record number calculation unit 423 obtains the range of ranks of “Jemi” in the virtual integrated sort DB 510. Here, [6, 7] is obtained.

Since the designated virtual row TP is smaller than and outside this range, a smaller value in VL (0) is extracted as the new temporary search value vp. For example, “Grizza” is set as vp. Similarly, [3, 5] is obtained as the range of ranks of “Grizza” in the virtual integrated sort DB 510. Since the virtual row TP is within this range, the temporary search value vp “Grizza” is set as the value V_TP of the virtual row.

Next, the affiliation table is determined. Here, first, the number of “Grizza” records in the tables up to table 0 is calculated, and 2 is obtained. Further, the total number of records with values smaller than “Grizza” (CALTV <Grizza>) in the virtual integrated sort DB 510 is 3. Therefore, the virtual row in the virtual integrated sort DB 510 of the highest-ranked “Grizza” record in table 0 is 4.

Since the calculated virtual row is smaller than the virtual row TP, the process moves to the next table, table 1, and performs the same processing. As the virtual row in the virtual integrated sort DB 510 of the highest-ranked “Grizza” record in table 1, 5 is obtained. Since this value is greater than or equal to the virtual row TP, the affiliation table of the record of the virtual row TP is determined to be table 1.

Finally, the record number is determined. In the virtual integrated sort DB 510, 4 is obtained as the rank of the record immediately before the “Grizza” records of table 1. The order AA, among the “Grizza” records in table 1, of the record corresponding to the designated virtual row TP is 0. In table 1, the number of records having a value smaller than “Grizza” (CLTV (1) <Grizza>) is 2, so the element of rank 2 of SOS (1) is the record number of the “Grizza” record in the designated virtual row TP.

In the present embodiment, the table ID of the table to which the record belongs and the record number are output as the position information. However, the present invention is not limited to this. For example, using the number of records in each table, sequential record numbers (integrated record numbers) may be assigned to all records in all tables in the order of table ID, and the integrated record numbers may be returned. The integrated record number is obtained by adding the total number of records in the tables having table IDs smaller than that of the own table to the record number in the own table.
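The conversion from position information to an integrated record number is a simple offset calculation, sketched below. The function name `integrated_record_number` and the table sizes are hypothetical; record numbers are assumed to be 0-based.

```python
def integrated_record_number(table_sizes, table_id, rec_no):
    """Add the record counts of all tables with smaller table IDs to rec_no."""
    offset = sum(table_sizes[:table_id])
    return offset + rec_no


table_sizes = [5, 6]    # hypothetical: table 0 has 5 records, table 1 has 6
integrated = integrated_record_number(table_sizes, 1, 0)
```

Here record 0 of table 1 receives integrated record number 5, directly after the five records of table 0.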

In the above embodiment, the case where a plurality of databases are set as search targets has been described as an example. However, the number of databases set as search targets may be one. However, when the number of databases is one, the first search unit 421 and the second search unit 422 search only the index file 300 of the database and return only the record number as the position information.

That is, by using the index file 300 of the present embodiment for a single database and specifying a predetermined data item and item value, the record number of the record having the item value can be obtained. Moreover, the record number of the record can be obtained by designating a predetermined row of the sorted database using a predetermined data item as a key item.

In the above embodiment, the case where each information processing apparatus 110 includes the index creation unit 410 and the position information specifying unit 420 has been described as an example. However, the present invention is not limited to this. The position information specifying unit 420 may be provided in an information processing apparatus that is independent of the information processing apparatuses 110 holding the databases and that can transmit and receive data to and from each information processing apparatus 110 holding a database. The same applies to the index creation unit 410. In this case, the information processing apparatus including the position information specifying unit 420 accesses the information processing apparatus 110 that holds the desired database 200 and index file 300, and executes the processing by the position information specifying unit 420.

Also, the user may select the databases to be integrated and searched. In that case, a list of the databases that the user can select may be displayed, and the selection may be received from the list.

In the present embodiment, the user may specify the data item 211 and the item value 212 to be subjected to the first search process. In this case, a user interface screen that receives the designation of the data item 211 and the item value 212 from the user may be provided. Similarly, the user may specify the designated virtual row TP for the second search process. In this case, a user interface screen that receives the designation of the virtual row TP from the user may be provided.

In addition, the information processing apparatus 110 according to the present embodiment may further include a display control unit. The display control unit accesses the table according to the position information specified by the first search unit 421 or the second search unit 422, extracts records, and displays them in the display area of the display device 116. That is, the display control unit implements a record extraction function and a display function.

Thus, for example, a search process in which a specific item value is specified can be realized. The search process is realized as follows. In the data item 211 specified by the user, the first search unit 421 specifies the position information of the record having the item value 212 specified by the user. In accordance with the position information specified by the first search unit 421, the display control unit extracts the record from each table and displays it on the display area of the display device 116.

Also, browsing processing of the virtual integrated sort DB 510 can be realized. The browsing process is realized as follows. The second search unit 422 specifies the position information of each record of a predetermined number of virtual rows including the virtual row TP designated by the user. Here, as shown in FIG. 11, the position information of the virtual rows of the number of rows (here, L rows) that can be displayed in the display area of the display device 116 is specified. In accordance with the position information specified by the second search unit 422, the display control unit extracts these records from each table i and displays them in the display area of the display device 116 in the order of virtual rows. For example, each time the virtual row TP designated by the user is changed by a scroll operation or the like, this series of processing is performed to update the display.

As described above, the database 200 of this embodiment is provided with an index file 300 that returns the position information of the records belonging to an item value 212 when that item value 212 is specified for a specific data item 211, and that returns the position information of the virtual row TP when a virtual row TP of the virtual integrated sort DB 510 is designated. The position information specifying unit 420 then uses the index file 300 to search for the record designated by the user and specifies its position information. In particular, even if the database 200 is managed in a distributed manner, the position information of the record at a specified rank in the virtually integrated and sorted state can be returned.

Therefore, according to the present embodiment, regardless of whether the database is a single database or is distributed and managed as a plurality of databases, a user can easily search for a desired record using the index file 300 of the present embodiment and specify its position information.

Thus, as described above, even for a database that is distributed and managed, it is possible to easily realize a search process for extracting a desired value from all databases. Furthermore, it is possible to easily integrate the entire database and realize browsing processing in a sorted state. Further, since virtual integration is only required during search processing and browsing processing, and there is no need for actual integration, there is no need to actually copy and centrally manage all databases. For this reason, the time for copying becomes unnecessary, and it is not necessary to prepare a huge memory area for centralized management.

In addition, the storage area used by an index conventionally employed for searching large databases, such as a B-tree, grows at an accelerating rate (O (n log (n))) as the amount of data in the original database increases. In contrast, the size of the index file 300 of this embodiment is proportional to the size of the original database (O (n)), so that even if the original database is enormous, the index does not greatly strain the storage area.

In addition, all the elements in each list constituting the index file 300 of this embodiment can be accessed by rank (Ord). Further, the above search is realized only by searching the index file 300. For this reason, the amount of communication between the distributed sites during a search can be suppressed, and the communication amount does not increase when searching for and extracting records.
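The linear size of the index file can be checked by counting list elements, as in the following sketch. It assumes the three-list layout of this embodiment (VL, CAGR, SOS) and counts elements rather than bytes; `index_element_count` and the sample column are hypothetical.

```python
def index_element_count(column):
    """Total number of elements held by VL, CAGR and SOS for one item of interest."""
    n = len(column)               # SOS holds one record number per record
    unique = len(set(column))     # VL and CAGR each hold one element per unique value
    return unique + unique + n


names = ["Jemi", "Grizza", "Abby", "Grizza", "Tom"]   # hypothetical <Name> column
count = index_element_count(names)
upper_bound = 3 * len(names)      # at most 3n elements, since unique <= n
```

Because the number of unique values never exceeds the number of records n, the element count per item of interest is bounded by 3n.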

Therefore, even if it is a large-scale database, and even if the database is distributedly managed, there is no need to prepare a dedicated communication network because there is no transmission / reception of a large amount of data. For this reason, according to this embodiment, it is possible to construct a database system using an existing network such as the Internet.

In addition, since the index file 300 of the present embodiment has a simple configuration as described above, it can be created regardless of the database type. Therefore, it is possible to easily specify and extract the position of desired data regardless of the database type to be managed. In addition, prior design for searching is not necessary.

Therefore, according to the present embodiment, even a large-scale, distributedly managed database can be handled easily, at high speed, and without restrictions on the usage environment, on general-purpose hardware and a general-purpose communication network, in the same way as a small or medium-sized database.

That is, the index file 300 according to the present embodiment realizes very high-speed search and scales to the point where a database of 1 trillion records can be practically constructed. Furthermore, since the index file 300 of the present embodiment uses unique record numbers, an index that can be used even between databases with different schemas, it is suited to wide-area distribution, and cooperation between databases that are behind each other is also possible. Moreover, according to this embodiment, a server is not required. That is, a search is performed using the client CPU. For this reason, the number of CPUs available increases as the number of clients increases, and a large number of clients can be connected without difficulty. Further, since the system is serverless, a server system and server software are unnecessary, and a database system can be constructed at low cost.

<< Second Embodiment >>
Next, a second embodiment to which the present invention is applied will be described. The functions are the same as in the first embodiment, but a different index is used.

The database system of this embodiment is basically the same as the database system 100 of the first embodiment shown in FIG. The same applies to each device of the database system 100. However, the index file 300 is different as described above. Therefore, the configuration of the index file 300 in the information processing apparatus 110 is different, and the processes of the index creation unit 410 and the position information identification unit 420 are different. The applicable database types are also different. Hereinafter, the present embodiment will be described focusing on the configuration different from the first embodiment.

The functional configuration of the information processing apparatus 110 according to the present embodiment basically includes an index creation unit 410 and a position information specifying unit 420 as in the first embodiment shown in FIG. The position information specifying unit 420 includes a first search unit 421, a second search unit 422, and a record number calculation unit 423, as in the first embodiment.

The index creation unit 410 according to the present embodiment creates the index file 300 from the tabular data 201 at an arbitrary time interval, as in the first embodiment. For example, it is created every time a predetermined amount of data is collected. However, the index file 300 to be created is different.

The index file 300 created by the index creation unit 410 of this embodiment will be described. FIG. 12 is a diagram for explaining the index file 300 of the present embodiment. The index creation unit 410 according to the present embodiment creates the following index files 300 for all tables that are distributed and managed. The index file 300 of this embodiment is also one or more lists in an array format including one or more elements created for each data item 211 of the tabular data 201, as in the first embodiment. As in the first embodiment, the data item 211 for creating the index file 300 is referred to as a focused item.

Here, the index file 300 created from the tabular data 201 shown in FIG. 2A of the first embodiment will be described as an example. FIG. 12A is an example in which the item of interest is <Gender>, FIG. 12B is an example in which the item of interest is <Name>, and FIG. 12C is an example in which the item of interest is <Age>. As shown in these drawings, the index file 300 includes a sort list (SOS) 330 and a list (original data list: ORG) 340 composed of the data of the item of interest in the original table. Each list consists of elements and ranks (Ord) indicating their positions. Each element can be extracted from its list by specifying its rank (Ord). In the following, the element of rank j, counted from 0, in a list ABC is denoted as ABC [j]. The configuration and creation method of the SOS 330 are the same as those in the first embodiment.

Also in this embodiment, each list of the index file 300 is created for each table. FIG. 13A and FIG. 13B show an example of the index file 300 when the item of interest is <Name>. FIG. 13A shows the index file 300 of the table 0, and FIG. 13B shows the index file 300 of the table 1.
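As an illustration of the two lists, the following minimal Python sketch builds an SOS and an ORG for one item of interest of one table. The function name build_index and the sample names are assumptions made for this illustration; the rows are hypothetical, are not taken from FIG. 2A or FIG. 13, and were chosen only so that the ranks agree with the worked examples of this embodiment. Ties are assumed to be ordered by record number.

```python
def build_index(values):
    """Return (sos, org) for one item of interest of one table.

    org[r] is the item value of record number r (original order).
    sos[j] is the record number of the j-th record after sorting.
    """
    org = list(values)
    # sorted() is stable, so equal values keep ascending record numbers.
    sos = sorted(range(len(org)), key=lambda rec: org[rec])
    return sos, org


# Hypothetical <Name> columns for table 0 and table 1.
table0 = build_index(["Jemi", "Grizza", "Bomba", "Grizza"])
table1 = build_index(
    ["Alonzo", "Sillabub", "Sillabub", "Grizza", "Coricopat", "Jemi"]
)

print(table0[0])  # [2, 1, 3, 0]
print(table1[0])  # [0, 4, 3, 5, 1, 2]
```

With this data, SOS (0) [3] is 0 and ORG (0) [0] is "Jemi", and both lists have exactly one entry per record, so the index size stays proportional to the table size.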

Next, databases applicable in the present embodiment will be described. In the present embodiment, SOS 330 and ORG 340 are used as the index file 300. For this reason, in the present embodiment, any of structured data, semi-structured data, and unstructured data may be used, as in the first embodiment. However, in any type of database, one item value is stored in each data item.

Next, the position information specifying unit 420 of this embodiment will be described. Similarly to the first embodiment, the position information specifying unit 420 of this embodiment also specifies position information in accordance with an instruction from the user. In response to the designation of the data item 211 and a predetermined item value 212, the first search unit 421 searches for a record having the item value 212 in the data item 211 and specifies its position information. Further, in response to the designation of the data item 211 serving as the sort key item and of a virtual row (Vrec), the second search unit 422 searches for the record of the virtual row (Vrec) in the virtual integrated sort DB 510 and returns its position information.

First, the first search process by the first search unit 421 will be described. Similarly to the first embodiment, the first search process of this embodiment searches for a record having a specified value and specifies its position information. When the search target data item 211 (Target Item: TI) and the item value 212 (Target Value: TV) are specified, the first search unit 421 of this embodiment searches the ORG 340 of each table in the order of the table ID. The search uses a conventional search method such as the bisection method.

Every time a hit occurs, the first search unit 421 according to the present embodiment takes the rank (Ord) of the hit record as its record number and appends the record number and the table ID to the first search result storage area.

Hereinafter, the first search process of the present embodiment will be described using a specific example with reference to FIG. For example, it is assumed that <Name> is specified as the data item 211 and “Sillabub” is specified as the item value 212. First, the ORG 340 of the table 0 is accessed, and the presence or absence of “Sillabub” is determined by the bisection method. Since the table 0 does not contain this value, the process moves on to the table 1. Then, the ORG 340 of the table 1 is similarly accessed, and 1 and 2 are obtained as ranks. These are stored as record numbers in the first search result storage area in association with the table ID, and finally output.
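The first search can be sketched as follows in Python. This is only an illustration under stated assumptions: the function name first_search and the sample data are hypothetical, and the bisection is assumed to be carried out over ORG in sorted order by reading ORG through SOS, since ORG itself is held in the original record order.

```python
def first_search(tables, value):
    """Return [(table_id, record_number), ...] for records equal to value.

    tables is a list of (sos, org) pairs, visited in table ID order.
    """
    hits = []
    for table_id, (sos, org) in enumerate(tables):
        # Bisection for the first sorted position whose value is >= value.
        lo, hi = 0, len(sos)
        while lo < hi:
            mid = (lo + hi) // 2
            if org[sos[mid]] < value:
                lo = mid + 1
            else:
                hi = mid
        # Collect every consecutive hit; the rank in ORG is the record number.
        while lo < len(sos) and org[sos[lo]] == value:
            hits.append((table_id, sos[lo]))
            lo += 1
    return hits


# Hypothetical <Name> index files (sos, org) for table 0 and table 1.
tables = [
    ([2, 1, 3, 0], ["Jemi", "Grizza", "Bomba", "Grizza"]),
    ([0, 4, 3, 5, 1, 2],
     ["Alonzo", "Sillabub", "Sillabub", "Grizza", "Coricopat", "Jemi"]),
]

print(first_search(tables, "Sillabub"))  # [(1, 1), (1, 2)]
```

On this sample data, table 0 yields no hit and table 1 yields record numbers 1 and 2, matching the example above.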

Next, the second search process of the second search unit 422 of this embodiment will be described. Similarly to the first embodiment, the second search process of the present embodiment also returns position information of the corresponding record when a key item and a virtual row (Vrec) of the virtual integrated sort DB 510 are designated by the user. That is, the table ID and the record number 214 of the record of the designated virtual row TP of the virtual integrated sort DB 510 are specified.

At this time, in the present embodiment, the ORG 340 is accessed in the order of the table ID, and a value at a predetermined position (for example, near the center) is extracted as a provisional search value. The virtual row of that value in the virtual integrated sort DB 510 (the provisional virtual row) is then obtained. The obtained provisional virtual row is compared with the designated virtual row, and the search is repeated until they match. Then, the position information of the matching provisional search value is calculated.

Therefore, the flow of the second search process of the present embodiment is basically the same as the second search process shown in FIGS. 9 and 10 of the first embodiment. However, the method of determining the initial provisional search value vp in step S1202, the information stored in the provisional search value storage area in step S1203, and the method of determining the new provisional search value vp in step S1206 are different.

In the present embodiment, the methods by which the record number calculation unit 423 calculates the two counts used in the second search process differ from the first embodiment: CLTV (i) <x>, which indicates the number of records in the table (i) having a value smaller than the value x, and CEQV (i) <x>, which indicates the number of records in the table (i) having a value equal to the value x. Prior to the description of the second search process of this embodiment, this record number calculation process by the record number calculation unit 423 of this embodiment will be described.

When the value x is designated, the record number calculation unit 423 according to the present embodiment searches ORG (i) and acquires a rank (Ord) of the value x in the table (i). Here, the search is performed using the bisection method or the like and continues until one rank (Ord) is identified.

Here, when the value x is not detected in ORG (i), CLTV (i) <x> and CEQV (i) <x> of the table i are both set to 0.

On the other hand, when one rank (Ord) is detected, SOS (i) is searched, and the storage range [e1, e2] of the value x in SOS (i) is specified. The range is determined by examining the ORG (i) elements of the records that come before and after, in SOS (i), the entry whose element is the detected rank Ordx.

At this time, CLTV (i) <x> is obtained as the minimum rank e1 of the storage range, and CEQV (i) <x> is obtained as the number of entries in the storage range, that is, the value obtained by subtracting the minimum rank e1 from the maximum rank e2 and adding 1 (e2 - e1 + 1).
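The record number calculation described above can be sketched as follows. The function name count_lt_eq and the sample data are assumptions made for this illustration. The sketch follows the text in returning 0 for both counts when the value x is not found in the table.

```python
def count_lt_eq(sos, org, x):
    """Return (CLTV, CEQV) of value x for one table given its (sos, org)."""
    # Bisection over the sorted order until one hit is found.
    lo, hi = 0, len(sos) - 1
    hit = None
    while lo <= hi:
        mid = (lo + hi) // 2
        v = org[sos[mid]]
        if v == x:
            hit = mid
            break
        if v < x:
            lo = mid + 1
        else:
            hi = mid - 1
    if hit is None:
        return 0, 0  # value not detected: both counts are set to 0

    # Widen the hit to the storage range [e1, e2] by checking neighbours.
    e1 = e2 = hit
    while e1 > 0 and org[sos[e1 - 1]] == x:
        e1 -= 1
    while e2 < len(sos) - 1 and org[sos[e2 + 1]] == x:
        e2 += 1
    return e1, e2 - e1 + 1


# Hypothetical <Name> index files (sos, org) for table 0 and table 1.
sos0, org0 = [2, 1, 3, 0], ["Jemi", "Grizza", "Bomba", "Grizza"]
sos1, org1 = [0, 4, 3, 5, 1, 2], [
    "Alonzo", "Sillabub", "Sillabub", "Grizza", "Coricopat", "Jemi"]

print(count_lt_eq(sos0, org0, "Grizza"))  # (1, 2)
print(count_lt_eq(sos1, org1, "Grizza"))  # (2, 1)
```

The neighbour scan is linear in the number of equal values; a second bisection for each end of the range would be an alternative when a value repeats many times.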

The methods of calculating the number of records CALTV <x> having a value smaller than the value x and the number of records CAEQV <x> having a value equal to the value x in the virtual integrated DB 500, both used in the second search process, are the same as in the first embodiment.

Next, details of the second search process of this embodiment will be described. Here, the description follows the example of the processing flow of the second search process of the first embodiment shown in FIG.

In step S1202, in the present embodiment, the first provisional search value vp is determined by the following procedure in each table i. First, SOS (i) is accessed, and the element (ElementA) at a predetermined position (for example, near the center) is extracted. Then, ORG (i) is accessed, and the element (ValueB) whose rank (Ord) equals ElementA is extracted and set as the provisional search value vp.

In step S1203 of this embodiment, the provisional search value vp, its rank (Ord) in ORG (i), and its rank (Ord) in SOS (i) are stored.

Further, in step S1206, the new provisional search value vp is determined successively by applying the bisection method to SOS (i). When the designated virtual row TP is smaller than the minimum value of the provisional virtual row, the new value is chosen between the rank in SOS (i) of the current provisional search value vp and the rank in SOS (i) of the largest value, among the provisional search values already stored in the provisional search value storage area, that is smaller than the current provisional search value vp. On the other hand, when the designated virtual row TP is larger than the maximum value of the provisional virtual row, the new value is chosen between the rank in SOS (i) of the current provisional search value vp and the rank in SOS (i) of the smallest value, among the stored provisional search values, that is larger than the current provisional search value vp.

Hereinafter, a specific example of the second search process according to the present embodiment will be described with reference to FIGS. 4 and 13A and 13B. Here, it is assumed that <Name> is designated as the key item and 5 is designated as the virtual row (Vrec) TP.

First, the second search unit 422 accesses the index file 300 of the table 0 whose item of interest is <Name>, shown in FIG. Then, SOS (0) is accessed, and, for example, the element 0 at rank 3 is extracted. Then, ORG (0) is accessed, and the element “Jemi” at rank 0 is extracted as the provisional search value vp.

Then, the range of ranks of “Jemi” in the virtual integrated sort DB 510 is obtained. Here, [6, 7] is obtained. Since the designated virtual row TP (5) is smaller than this range, a value with a smaller rank in SOS (0) is extracted as the new provisional search value vp. For example, the element 1 at rank 1 is extracted, ORG (0) is accessed, and the element “Grizza” at rank 1 is set as the new provisional search value vp.

Similarly, [3, 5] is obtained as the range of ranks of “Grizza” in the virtual integrated sort DB 510. Since the virtual row TP is within this range, “Grizza” is set as the value V_TP of the designated virtual row.

Next, the table is determined. First, the number of “Grizza” records up to the table 0 is calculated (CALTV <Grizza>), and 2 is obtained. Further, the total number of records having a value smaller than “Grizza” (CALTV <Grizza>) in the virtual integrated sort DB 510 is 3. Therefore, the last virtual row in the virtual integrated sort DB 510 occupied by “Grizza” of the table 0 is 4. Since the designated virtual row TP (5) is larger than 4, the record belongs to the table 1.

Finally, the record number is determined. In the virtual integrated sort DB 510, 4 is obtained as the rank of the record immediately before “Grizza” of the table 1. Therefore, 0 is obtained as the rank AA, among the “Grizza” records of the table 1, of the record corresponding to the designated virtual row TP. In the table 1, since the number of records having a value smaller than “Grizza” (CLTV <Grizza>) is 2, the element at rank 2 of SOS (1) is the record number of “Grizza” in the designated virtual row TP.
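The second search as a whole can be sketched as follows in Python. This is a simplified illustration rather than the flow of steps S1202 to S1206 itself. The names bounds and second_search and the sample data are hypothetical. No provisional search value storage area is kept, and a plain bisection is restarted in each table. As a further assumption, records smaller than a value are counted by insertion position even in tables that do not contain the value, so that the virtual rows add up correctly on the sample data.

```python
def bounds(sos, org, x):
    """Return (lower, upper): sorted positions with value == x are [lower, upper)."""
    lo, hi = 0, len(sos)
    while lo < hi:
        mid = (lo + hi) // 2
        if org[sos[mid]] < x:
            lo = mid + 1
        else:
            hi = mid
    lower, hi = lo, len(sos)
    while lo < hi:
        mid = (lo + hi) // 2
        if org[sos[mid]] <= x:
            lo = mid + 1
        else:
            hi = mid
    return lower, lo


def second_search(tables, tp):
    """Return (table_id, record_number) of virtual row tp, or None."""
    for sos, org in tables:
        lo, hi = 0, len(sos) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            vp = org[sos[mid]]  # provisional search value
            ranges = [bounds(s, o, vp) for s, o in tables]
            caltv = sum(l for l, _ in ranges)      # rows smaller than vp
            caeqv = sum(u - l for l, u in ranges)  # rows equal to vp
            if tp < caltv:
                hi = mid - 1
            elif tp >= caltv + caeqv:
                lo = mid + 1
            else:
                # vp is the value of row tp: walk tables in ID order.
                offset = tp - caltv
                for table_id, (l, u) in enumerate(ranges):
                    if offset < u - l:
                        return table_id, tables[table_id][0][l + offset]
                    offset -= u - l
    return None


# Hypothetical <Name> index files (sos, org) for table 0 and table 1.
tables = [
    ([2, 1, 3, 0], ["Jemi", "Grizza", "Bomba", "Grizza"]),
    ([0, 4, 3, 5, 1, 2],
     ["Alonzo", "Sillabub", "Sillabub", "Grizza", "Coricopat", "Jemi"]),
]

print(second_search(tables, 5))  # (1, 3)
```

For the virtual row 5, the sketch finds “Grizza” with the range [3, 5], skips the two “Grizza” records of table 0, and returns the element at rank 2 of SOS (1), in line with the example above.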

In the present embodiment, the case where a plurality of databases are set as search targets has been described as an example, but the number of databases set as search targets may be one. Further, the position information specifying unit 420 may be constructed in an information processing device independent of the information processing device 110 that holds the database. Furthermore, a display control unit similar to that of the first embodiment may be provided so that search processing, browsing processing, and the like can be realized. In addition, an interface through which the user can specify the item value to be searched for, the extraction target, and the virtual row, and an interface through which the user can select the database to be searched, may be provided.

As described above, also in this embodiment, the same effect as that of the first embodiment can be obtained.

Note that the configuration of the index file 300 is not limited to the configuration of each of the above embodiments. That is, any index file can be used as long as it is created from the original database, its size is proportional to the size of the original database, it can return the position information of a record that satisfies a given data item and value, and it can return the position information of the record at a specified rank in a state in which the databases are virtually integrated and sorted by a predetermined data item. For example, it may be a combination of a first list from which the number of records having a predetermined item value (including 0) can be determined and a second list from which the rank of each record after sorting by a predetermined data item can be grasped.

100: Database system, 110: Information processing device, 111: CPU, 112: Memory, 113: Storage device, 114: NWIF, 115: Input device, 116: Display device, 117: External storage device, 120: Network, 200: Database, 201: Tabular data, 201s: Tabular data after sorting, 202: Semi-structured data, 203: Semi-structured data, 203: Unstructured data, 204: Unstructured data, 211: Data item, 212: Item value, 213: Record, 214: Record number, 215: Record order number, 300: Index file, 310: VL, 320: CAGR, 330: SOS, 340: ORG, 410: Index creation unit, 420: Position information specifying unit, 421: First search unit, 422: Second search unit, 423: Record number calculation unit, 500: Virtual integrated DB, 501: Table ID and record number, 510: Virtual integrated sort DB

Claims

An information processing apparatus for managing a database composed of records storing item values for each predetermined data item,
An index file for each data item that can be searched;
Using the index file, comprising a position information specifying unit for specifying position information of the desired record,
Each record is uniquely given a record number in advance,
The position information specifying unit specifies the record number as the position information,
The index file for each data item is characterized in that the record number can be acquired from the item value of the data item, and the record number can be acquired from the rank in a sort database obtained by sorting the database using the data item as a key item.
The information processing apparatus according to claim 1,
There are multiple databases to be managed,
Each database is uniquely given a database ID in advance,
The index file is generated for each database,
The sort database is a virtual integrated sort database in which a virtual integrated database obtained by virtually integrating the plurality of databases is sorted using the data item as a key item,
The position information specifying unit further specifies the database ID of the database to which the desired record belongs as the position information;
An information processing apparatus characterized by the above.
The information processing apparatus according to claim 1, wherein
The index file for each data item is
A value list for storing unique item values belonging to the data item in a predetermined order;
A cumulative number list for storing the cumulative number of records in the database for each item value in the storage order of the value list;
and a sort list that stores the arrangement order of the record numbers after the database is sorted in the predetermined order using the data item as a key item.
The information processing apparatus according to claim 1, wherein
The index file for each data item is
A sort list for storing the order of the record numbers after sorting the database in a predetermined order using the data item as a key item;
and an original data list that stores the item values of the data item of the database in their initial arrangement order.
The information processing apparatus according to any one of claims 1 to 4,
wherein the position information specifying unit includes a first search unit that uses the index file for each data item and specifies the position information of a record having the item value designated for the data item.
The information processing apparatus according to any one of claims 1 to 4,
wherein the position information specifying unit includes a second search unit that uses the index file for each data item and specifies the position information of the record at a designated position in the sort database.
The information processing apparatus according to claim 6,
wherein the position information specifying unit further includes a record number calculation unit that calculates, for each database and for each item value of each data item, the number of records having a value smaller than the item value and the number of records having a value equal to the item value.
The information processing apparatus according to any one of claims 1 to 7,
An information processing apparatus, further comprising: a record extraction unit that extracts the desired record from the database according to the position information specified by the position information specifying unit.
A record position information specifying method for specifying position information of a desired record in a database including records storing item values for each predetermined data item and a record number uniquely assigned to each record,
the method comprising: an index file generation step of generating an index file from which the record number can be acquired from the item value of the data item, and from which the record number can be acquired from the rank in a sort database in which the database is sorted using the data item as a key item; and a position information specifying step of specifying the position information by specifying the record number of the desired record using the index file generated in the index file generation step.
A method for specifying record position information according to claim 9,
The database is plural,
Each database is uniquely given a database ID in advance,
The index file is generated for each database,
The sort database is a virtual integrated sort database in which a virtual integrated database obtained by virtually integrating the plurality of databases is sorted using the data items as key items,
In the position information specifying step, as the position information, the database ID of the database to which the desired record belongs is further specified.
The record position information specifying method according to claim 9 or 10,
The index file generated for each data item is
A value list for storing unique item values belonging to the data item in a predetermined order;
A cumulative number list for storing the cumulative number of records in the database for each item value in the storage order of the value list;
A sort list for storing the order of the record numbers after the database is sorted in the predetermined order using the data item as a key item;
The location information specifying step includes:
a presence/absence determination step of accessing the value list of the target item, which is the data item of the desired record, and determining whether or not the target item of the database has the target value, which is the item value of the desired record; and
a record number specifying step of, when the presence/absence determination step determines that the target value is present, specifying the record number of the target value using the cumulative number list and the sort list and setting it as the position information.
The record position information specifying method according to claim 9 or 10,
The index file generated for each data item is
A sort list for storing the order of the record numbers after sorting the database in a predetermined order using the data item as a key item;
An original data list for storing the item values of the database in the data item, in an initial arrangement order;
The location information specifying step includes:
a presence/absence and rank determination step of accessing the original data list of the target item, which is the data item of the desired record, and determining whether or not the target item of the database has the target value, which is the item value of the desired record, and, when it is present, its rank; and
a record number specifying step of, when the presence/absence and rank determination step determines that the target value is present, specifying the rank in the original data list as the record number of the target value and setting it as the position information.
The record position information specifying method according to claim 10,
The index file generated for each data item is
A value list for storing unique item values belonging to the data item in a predetermined order;
A cumulative number list for storing the cumulative number of records in the database for each item value in the storage order of the value list;
A sort list for storing the order of the record numbers after the database is sorted in the predetermined order using the data item as a key item;
The location information specifying step includes:
a search value determination step of determining, using the value list, the cumulative number list, and the sort list of the key item, a search value whose range includes the target position, which is a virtual position in the virtual integrated sort database designated by the user; and
a position specifying step of specifying as the position information, using the value list, the cumulative number list, and the sort list of the key item, the table to which the record at the target position among the records having the determined search value belongs, and its rank in that table.
The record position information specifying method according to claim 10,
The index file generated for each data item is
A sort list for storing the order of the record numbers after sorting the database in a predetermined order using the data item as a key item;
An original data list for storing the item values of the database in the data item, in an initial arrangement order;
The location information specifying step includes:
a search value determination step of determining, using the sort list and the original data list of the key item, a search value whose range includes the target position, which is a virtual position in the virtual integrated sort database designated by the user; and
a position specifying step of specifying as the position information, using the sort list and the original data list of the key item, the table to which the record at the target position among the records having the determined search value belongs, and its rank in that table.
A record extraction method for extracting a desired record from a database comprising a record for storing an item value for each predetermined data item and a record number uniquely assigned to each record,
the record extraction method comprising a record extraction step of extracting the desired record according to the position information specified by the record position information specifying method according to any one of claims 9 to 12.
An information processing program for causing a computer to function as position information specifying means for specifying position information of a desired record in a plurality of databases by using an index file provided for each database, wherein each of the databases consists of records storing a value for each predetermined data item, and each record of each database is uniquely assigned a record number in advance,
wherein the index file is generated from each of the databases and, for each data item, allows the record number to be acquired from the item value of the data item and allows the record number to be acquired from the rank in a sort database, and
wherein the sort database is obtained by sorting, using the data item as a key item, a virtual integrated database obtained by virtually integrating the plurality of databases.