WO2004084095A1 - Information retrieving system

Info

Publication number: WO2004084095A1
Application number: PCT/JP2003/003245
Authority: WO
Inventors: Akira Naruse; Kouichi Kumon
Original assignee: Fujitsu Limited
Priority date: 2003-03-18
Filing date: 2003-03-18
Publication date: 2004-09-30
Also published as: WO2004084095A9; JPWO2004084095A1

Abstract

An information retrieving system in which the retrieval of a database by a plurality of information processors is accelerated. Upon receiving a retrieval request consisting of retrieval conditions for the database, the system divides the database into sub-databases, each no larger than the memory capacity of each information processor, divides the retrieval request so that it can be processed in parallel, and then combines the processing results.

Description

TECHNICAL FIELD

The present invention relates to an information search system, an information search method, an information search device, an information search program, and a computer-readable recording medium on which the program is recorded, which are suitable for processing a search request for a database in parallel by a plurality of information processing devices.

BACKGROUND ART

In recent years, attention has been focused on how to extract and use various kinds of information from the data stored in a database, rather than merely using the database to store data. This requires the ability to search for desired information at high speed. In general, however, as the amount of data stored in a database increases, the amount of search processing increases rapidly and the processing time tends to grow.

In recent years, it has been known to perform a parallel search using a plurality of computers (workstations) in order to perform such a database search at high speed.

In a computer that actually performs search processing on a database, the database to be searched is sequentially copied from an external storage device such as a hard disk to memory (main storage device), and the search processing is then performed.

Generally, an external storage device such as a hard disk in which a database is stored has a low data input/output (I/O) speed, and it takes time to transfer data between the external storage device and the memory. Therefore, when search processing is performed, the data transfer time between the external storage device and the memory often determines the search time. A typical OS (Operating System) has a function called a disk cache. The disk cache keeps data read from an external storage device in an unused portion of memory in case the data is used again in the near future. If the database to be searched has already been read by an earlier search process, it may not be necessary to transfer the data from the external storage device to the memory again. Therefore, when search requests for the same database continue, no data transfer from the external storage device to the memory is needed except for the first search request. As a result, the search time may be significantly reduced.

However, the benefit of a shorter search time from the disk cache function is limited to the case where the size of the database is smaller than the memory size. This is because, if the database is larger than the memory, the entire contents of the database cannot be held in memory.

FIG. 12 is a diagram schematically showing the relationship between database size and search time in a conventional information search system. As shown in FIG. 12, when the size of the database to be searched exceeds the memory size of the computer, access to an external storage device such as a hard disk occurs even if the same database has been searched previously. Data transfer from the external storage device then becomes a bottleneck, and the search time is significantly longer than when the database is small.

Therefore, the conventional information retrieval system has no good way to greatly reduce access to the external storage device when searching a database larger than the memory size of the computer, and has the problem that the search time becomes long.

The present invention has been made in view of the above-described problems. Its object is to provide an information search system, an information search method, an information search device, an information search program, and a computer-readable recording medium on which the program is recorded, with which, when a search request for a database is processed in parallel by a plurality of information processing devices, each information processing device can process the search request at high speed, so that the database can be searched at high speed.

DISCLOSURE OF THE INVENTION

In order to achieve the above object, an information search system according to the present invention includes a plurality of information processing devices and processes, in parallel on the plurality of information processing devices, a search request including a database to be searched and search conditions for the database. The system comprises: a sub-database creating unit that creates, based on the database, a plurality of sub-databases each having a size equal to or less than the capacity of a storage unit provided in the information processing device; an assignment management unit that assigns a sub-database created by the sub-database creating unit to the information processing device as a sub-search request, so as to cause the information processing device to process the search request for that sub-database; and a combining unit that acquires and combines the processing results of the sub-search requests from the plurality of information processing devices.
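As a rough illustration of the three units described above, the following Python sketch splits a toy database into sub-databases no larger than an assumed capacity, searches each one in parallel, and combines the results. The function names (`create_sub_databases`, `search_sub_database`, `combine`, `parallel_search`), the capacity expressed as an entry count, and the substring-match search condition are all assumptions made for illustration and do not appear in the specification.

```python
from concurrent.futures import ThreadPoolExecutor

def create_sub_databases(database, capacity):
    """Split a list of entries into chunks of at most `capacity` entries."""
    return [database[i:i + capacity] for i in range(0, len(database), capacity)]

def search_sub_database(sub_database, condition):
    """Hypothetical search: return the entries that contain the condition string."""
    return [entry for entry in sub_database if condition in entry]

def combine(partial_results):
    """Concatenate the per-sub-database results in sub-database order."""
    return [hit for part in partial_results for hit in part]

def parallel_search(database, condition, capacity, workers=4):
    sub_databases = create_sub_databases(database, capacity)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker stands in for one information processing device.
        parts = list(pool.map(lambda sdb: search_sub_database(sdb, condition),
                              sub_databases))
    return combine(parts)

database = ["ACGT", "TTGA", "GACG", "CCTA", "ACGA", "TTTT"]
result = parallel_search(database, "ACG", capacity=2)
```

Threads are used here only to keep the sketch self-contained; in the system described, each sub-search request would be sent to a separate information processing device.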

A sub-search condition creating unit that divides the search condition to create sub-search conditions may also be provided, and the assignment management unit may assign a sub-search condition to the information processing device as a sub-search request, so as to cause the information processing device to search the sub-database using that sub-search condition.

In addition, a DB affinity setting unit may be provided that can set, as a DB affinity, information on the sub-database that the assignment management unit should preferentially assign to the information processing device, and the assignment management unit may assign a sub-search request to the information processing device based on the DB affinity.

Further, the DB affinity setting unit may allow a sub-database to be processed preferentially to be set in advance, or the DB affinity setting unit may set the DB affinity based on the processing history of sub-search requests in the information processing device.

A free space management unit that manages the free space of the storage unit provided in the information processing device may further be provided, and the assignment management unit may, based on the free space of the storage unit managed by the free space management unit, assign to the information processing device a sub-database no larger than that free space.

Furthermore, a processing time prediction unit that predicts the time required for the information processing device to process a sub-search request may be provided, and the assignment management unit may, based on the time predicted by the processing time prediction unit, assign sub-search requests to the information processing device preferentially, starting from the sub-search request with the longest predicted time.

Also, the sub-database creating unit or the sub-search condition creating unit may create a plurality of sub-search requests by dividing a sub-search request that the assignment management unit has not yet assigned to the information processing device.

Further, an evaluation unit may be provided that evaluates each sub-search request using an evaluation function of at least one of the following: the DB affinity for the information processing device set by the DB affinity setting unit, the sub-database of the sub-search request, and the time required for the processing predicted by the processing time prediction unit. The assignment management unit may then randomly assign a sub-search request to the information processing device based on the evaluation result of the evaluation unit.

Further, an information search method of the present invention is an information search method in which a search request including a database to be searched and a search condition for the database is processed in parallel by a plurality of information processing devices. The method is characterized by comprising: a sub-search request creating step of creating, based on the search request, a plurality of sub-search requests each of a size equal to or less than the capacity of the storage unit provided in the information processing device; an assignment management step of assigning a sub-search request to the information processing device so as to cause it to process the sub-search request created in the sub-search request creating step; and a combining step of acquiring and combining the processing results of the sub-search requests from the plurality of information processing devices.

Further, the information search device of the present invention is an information search device that causes a plurality of information processing devices to process, in parallel, a search request including a database to be searched and search conditions for the database. The device is characterized by comprising: a sub-search request creating unit that creates, based on the search request, a plurality of sub-search requests each of a size equal to or less than the capacity of the storage unit provided in the information processing device; an assignment management unit that assigns a sub-search request to the information processing device so as to cause it to process the sub-search request created by the sub-search request creating unit; and a combining unit that acquires and combines the processing results of the sub-search requests from the plurality of information processing devices.

Further, the information search program of the present invention is an information search program for causing a computer to execute an information search function that causes a plurality of information processing devices to process, in parallel, a search request including a database to be searched and search conditions for the database. The program causes the computer to function as: a sub-search request creating unit that creates, based on the search request, a plurality of sub-search requests each of a size equal to or less than the capacity of the storage unit provided in the information processing device; an assignment management unit that assigns a sub-search request to the information processing device so as to cause it to process the sub-search request created by the sub-search request creating unit; and a combining unit that acquires and combines the processing results of the sub-search requests from the plurality of information processing devices.

Further, a computer-readable recording medium according to the present invention records the above-described information search program.

As described above, according to the information search system, the information search method, the information search device, the information search program, and the computer-readable recording medium on which the program is recorded, the following effects and advantages are obtained.

(1) A plurality of sub-databases, each no larger than the capacity of the storage unit provided in the information processing device, are created from the database, and each sub-database is assigned to the information processing device as a sub-search request so that the information processing device processes the search request for that sub-database. The sub-database to be searched is therefore cached in the storage unit of an information processing device that has once processed the sub-search request. As a result, the information processing device does not need to access a slow hard disk (disk access) to reach the sub-database, and can perform search processing on the sub-database at high speed.

(2) By creating a sub-search condition by dividing a search condition, a sub-search request of an appropriate size (the length of the predicted processing time) can be easily created, and the convenience is high.

(3) Information about the sub-database to be preferentially assigned to the information processing device is set as a DB affinity, and a sub-search request (sub-database) is assigned to the information processing device based on the DB affinity. This makes it possible to easily assign a sub-search request to the information processing device.

(4) By setting in advance, as a DB affinity, a sub-database to be processed preferentially, the DB affinity can be reliably set.

(5) The DB affinity can be easily set by setting the DB affinity based on the processing history of the sub search request in the information processing device.

(6) By managing the free space of the storage unit provided in the information processing device and allocating a sub-database having a size equal to or less than the free space to the information processing device, a sub-database no larger than the free space of the storage unit can be allocated to the information processing device easily and reliably.

(7) By predicting the time required for the information processing device to process each sub-search request and assigning sub-search requests to the information processing device preferentially, starting from the one with the longest predicted time, multiple sub-search requests can be efficiently allocated to multiple information processing devices.

(8) By dividing a sub-search request that has not yet been assigned to the information processing device and creating a plurality of sub-search requests, the total processing time (search time) of the entire system can be shortened, and the risk (sub-database read load) caused by allocating jobs whose DB affinity does not match can be reduced.

(9) For each sub-search request, at least one of the DB affinity, the sub-database, and the predicted time required for the processing is evaluated using an evaluation function, and sub-search requests are randomly assigned to the information processing devices based on the evaluation result. A job can thereby be assigned to each information processing device easily and reliably.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a diagram showing a schematic configuration of an information search system as one embodiment of the present invention.

FIGS. 2 to 4 are diagrams for explaining a method of creating a sub-search request (job) in the information search system as one embodiment of the present invention.

FIG. 5 is a diagram for explaining an example of an uneven job creation method in the information search system as one embodiment of the present invention.

FIG. 6 is a diagram for explaining a dynamic job assignment method by the assignment management unit in the information search system as one embodiment of the present invention.

FIG. 7 is a diagram showing an example of a job created by the sub-search request creating unit in the information search system as one embodiment of the present invention.

FIG. 8 is a flowchart for explaining a job allocating method by the allocation managing unit in the information search system as one embodiment of the present invention.

FIG. 9 is a diagram showing an example of a state in which a job is assigned to each PC by the assignment management unit in the information search system as one embodiment of the present invention.

FIG. 10 is a flowchart for explaining another job assignment method by the assignment management unit in the information search system as one embodiment of the present invention.

FIG. 11 is a diagram showing an example of another state in which a job has been assigned to each PC by the assignment management unit in the information search system as one embodiment of the present invention.

FIG. 12 is a diagram schematically showing a relationship between a database size and a search time in a conventional information search system.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(A) Description of one embodiment

FIG. 1 is a diagram showing a schematic configuration of an information search system as one embodiment of the present invention. The information search system 1 causes a plurality of (four in this embodiment) PCs (Personal Computers: information processing devices) 2a, 2b, 2c, and 2d to process, in parallel, a search request for the database 3 input from the search request input unit 41. It comprises the search request input unit 41, the search result output unit 42, the management server 20, the database 3, and the PCs 2a, 2b, 2c, and 2d.

The database (search target) 3 is a comprehensive collection (information group) of some kind of information (data), structured so that information can be added, deleted, updated, and searched. It is searched based on the search request input from the search request input unit 41 and can provide the information corresponding to that request.

The search request input unit 41 is used by the operator to input a search request, that is, the database to be searched (database 3 in the present embodiment) and the desired search conditions for that database. The search request input unit 41 is realized by, for example, a keyboard and a mouse of a computer system.

The search request includes a database to be searched and search conditions for the database. In the present embodiment only one database 3 is provided, but the invention is not limited to this. A plurality of databases may be provided, with at least one specific database among them being the search target, and the operator may arbitrarily select at least one of the databases as the search target.

In the search request input section 41, the operator can input arbitrary search conditions.

The search result output unit 42 obtains, from the management server 20, the search result for the search request input from the search request input unit 41 and outputs it to an operator or the like. The search result output unit 42 is realized by various types of output devices such as a display device and a printer.

The PCs 2a, 2b, 2c, and 2d are information processing devices (computers) that each process search requests input from the search request input unit 41. In this embodiment, the PCs 2a, 2b, 2c, and 2d process sub-search requests (described later) that are created by the sub-search request creating unit 5 and assigned by the assignment management unit 8 in the management server 20, and transmit the processing results to the management server 20.

In the following, the reference numerals 2a, 2b, 2c, and 2d are used when it is necessary to specify one of the plurality of PCs, and the reference numeral 2 is used when referring to any PC.

As shown in FIG. 1, the PCs 2a, 2b, 2c, and 2d each include a CPU (Central Processing Unit) 30, a RAM 31, a ROM 32, a hard disk 33, and a communication control unit 34. In the drawings, the same reference numerals as those described above indicate the same or substantially the same portions, and thus detailed description thereof will be omitted. The ROM 32 holds pre-recorded programs such as the BIOS (Basic Input/Output System) for performing basic input and output on the PC 2, and the hard disk 33 stores various programs and data for operating the PC 2, such as an OS (Operating System) and application programs. The CPU 30 executes the various programs stored in the ROM 32 and the hard disk 33 so that a sub-search request (described later) can be processed.

The RAM (storage unit) 31 temporarily stores and expands data and the like when the CPU 30 performs various processes. In the present embodiment, a sub-database serving as a sub-search request (described later in detail) is loaded into the RAM 31.

Further, in this information retrieval system 1, for convenience, each of the PCs 2a, 2b, 2c, and 2d has a RAM 31 of the same capacity.

The PC 2 has a disk cache function, and stores (caches) frequently used data and the most recently used data in the RAM 31. Thus, when a read request is issued for such data, the CPU 30 does not need to read the data from the hard disk 33 or the database 3, which have low access speeds, and can instead read it from the RAM 31, so the processing can be performed at high speed.

In the PC 2, the sub-database once processed is held in the RAM 31 by this disk cache function. When new data is read into the RAM 31, the data cached in the RAM 31 is flushed out of the RAM 31 in a first-in first-out manner, starting with the oldest data.
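The caching behaviour described above can be pictured with a small model. The Python sketch below treats the RAM 31 as a fixed-capacity cache of sub-databases that flushes the oldest data first when new data does not fit. The class name `FifoCache`, its methods, and the sizes are hypothetical, and a real OS disk cache works at a much finer granularity than whole sub-databases.

```python
from collections import OrderedDict

class FifoCache:
    """Toy model of RAM 31 holding sub-databases, evicting the oldest first."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.entries = OrderedDict()  # name -> size in MB, in load order

    def used_mb(self):
        return sum(self.entries.values())

    def read(self, name, size_mb):
        """Return True on a cache hit, False if the data had to be loaded."""
        if name in self.entries:
            return True
        if size_mb > self.capacity_mb:
            return False  # too large to cache at all
        # Evict in first-in first-out order until the new data fits.
        while self.used_mb() + size_mb > self.capacity_mb:
            self.entries.popitem(last=False)
        self.entries[name] = size_mb
        return False

cache = FifoCache(capacity_mb=256)
hits = [
    cache.read("SDB1", 128),  # miss: first load
    cache.read("SDB1", 128),  # hit: already cached
    cache.read("SDB2", 128),  # miss: fits next to SDB1
    cache.read("SDB3", 128),  # miss: evicts SDB1 (oldest)
    cache.read("SDB1", 128),  # miss: SDB1 was flushed
]
```

The last read shows why a PC that keeps receiving different sub-databases loses the benefit of the cache, which is the motivation for assigning the same sub-database to the same PC.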

The communication control unit 34 controls communication of various data between the PC 2 and the outside. For example, the communication control unit 34 performs control for receiving a sub-search request (described later) transmitted from the management server 20 and for transmitting the processed search result to the management server 20.

The management server 20 converts the search request input from the search request input unit 41 into a plurality of sub-search requests and causes the PCs 2 to process them in parallel. It also acquires and combines the processing results obtained by the PCs 2, and outputs the search results to the search result output unit 42.

As shown in FIG. 1, the management server 20 includes a sub-search request creating unit 5, an assignment management unit 8, a DB affinity setting unit 9, a memory management unit (free space management unit) 10, a processing time prediction unit 11, and a combining unit 12.

The memory management unit (free space management unit) 10 obtains and holds the size of the RAM 31 of each PC 2 in advance, so that the sub-database creating unit 6 described later can learn the size of the RAM 31 of each PC 2. The memory management unit 10 also manages the usage status of the RAM 31 of each PC 2. By keeping track of which sub-search requests (sub-databases) the assignment management unit 8 (described later in detail) has assigned to each PC 2, it manages the size of the sub-databases cached in the RAM 31 of each PC 2, and can thereby manage the free space (remaining capacity; size of the unused area) of the RAM 31 of each PC 2.

In this information search system 1, each PC 2 is configured to perform only the processing (sub-search requests) assigned by the management server 20 (assignment management unit 8), and the RAM 31 of each PC 2 caches the sub-databases assigned to it by the assignment management unit 8.

The processing time prediction unit 11 predicts the time required for each PC 2 to process a sub-search request. It can predict the processing time based on information such as the size of the sub-database allocated to the PC 2, the contents of the sub-search conditions, and the specifications and performance of the PC 2 (CPU 30). Alternatively, the processing time prediction unit 11 may query each PC 2 for the time required for each process; various modifications are possible without departing from the spirit of the present invention.

The sub-search request creating unit 5 creates the sub-search requests to be processed by the PCs 2 based on the search request input from the search request input unit 41. The sub-search request creating unit 5 creates a plurality of sub-search requests (hereinafter also referred to as jobs) by using two methods, database division and query division. As shown in FIG. 1, it includes a sub-database creating unit 6 and a sub-search condition creating unit 7.
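The free-space bookkeeping attributed above to the memory management unit 10 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class `MemoryManager`, its method names, and the rule of refusing an assignment that does not fit are hypothetical, and sizes are given in MB purely for readability.

```python
class MemoryManager:
    """Tracks, per PC, the RAM size and the sub-databases assigned so far."""

    def __init__(self, ram_sizes_mb):
        self.ram_sizes_mb = dict(ram_sizes_mb)
        self.assigned = {pc: {} for pc in ram_sizes_mb}  # pc -> {sub-db: MB}

    def free_space(self, pc):
        """Remaining capacity: RAM size minus the cached sub-databases."""
        return self.ram_sizes_mb[pc] - sum(self.assigned[pc].values())

    def record_assignment(self, pc, sub_database, size_mb):
        """Record an assignment; refuse one that exceeds the free space."""
        if sub_database in self.assigned[pc]:
            return True  # already cached, no extra space needed
        if size_mb > self.free_space(pc):
            return False
        self.assigned[pc][sub_database] = size_mb
        return True

manager = MemoryManager({"2a": 256, "2b": 256})
ok1 = manager.record_assignment("2a", "SDB1", 200)
ok2 = manager.record_assignment("2a", "SDB2", 100)  # does not fit on 2a
ok3 = manager.record_assignment("2b", "SDB2", 100)
```

Because each PC 2 performs only the processing assigned by the management server, the server can derive the free space from its own assignment records without querying the PCs, which is the assumption this sketch relies on.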

FIGS. 2 to 4 are diagrams for explaining a method of creating a sub-search request (job) in the information search system 1 as one embodiment of the present invention. FIG. 2 is a diagram for explaining a method of creating multiple jobs by database division, FIG. 3 is a diagram for explaining a method of creating multiple jobs by query division, and FIG. 4 is a diagram for explaining a method of creating multiple jobs by both database division and query division.

Note that FIGS. 2 to 4 each show an example in which multiple jobs (sub-search requests) are created by dividing a search request that includes a database DB1 to be searched (vertical axis in the figures) and a query Q for that database (search condition; horizontal axis in the figures).

The sub-database creating unit 6 creates a plurality of sub-databases having a size equal to or smaller than the capacity of the RAM 31 provided for each PC 2 based on the database 3. In the example shown in FIG. 2, the sub-database creating unit 6 divides the database DB 1 into the sub-databases SDB 1 and SDB 2 by dividing the database with respect to the search request including the database DB 1 and the query Q. In addition, two jobs are created: a job 1 that searches the sub-database SDB 1 with the query Q and a job 2 that searches the sub-database SDB 2 with the query Q.

The sub-database creating unit 6 divides the database 3 relating to the search request input from the search request input unit 41, based on the information on the size of the RAM 31 of each PC 2 managed by the memory management unit 10, and thereby creates a plurality of sub-databases each no larger than the size of the RAM 31 of each PC 2.

For example, when each PC 2 has a RAM 31 of 256 MB, the sub-database creating unit 6 creates a plurality of sub-databases of a size of 256 MB or less.

In general, databases are composed of many independent entries, so it is easy to divide a database in units of entries. Further, because there are many entries, even when the number of PCs 2 is large it is easy to create as many sub-databases with equivalent predicted search times as there are PCs 2.
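Database division in units of entries can be sketched as a simple greedy packing. In the Python example below, the function name `split_by_capacity` and the use of byte strings as entries are assumptions for illustration; the capacity of 256 stands in for the 256 MB RAM of the example above, scaled down to bytes so the sketch runs instantly.

```python
def split_by_capacity(entries, capacity_bytes):
    """Greedily pack whole entries into sub-databases of at most capacity_bytes."""
    sub_databases, current, current_size = [], [], 0
    for entry in entries:
        size = len(entry)
        if size > capacity_bytes:
            raise ValueError("a single entry exceeds the memory capacity")
        # Start a new sub-database when the next entry would overflow this one.
        if current and current_size + size > capacity_bytes:
            sub_databases.append(current)
            current, current_size = [], 0
        current.append(entry)
        current_size += size
    if current:
        sub_databases.append(current)
    return sub_databases

entries = [b"a" * 100, b"b" * 150, b"c" * 200, b"d" * 50, b"e" * 256]
sub_dbs = split_by_capacity(entries, capacity_bytes=256)
sizes = [sum(len(e) for e in sdb) for sdb in sub_dbs]
```

Entries are never split across sub-databases, which reflects the point that entries are independent and form the natural unit of division.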

The sub-search condition creating unit 7 divides the search condition (query; search request) input from the search request input unit 41 into a plurality of sub-search conditions that have no mutual dependency, and thereby creates a plurality of jobs to be processed by each PC 2. In the example shown in FIG. 3, for a search request including the database DB1 and the query Q, the sub-search condition creating unit 7 divides the search condition (query) Q into the sub-search conditions (sub-queries) SQA and SQB by query division, creating two jobs: job A, which searches the database DB1 with the sub-search condition SQA, and job B, which searches the database DB1 with the sub-search condition SQB.

In general, since a plurality of search conditions often depend on each other, it is more difficult to divide the search condition (query) than to divide the database as described above. However, if division is possible, the sub-search conditions after division are often independent of each other. In that case, the search results obtained when the sub-search conditions are processed in parallel by multiple PCs 2 are in many cases the same as those obtained when the undivided search condition is processed by a single computer (PC 2). Query division is therefore characterized by a relatively light load in the merging process (described later) that the combining unit 12 performs on the search results for those sub-search conditions.

A homology search often used in the field of bioinformatics such as BLAST (Basic Local Alignment Search Tool) and FASTA sometimes uses a completely independent set of search requests as a query. In such a case, it can be said that there is almost no merging load due to the query division in the combining unit 12.

Further, in the present embodiment, the database division by the sub-database creating unit 6 and the query division by the sub-search condition creating unit 7 can be performed together. Database division is characterized by the fact that, while the division itself is easy, the load of merging the search results is heavy. Conversely, query division is characterized by the fact that, while the division is difficult, the load of merging the search results is light.

As described above, database division and query division have opposite characteristics with respect to the ease of division and the ease of merging search results. When both can be applied, it is desirable to create jobs by taking advantage of both division methods. For example, jobs are first created by query division, which has a light merge load. If jobs of the required number and size cannot be created by query division alone, they are further divided into multiple jobs by database division. This makes it possible to take advantage of the features of both division methods.

In the example shown in FIG. 4, for a search request including the database DB1 and the query Q, the sub-database creating unit 6 divides the database DB1 into the sub-databases SDB1 and SDB2 by database division, and the sub-search condition creating unit 7 divides the search condition (query) Q into the sub-search conditions (sub-queries) SQA and SQB by query division.

As a result, the sub-search request creating unit 5 (sub-database creating unit 6 and sub-search condition creating unit 7) creates four jobs: job 1A, which searches the sub-database SDB1 with the sub-search condition SQA; job 1B, which searches the sub-database SDB1 with the sub-search condition SQB; job 2A, which searches the sub-database SDB2 with the sub-search condition SQA; and job 2B, which searches the sub-database SDB2 with the sub-search condition SQB.
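Combining the two division methods amounts to pairing every sub-database with every sub-search condition. The short Python sketch below reproduces the four jobs of the FIG. 4 example; the function name `create_jobs` and the dictionary representation of a job are assumptions for illustration.

```python
from itertools import product

def create_jobs(sub_databases, sub_queries):
    """Pair every sub-database with every sub-search condition."""
    return {
        f"job {db_id}{q_id}": (sub_databases[db_id], sub_queries[q_id])
        for db_id, q_id in product(sub_databases, sub_queries)
    }

sub_databases = {"1": "SDB1", "2": "SDB2"}
sub_queries = {"A": "SQA", "B": "SQB"}
jobs = create_jobs(sub_databases, sub_queries)
```

With m sub-databases and n sub-search conditions this yields m × n jobs, so the number of jobs can be raised well above the number of PCs when needed.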

Further, in the sub-search request creating unit 5, the sub-database creating unit 6 and the sub-search condition creating unit 7 create jobs whose predicted processing times (search times) on the PC 2 differ in size (are unequal); this is referred to as uneven job creation. One way to realize uneven job creation, when the correlation between the search time and the number of database entries is very high, is for the sub-database creating unit 6 to create a plurality of sub-databases with different numbers of entries.

FIG. 5 is a diagram for explaining an example of an uneven job creation method in the information search system 1 as one embodiment of the present invention. To create uneven jobs, as shown in FIG. 5, the same number of jobs as the number of PCs 2 (four in this embodiment) are first created so that their predicted search times are almost equal, and each of these jobs is then divided into three so that the predicted search times are in the ratio 4:2:1. This makes it possible to easily create 12 unequal jobs for the four PCs 2. For convenience, in the example shown in FIG. 5, the created jobs are denoted by reference numerals 11, 12, 13, 21, 22, 23, 31, 32, 33, 41, 42, and 43.
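The arithmetic of this uneven job creation can be checked with a few lines of Python. In the sketch below, the function name `create_uneven_jobs`, the total predicted time of 28 units, and the mapping of the largest part of each group to the numerals 11, 21, 31, and 41 are assumptions; the specification gives only the ratio and the reference numerals.

```python
from fractions import Fraction

def create_uneven_jobs(total_time, num_pcs, ratio=(4, 2, 1)):
    """Return {job label: predicted time} for num_pcs * len(ratio) jobs."""
    per_pc = Fraction(total_time, num_pcs)  # equal share before splitting
    jobs = {}
    for pc in range(1, num_pcs + 1):
        for part, weight in enumerate(ratio, start=1):
            # Labels 11, 12, 13, 21, ... follow the numerals used for FIG. 5.
            jobs[10 * pc + part] = per_pc * weight / sum(ratio)
    return jobs

jobs = create_uneven_jobs(total_time=28, num_pcs=4)
```

With a total of 28 time units, each of the four equal jobs takes 7 units and splits into parts of 4, 2, and 1 units, giving 12 jobs whose predicted times still sum to 28.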

The DB affinity setting unit 9 can set, as a DB affinity (Data Base Affinity), information on a sub-database to be preferentially assigned to the PC 2 by the assignment management unit 8. That is, the DB affinity setting unit 9 can set in advance, for each PC 2, the sub-database to be searched by that PC 2. A plurality of sub-databases may be specified as the DB affinity of each PC 2, but the total size of the sub-databases set as the DB affinity of one PC 2 must not exceed the size of the RAM 31 (memory size) of that PC 2.

In the DB affinity setting unit 9, the sub-database that each PC 2 processes preferentially may be set in advance by an operator, an administrator, or the like. Alternatively, based on the processing history of the sub-databases in each PC 2, a sub-database that the PC 2 has processed in the past may be preferentially set as its DB affinity.

The assignment management unit 8 assigns the sub-search requests created by the sub-search request creating unit 5 (the sub-database creating unit 6 and the sub-search condition creating unit 7) to the PCs 2. In other words, in order to have a PC 2 process the search request for a sub-database, the assignment management unit 8 assigns the sub-database created by the sub-database creating unit 6, together with the sub-search condition, to that PC 2 as a sub-search request (job).

When assigning a job to the PC 2, the assignment management unit 8 refers to the DB affinity set in the DB affinity setting unit 9 and assigns to the PC 2 a sub-database that matches its DB affinity.

In other words, in the information retrieval system 1, the sub-database that each PC 2 is in charge of is set in advance as a DB affinity, and the assignment management unit 8 assigns jobs in accordance with this DB affinity. As a result, the sub-database to be searched is already cached in the RAM 31 of a PC 2 that has once processed a sub-search request for it.

Then, by having the PC 2 preferentially process search requests for the sub-database that matches its DB affinity, the PC 2 does not need to perform disk access in order to access the sub-database, so search processing on the sub-database can be performed at high speed.

Further, the assignment management unit 8 assigns jobs to the PCs 2 using a dynamic job assignment method. The dynamic job assignment method is, literally, a method of dynamically assigning jobs to the PCs 2. Dynamic job assignment can be realized by preparing more jobs than there are PCs 2, assigning jobs to the PCs 2 in order from the job with the longest predicted search time, and, each time a PC 2 completes its processing, selecting the job with the longest predicted search time from the remaining jobs and assigning it to that PC 2, repeating this until no jobs remain. Dynamic job assignment is particularly effective when the processing time prediction unit 11 predicts the job search time in each PC 2 with low accuracy.

FIG. 6 is a diagram for explaining the dynamic job assignment method used by the assignment management unit 8 in the information search system 1 as one embodiment of the present invention, and shows an example in which the jobs created as described above (see FIG. 5) are dynamically assigned to four PCs 2a, 2b, 2c, and 2d. In the example shown in FIG. 6, it is assumed that each of the jobs 41, 42, and 43 takes twice as long as the predicted search time predicted by the processing time prediction unit 11.

In the example shown in FIG. 6, jobs 11, 21, 31, and 41 are first assigned to PCs 2a, 2b, 2c, and 2d, respectively. After that, jobs 12, 22, and 32, which have the longest predicted search times among the remaining jobs, are assigned to PCs 2a, 2b, and 2c, which have completed their processing, and after that processing is completed, jobs 42, 13, and 23 are assigned to PCs 2a, 2b, and 2c, respectively. After that, jobs 33 and 43 are assigned to the PCs 2b and 2c, respectively, which have completed their processing.

As described above, by dynamically assigning jobs to the PCs 2, a plurality of jobs can be processed at high speed even when the processing time prediction unit 11 does not predict the search time accurately.
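The assignment order described for FIG. 6 can be reproduced with a small simulation. This is a sketch under stated assumptions: predicted times are taken as 4, 2, and 1 abstract units per the 4:2:1 ratio, ties between PCs that finish at the same time are broken in the order 2a, 2b, 2c, 2d, ties between jobs are broken by job number, and the function name `simulate_dynamic` is hypothetical.

```python
import heapq

def simulate_dynamic(predicted, actual, pcs):
    """Longest-predicted-first dynamic assignment.

    predicted / actual: {job: time}; pcs: list of PC names.
    Returns ({pc: [jobs in assignment order]}, makespan).
    """
    # Jobs ordered by predicted time (longest first), ties by job number.
    queue = sorted(predicted, key=lambda j: (-predicted[j], j))
    # Heap of (time at which the PC becomes free, PC index).
    free_at = [(0, i) for i in range(len(pcs))]
    heapq.heapify(free_at)
    plan = {pc: [] for pc in pcs}
    makespan = 0
    for job in queue:
        t, i = heapq.heappop(free_at)      # next PC to finish its work
        plan[pcs[i]].append(job)
        done = t + actual[job]             # real time, not the prediction
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, i))
    return plan, makespan

# Predicted times in the ratio 4:2:1; jobs 41 to 43 actually take twice as long.
predicted = {10 * p + k: t for p in range(1, 5) for k, t in ((1, 4), (2, 2), (3, 1))}
actual = {j: t * 2 if j // 10 == 4 else t for j, t in predicted.items()}
plan, makespan = simulate_dynamic(predicted, actual, ["2a", "2b", "2c", "2d"])
print(plan)
print(makespan)
```

Under these assumptions PC 2a receives jobs 11, 12, 42; PC 2b receives 21, 22, 13, 33; PC 2c receives 31, 32, 23, 43; and PC 2d receives only job 41, finishing in 10 units overall. Had jobs 41 to 43 all been fixed to PC 2d in advance, that PC alone would have needed 8 + 4 + 2 = 14 units.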

It should be noted that dynamic job assignment as described above requires more complicated job management than static job assignment and imposes a larger job management load. If the prediction accuracy of the job search time is sufficiently high, the assignment management unit 8 may therefore use static job assignment instead of dynamic job assignment.

The static job allocation method is, literally, a method of statically allocating a job to each PC 2. For example, when the correlation between the search time and the number of entries in the database is very high, static job assignment can maintain the load balance between the PCs 2 simply by dividing the database so that the numbers of entries are equal, creating as many jobs as there are PCs 2, and statically assigning those jobs to the PCs 2. When the prediction accuracy of the predicted search time is low, however, the dynamic job allocation method can shorten the overall processing time compared with the static job allocation method. For example, suppose that as many jobs as there are PCs 2 are created, each PC 2 processes one of them, and the processing of one job takes twice as long as the other jobs. In such a case, the search time of that long-running job determines the performance of the entire system 1, and the benefit of processing in parallel on a plurality of PCs 2 is diminished.

In other words, the static job allocation method makes job management easier than the dynamic job allocation method, and is effective when the processing time (predicted search time) of each job can be predicted with high accuracy in advance.
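The effect of one slow job under static assignment can be worked through numerically. The following is a minimal sketch that assumes one job per PC and abstract time units; the function names are hypothetical.

```python
def static_makespan(actual_times):
    """With one job per PC, the slowest job determines the total time."""
    return max(actual_times)

def speedup(actual_times):
    """Serial processing time divided by the static parallel time."""
    return sum(actual_times) / static_makespan(actual_times)

balanced = speedup([1, 1, 1, 1])      # all four jobs take equal time
one_slow = speedup([1, 1, 1, 2])      # one job takes twice as long
print(balanced, one_slow)
```

With four PCs, a perfectly balanced static assignment gives a speedup of 4, whereas a single job that takes twice as long reduces it to 2.5, because three PCs sit idle while the slow job finishes.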

Further, when determining the job to be assigned to the PC 2, the assignment management unit 8 may calculate an evaluation value for each job that has not yet been assigned by using an evaluation function, and determine the job to be assigned to the PC 2 on that basis. The following is conceivable as a simple example of the evaluation function.

(1) When the sub-database to be searched by the job matches the DB affinity of the PC 2

Evaluation value = predicted search time of the job

(2) When the sub-database to be searched by the job does not match the DB affinity of the PC 2

Evaluation value = predicted search time of the job / 2

When the evaluation function described above is used, the assignment management unit 8 selects the job having the highest evaluation value from the jobs that have not yet been assigned, and assigns it to the PC 2.
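The evaluation function and the selection of the highest-valued job can be sketched as follows. The function names, the dictionary layout, and the example values are assumptions for illustration.

```python
def evaluation_value(predicted_time, job_sub_db, pc_affinity):
    """Predicted search time if the job's sub-database matches the
    PC's DB affinity, otherwise half of the predicted search time."""
    if job_sub_db in pc_affinity:
        return predicted_time
    return predicted_time / 2

def pick_job(unassigned, pc_affinity):
    """unassigned: {job: (sub_db, predicted_time)}.
    Returns the unassigned job with the highest evaluation value."""
    return max(
        unassigned,
        key=lambda j: evaluation_value(unassigned[j][1], unassigned[j][0], pc_affinity),
    )

# Hypothetical example for a PC whose DB affinity is SDB1.
first = pick_job({"1B": ("SDB1", 1.0), "2A": ("SDB2", 1.5)}, {"SDB1"})
second = pick_job({"1B": ("SDB1", 1.0), "2A": ("SDB2", 3.0)}, {"SDB1"})
print(first, second)
```

In the first call the matching job 1B (value 1.0) beats the non-matching job 2A (value 0.75). In the second call job 2A is long enough (value 1.5) to be chosen despite the mismatch, which shows how the function trades affinity against job length.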

The combining unit 12 acquires the processing results (search results) of the jobs (sub-search requests) from the PCs 2 and combines (merges) them to create the search result for the search request input from the search request input unit 41. The search results combined by the combining unit 12 are transmitted to the search result output unit 42.

In the information search system 1 as one embodiment of the present invention configured as described above, when a search request (the database to be searched and the search conditions) is input by the user through the search request input unit 41, the search request is transmitted to the management server 20.

In the management server 20, the sub-search request creating unit 5 creates a plurality of jobs (sub-search requests) to be processed by the plurality of PCs 2 based on the search request input from the search request input unit 41 (sub-search request creation step). Specifically, the sub-database creating unit 6 creates a plurality of sub-databases based on the database 3 so that each has a size equal to or smaller than the capacity of the RAM 31 of each PC 2. Further, the sub-search condition creating unit 7 creates sub-search conditions, as necessary, based on the search condition input as part of the search request from the search request input unit 41.

Further, the sub-search request creating unit 5 creates uneven jobs based on the processing time predicted by the processing time prediction unit 11.

FIG. 7 is a diagram illustrating an example of jobs created by the sub-search request creating unit 5 in the information search system 1 as one embodiment of the present invention. It shows an example in which a plurality of jobs are created, for four PCs 2 (2a, 2b, 2c, 2d), based on a search request to search a database 3 whose size (for example, 384 MB) is 1.5 times the memory size (for example, 256 MB) of the RAM 31 of each PC 2. In the example shown in FIG. 7, the sub-database creating unit 6 creates two sub-databases SDB 1 and SDB 2 based on the database 3, and each of these sub-databases SDB 1 and SDB 2 is formed to be smaller than the memory size of the RAM 31 of each PC 2.
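The number and size of the sub-databases in this example can be derived as follows. This is a sketch that assumes the database is divided into the smallest number of nearly equal parts that each fit in RAM; the embodiment does not prescribe equal division, and the function name is hypothetical.

```python
import math

def split_sizes(db_mb, ram_mb):
    """Sizes (in MB) of the fewest nearly equal sub-databases such
    that each one is no larger than ram_mb."""
    n = math.ceil(db_mb / ram_mb)
    base, extra = divmod(db_mb, n)
    # Spread any remainder over the first `extra` sub-databases.
    return [base + (1 if i < extra else 0) for i in range(n)]

sizes = split_sizes(384, 256)
print(sizes)
```

For the 384 MB database and 256 MB of RAM, this gives two sub-databases of 192 MB each, both smaller than the RAM 31 of a PC 2.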

In the example shown in FIG. 7, the sub-search condition creating unit 7 divides the search request input from the search request input unit 41 and creates four sub-search conditions SQA, SQB, SQC, and SQD.

That is, in the example shown in FIG. 7, eight jobs (sub-search requests) 1A, 1B, 1C, 1D, 2A, 2B, 2C and 2D have been created.
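The eight jobs are simply every combination of a sub-database with a sub-search condition. A minimal sketch, in which the dictionary layout and the function name `make_jobs` are assumptions:

```python
from itertools import product

# Label fragments used to build job names such as "1A" or "2D".
sub_databases = {"SDB1": "1", "SDB2": "2"}
sub_conditions = {"SQA": "A", "SQB": "B", "SQC": "C", "SQD": "D"}

def make_jobs(sub_dbs, conditions):
    """Return {job_name: (sub_database, sub_search_condition)} for
    every pairing of a sub-database with a sub-search condition."""
    return {
        sub_dbs[db] + conditions[cond]: (db, cond)
        for db, cond in product(sub_dbs, conditions)
    }

jobs = make_jobs(sub_databases, sub_conditions)
print(list(jobs))
```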

Note that, according to the predicted search times predicted by the processing time prediction unit 11, jobs 1B, 1D, 2B, and 2D require substantially the same predicted search time as one another, jobs 1A, 1C, 2A, and 2C require substantially the same predicted search time as one another, and jobs 1A, 1C, 2A, and 2C require approximately twice the predicted search time of jobs 1B, 1D, 2B, and 2D.

In the example shown in FIG. 7, it is assumed that the DB affinity setting unit 9 sets the DB affinity so that the sub-database SDB 1 is preferentially assigned to PC 2a and PC 2b, and so that the sub-database SDB 2 is preferentially assigned to PC 2c and PC 2d.

Then, the assignment management unit 8 assigns each of the jobs 1A, 1B, 1C, 1D, 2A, 2B, 2C, and 2D created by the sub-search request creating unit 5 as described above to the PCs 2 in accordance with the DB affinity set in the DB affinity setting unit 9 (assignment management step). At this time, the assignment management unit 8 assigns each job to a PC 2 using the dynamic job assignment method.

Here, the job assignment method used by the assignment management unit 8 in the information search system 1 as one embodiment of the present invention will be described with reference to FIG. 9 and in accordance with the flowchart (steps A10 to A80). FIG. 9 is a diagram showing an example of a state in which jobs have been assigned to the PCs 2 by the assignment management unit 8 in the information search system 1 as one embodiment of the present invention.

The assignment management unit 8 determines whether there is any unassigned job (step A10). If there is no unassigned job (see the NO route in step A10), the processing ends.

If there is an unassigned job (see the YES route in step A10), the assignment management unit 8 determines whether there is a PC 2 waiting for a job to be assigned, that is, whether there is any PC 2 that is ready to process a job (step A20).

If there is a PC 2 waiting for job assignment (see the YES route in step A20), the assignment management unit 8 refers to the DB affinity setting unit 9 and determines whether there is a job that matches the DB affinity of that PC 2 (step A60). If there is a job that matches the DB affinity of the PC 2 (see the YES route in step A60), the assignment management unit 8 refers to the predicted search times predicted by the processing time prediction unit 11, assigns to the PC 2 the job with the longest predicted search time among the jobs that match the DB affinity (step A80), and the process returns to step A10.

Also, if there is no job that matches the DB affinity of that PC 2 (see the NO route in step A60), the assignment of jobs to that PC 2 is terminated (step A50), and the process returns to step A20.

On the other hand, if there is no PC 2 waiting for job assignment (see the NO route in step A20), the assignment management unit 8 determines whether there is any PC 2 that is executing a job (step A30). If there is a PC 2 executing a job (see the YES route in step A30), the assignment management unit 8 waits for the job of that PC 2 to be completed (step A70) and returns to step A10. If there is no PC 2 executing a job (see the NO route in step A30), the assignment management unit 8 outputs a message indicating an error to the operator of the information search system 1 or the like (step A40) and returns to step A10.
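Steps A10 to A80 can be sketched as a small simulation loop. This is an illustrative sketch, not the embodiment's implementation: the function name, the data layout, the tie-breaking by name order, and the use of an `actual` table to drive simulated completion times are all assumptions. In addition, where the flowchart outputs an error message and returns to step A10, this sketch raises an exception instead so that the loop cannot run forever.

```python
def assign_by_affinity(jobs, affinity, actual):
    """jobs: {job: (sub_db, predicted_time)}; affinity: {pc: set of
    sub_dbs}; actual: {job: real processing time} (simulation only).
    Returns (schedule, makespan), schedule = [(start, pc, job), ...]."""
    unassigned = dict(jobs)
    waiting = sorted(affinity)        # PCs waiting for a job
    running = {}                      # pc -> time its current job finishes
    now = 0.0
    schedule = []
    while unassigned:                                             # A10
        if waiting:                                               # A20: YES
            pc = waiting.pop(0)
            matching = [j for j, (db, _) in unassigned.items()
                        if db in affinity[pc]]                    # A60
            if matching:                                          # A80
                job = max(matching, key=lambda j: unassigned[j][1])
                del unassigned[job]
                schedule.append((now, pc, job))
                running[pc] = now + actual[job]
            # A50: otherwise this PC receives no further jobs
        elif running:                                             # A30: YES
            pc = min(running, key=lambda p: (running[p], p))      # A70
            now = running.pop(pc)
            waiting.append(pc)
        else:                                                     # A40
            raise RuntimeError("jobs remain but no PC can process them")
    makespan = max([now, *running.values()])
    return schedule, makespan

# Eight jobs as in FIG. 7; A and C jobs are predicted to take twice as long.
jobs = {f"{d}{c}": (f"SDB{d}", 2.0 if c in "AC" else 1.0)
        for d in "12" for c in "ABCD"}
affinity = {"2a": {"SDB1"}, "2b": {"SDB1"}, "2c": {"SDB2"}, "2d": {"SDB2"}}
# Job 2C takes 1.5 times and job 2D twice its predicted time.
actual = {j: t for j, (_, t) in jobs.items()}
actual.update({"2C": 3.0, "2D": 2.0})
schedule, makespan = assign_by_affinity(jobs, affinity, actual)
print(schedule)
print(makespan)
```

Under these assumptions every job runs on a PC whose DB affinity matches its sub-database, and job 2D is the last to be assigned. The exact placement shown in FIG. 9 may differ, since it depends on which waiting PC is examined first.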

As shown in FIG. 9, jobs are dynamically assigned to the PCs 2 by the job assignment method described above. In the example shown in FIG. 9, it is assumed that the processing of job 2C takes 1.5 times the predicted search time predicted by the processing time prediction unit 11, and that the processing of job 2D takes twice the predicted search time predicted by the processing time prediction unit 11.

As described above, when the assignment management unit 8 assigns a job to each PC 2, each PC 2 processes the assigned job. That is, each PC 2 searches the sub database based on the sub search condition, and transmits the search result to the management server 20.

In each PC 2, since the size of the sub-database assigned to the PC 2 is smaller than the size of the RAM 31 provided in that PC 2, the entire sub-database to be searched can be loaded into the RAM 31 when the sub-database is searched under the sub-search condition, and the search process can be performed at high speed without generating disk access or the like.

The search results from each PC 2 are combined (merged) by the combining unit 12 in the management server 20, transmitted to the search result output unit 42 as the search result for the search request, and presented to the operator.
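The text does not specify how the combining unit 12 merges the partial results. As one possible illustration, the following sketch assumes that each PC 2 returns a list of `(score, entry_id)` hits already sorted from highest to lowest score, and merges them into a single ranked list. The hit format and the function name `combine` are assumptions.

```python
import heapq

def combine(results):
    """Merge per-job hit lists, each sorted by descending score,
    into one list sorted by descending score."""
    return list(heapq.merge(*results, reverse=True))

# Hypothetical hits returned by three jobs.
merged = combine([
    [(0.9, "e3"), (0.4, "e1")],
    [(0.8, "e7")],
    [(0.7, "e2"), (0.1, "e9")],
])
print(merged)
```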

As described above, according to the information search system 1 as one embodiment of the present invention, the sub-database creating unit 6 (sub-search request creating unit 5) creates a plurality of sub-databases each having a size equal to or less than the capacity of the RAM 31 provided in each PC 2, and these sub-databases are assigned to the PCs 2, so that the sub-database is cached in the RAM 31 of a PC 2 that has once processed a sub-search request. As a result, the PC 2 does not need to access a hard disk with a slow access speed (disk access) in order to access the sub-database, and search processing on the sub-database can be performed at high speed.

In particular, the sub-database that each PC 2 is in charge of is set in advance in the DB affinity setting unit 9 as a DB affinity, and the assignment management unit 8 assigns jobs in accordance with this DB affinity. The assignment management unit 8 can therefore easily assign jobs (sub-databases) to the PCs 2. Moreover, the sub-database to be searched is cached in the RAM 31 of a PC 2 that has once processed a sub-search request, so the PC 2 does not need to perform disk access to reach the sub-database, and search processing on the sub-database can be performed at high speed.

In addition, because the sub-search condition creating unit 7 divides the search condition input from the search request input unit 41 to create sub-search conditions, sub-search requests of an appropriate (arbitrary) size (length of predicted search time) can be created easily, which is highly convenient. Also, because the processing time prediction unit 11 predicts the predicted search time of each job, the assignment management unit 8 can easily perform dynamic job assignment, which is likewise highly convenient.

When the sub-database that each PC 2 processes preferentially is set in advance in the DB affinity setting unit 9 by an operator, an administrator, or the like, the DB affinity can be set reliably. On the other hand, when a sub-database that has been processed in the past is preferentially set as the DB affinity based on the processing history of the sub-databases in each PC 2, the DB affinity can be set easily. In addition, because the memory management unit 10 manages the free space of the RAM 31 provided in each PC 2 and a sub-database smaller than that free space is assigned to each PC 2, a sub-database smaller than the free space of the RAM 31 of the PC 2 can be assigned to the PC 2 easily and reliably.

The processing time prediction unit 11 predicts the time (predicted search time) required for each PC 2 to process each sub-search request, and the assignment management unit 8 preferentially assigns sub-search requests with long predicted search times to the PCs 2, so that a plurality of jobs can be efficiently assigned to the plurality of PCs 2.

(B) Other

The present invention is not limited to the above-described embodiment, and can be variously modified and implemented without departing from the spirit of the present invention.

For example, when the DB affinity of the PC 2 and the sub-database do not match, the assignment management unit 8 may assign to the PC 2 a job whose data size is equal to or less than the remaining capacity of the RAM 31 of that PC 2, and the sub-search request creating unit 5 may further divide unassigned (unallocated) jobs (sub-databases) so that their data size is within the remaining capacity of the RAM 31 of the PC 2 before they are assigned to the PC 2. That is, in the present information search system 1, the assignment management unit 8 may obtain from the memory management unit 10 the usage status or free space (size of the unused area) of the RAM 31 of each PC 2, have the sub-search request creating unit 5 further divide unassigned jobs into a plurality of jobs (sub-databases) no larger than that free space, and assign the jobs created by the division to the PCs 2.

Another job assignment method used by the assignment management unit 8 in the information search system 1 as one embodiment of the present invention will be described with reference to FIG. 11 and in accordance with the flowchart (steps B10 to B110) shown in FIG. 10. FIG. 11 is a diagram showing an example of another state in which jobs are assigned to the PCs 2 by the assignment management unit 8 in the information search system 1 as one embodiment of the present invention.

The assignment management unit 8 determines whether there is any unassigned job (step B10). If there is no unassigned job (see the NO route in step B10), the processing ends.

If there is an unassigned job (see the YES route in step B10), the assignment management unit 8 next determines whether there is a PC 2 waiting for a job to be assigned, that is, whether there is any PC 2 that is ready to process a job (step B20).

If there is no PC 2 waiting for job assignment (see the NO route in step B20), the assignment management unit 8 next determines whether there is any PC 2 that is executing a job (step B30). If there is a PC 2 executing a job (see the YES route in step B30), the assignment management unit 8 waits for the job of that PC 2 to be completed (step B50) and returns to step B10. If there is no PC 2 executing a job (see the NO route in step B30), the assignment management unit 8 outputs a message indicating an error to the operator of the information search system 1 or the like (step B40) and returns to step B10.

If there is a PC 2 waiting for job assignment (see the YES route in step B20), the assignment management unit 8 refers to the DB affinity setting unit 9 and determines whether there is a job that matches the DB affinity of that PC 2 (step B60). If there is a job that matches the DB affinity of the PC 2 (see the YES route in step B60), the assignment management unit 8 refers to the predicted search times predicted by the processing time prediction unit 11, assigns to the PC 2 the job with the longest predicted search time among the jobs that match the DB affinity of the PC 2 (step B80), and the process proceeds to step B10.

Also, if there is no job whose DB affinity matches the PC 2 (see the NO route in step B60), the assignment management unit 8 refers to the memory management unit 10, checks the remaining capacity of the RAM 31 of the PC 2, and determines whether there is a job whose data size is within that remaining capacity, that is, whether there is a job whose sub-database size is within the remaining capacity of the RAM 31 of the PC 2 (step B70). If there is a job whose data size is within the remaining capacity of the PC 2 (see the YES route in step B70), the assignment management unit 8 proceeds to step B80. In other words, even if the DB affinity does not match, the assignment management unit 8 assigns to the PC 2 the job with the longest predicted search time among the jobs whose data size is within the remaining capacity of the RAM 31 of that PC 2 (step B80), and the process proceeds to step B10.

If there is no job whose sub-database size is within the remaining capacity of the RAM 31 of the PC 2 (see the NO route in step B70), the assignment management unit 8 determines whether, among the unassigned jobs, there is a job that can be re-created so as to fit within the remaining capacity of the RAM 31 of the PC 2, that is, whether there is an unassigned job whose sub-database can be further divided (step B90).

If there is a job whose sub-database can be further divided (see the YES route in step B90), the assignment management unit 8 has the sub-database of that job further divided so that it is no larger than the remaining capacity of the RAM 31 (step B110), and the process proceeds to step B10.

If there is no job whose sub-database can be further divided (see the NO route in step B90), the assignment of jobs to that PC 2 is terminated (step B100), and the process proceeds to step B20.
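The decision made for one waiting PC in steps B60 to B110 can be sketched as a pure function, together with a helper that performs the further division. This is a sketch under stated assumptions: the `Job` fields, the `divisible` flag, the choice of which job to split, the equal-size split, and the proportional scaling of the predicted time are all illustrative and are not specified in the text.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Job:
    name: str
    sub_db: str
    size_mb: int            # size of the sub-database the job searches
    predicted: float        # predicted search time
    divisible: bool = True  # hypothetical flag: can be divided further

def choose_action(pc_affinity, free_ram_mb, unassigned):
    """Return (action, job) for one waiting PC."""
    matching = [j for j in unassigned if j.sub_db in pc_affinity]       # B60
    if matching:
        return "assign", max(matching, key=lambda j: j.predicted)       # B80
    fitting = [j for j in unassigned if j.size_mb <= free_ram_mb]       # B70
    if fitting:
        return "assign", max(fitting, key=lambda j: j.predicted)        # B80
    splittable = [j for j in unassigned
                  if j.divisible and free_ram_mb > 0]                   # B90
    if splittable:
        return "split", max(splittable, key=lambda j: j.predicted)      # B110
    return "stop", None                                                 # B100

def split_job(job, free_ram_mb):
    """Divide a job into the fewest nearly equal parts that each fit
    in free_ram_mb; predicted time is scaled roughly in proportion."""
    n = -(-job.size_mb // free_ram_mb)          # ceiling division
    base, extra = divmod(job.size_mb, n)
    return [
        replace(job, name=f"{job.name}-{i + 1}",
                size_mb=base + (1 if i < extra else 0),
                predicted=job.predicted / n)
        for i in range(n)
    ]

# Hypothetical case: a PC with affinity SDB1 and 100 MB of free RAM.
job_2d = Job("2D", "SDB2", 192, 1.0)
action, target = choose_action({"SDB1"}, 100, [job_2d])
parts = split_job(target, 100)
next_action, next_job = choose_action({"SDB1"}, 100, parts)
print(action, [p.name for p in parts], next_action, next_job.name)
```

In this hypothetical case job 2D does not match the DB affinity and does not fit in the free RAM, so it is divided into jobs 2D-1 and 2D-2 of 96 MB each, after which job 2D-1 can be assigned.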

According to the above-described job assignment method, jobs are dynamically assigned to the PCs 2 as shown in FIG. 11. In the example shown in FIG. 11, it is assumed that the processing of job 2A takes 1.5 times the predicted search time predicted by the processing time prediction unit 11, and that the processing of job 2C takes twice the predicted search time predicted by the processing time prediction unit 11.

In FIG. 11, jobs 2D-1 and 2D-2 are both created by subdividing job 2D. In the example shown in FIG. 11, jobs 2D-1 and 2D-2 do not match the DB affinity of the PCs 2a and 2b, so the sub-database SDB 2 for jobs 2D-1 and 2D-2 is not cached in the PCs 2a and 2b. Therefore, in order to process jobs 2D-1 and 2D-2, the PCs 2a and 2b must read the sub-database SDB 2 for these jobs from the database 3. However, because jobs 2D-1 and 2D-2 are created through re-division by the sub-search request creating unit 5, this read load is reduced. As a result, the processing (search) time of the entire system is shortened, and the risk of assigning jobs that do not match the DB affinity (the sub-database read load) can be reduced.

In addition, since a job that does not match the DB affinity is assigned with a size smaller than the remaining capacity of the RAM 31 of the PCs 2a and 2b, the sub-database SDB 1 remains cached in the RAM 31 of the PCs 2a and 2b, and the search speed does not decrease when the PCs 2a and 2b again process the sub-database SDB 1.

Further, an evaluation unit may be provided that evaluates each job (sub-search request) using an evaluation function such as the one described above, based on at least one of the DB affinity for each PC 2 set by the DB affinity setting unit 9, the sub-database, and the time predicted by the processing time prediction unit 11 (predicted search time), and the assignment management unit 8 may assign jobs to the PCs 2 based on the evaluation result from this evaluation unit. Thereby, the assignment management unit 8 can easily and reliably assign a job to each PC 2.

In the above-described embodiment, for the sake of convenience, each PC 2 is provided with a RAM 31 of the same capacity, but the present invention is not limited to this. Each PC 2 may have a RAM 31 of a different size, and various modifications may be made without departing from the spirit of the present invention.

Furthermore, the management server 20 is realized by, for example, a computer (information processing device) having a server function, and the CPU of this computer executes an information search program and thereby functions as the above-described sub-search request creating unit 5, sub-database creating unit 6, sub-search condition creating unit 7, assignment management unit 8, DB affinity setting unit 9, memory management unit 10, processing time prediction unit 11, combining unit 12, and evaluation unit.

It should be noted that the program (information search program) for realizing the functions of the sub-search request creating unit 5, sub-database creating unit 6, sub-search condition creating unit 7, assignment management unit 8, DB affinity setting unit 9, memory management unit 10, processing time prediction unit 11, combining unit 12, and evaluation unit is provided in a form recorded on a computer-readable recording medium such as a flexible disk, CD-ROM, CD-R, CD-R/W, DVD, DVD-R, DVD-R/W, magnetic disk, optical disk, or magneto-optical disk. The computer then reads the program from the recording medium, transfers it to an internal storage device or an external storage device, stores it there, and uses it. Alternatively, the program may be recorded on a storage device (recording medium) such as a magnetic disk, an optical disk, or a magneto-optical disk, and provided to the computer from that storage device via a communication path.

When realizing the functions of the sub-search request creating unit 5, sub-database creating unit 6, sub-search condition creating unit 7, assignment management unit 8, DB affinity setting unit 9, memory management unit 10, processing time prediction unit 11, combining unit 12, and evaluation unit, the program stored in the internal storage device (RAM or ROM in this embodiment) is executed by the microprocessor of the computer (the CPU in this embodiment). At this time, the computer may read and execute the program recorded on the recording medium.

In this embodiment, the computer is a concept including hardware and an operating system, and means hardware that operates under the control of the operating system. In the case where an application program alone operates the hardware without an operating system, the hardware itself corresponds to the computer. The hardware includes at least a microprocessor such as a CPU and means for reading a computer program recorded on a recording medium. In the present embodiment, the management server 20 has the function of a computer.

Further, as the recording medium in the present embodiment, in addition to the above-mentioned flexible disk, CD-ROM, CD-R, CD-R/W, DVD, DVD-R, DVD-R/W, magnetic disk, optical disk, and magneto-optical disk, various computer-readable media can be used, such as IC cards, ROM cartridges, magnetic tapes, punch cards, internal storage devices of computers (memory such as RAM and ROM), external storage devices, and printed materials on which codes such as bar codes are printed. If each embodiment of the present invention is disclosed, it can be manufactured by those skilled in the art.

Industrial applicability

As described above, the information retrieval system, the information retrieval method, the information retrieval device, the information retrieval program, and the computer-readable recording medium recording the program according to the present invention are useful for having a plurality of information processing devices process in parallel a search request consisting of a database and search conditions for this database. They are particularly suitable for having each information processing device process search requests at high speed so that high-speed searches can be performed against the database.

Claims

1. An information retrieval system comprising a plurality of information processing devices (2), in which a search request including a database (3) to be searched and search conditions for the database (3) is processed in parallel by the plurality of information processing devices (2), the system comprising:

a sub-database creating unit (6) for creating, based on the database (3), a plurality of sub-databases each having a size equal to or less than the capacity of a storage unit provided in the information processing device (2);

an assignment management unit (8) for assigning the sub-database created by the sub-database creating unit (6) to the information processing device (2) as a sub-search request, in order to have the information processing device (2) process the search request for the sub-database; and

a combining unit (12) for acquiring and combining the processing results of the sub-search requests by the plurality of information processing devices (2).

2. The information retrieval system according to claim 1, further comprising a sub-search condition creating unit (7) for dividing the search condition to create sub-search conditions,

wherein the assignment management unit (8) assigns the sub-search condition to the information processing device (2) as the sub-search request, in order to have the information processing device (2) search the sub-database using the sub-search condition.

3. The information retrieval system according to claim 1 or 2, further comprising a DB affinity setting unit (9) capable of setting, as a DB affinity, information on the sub-database to be preferentially assigned to the information processing device (2) by the assignment management unit (8),

wherein the assignment management unit (8) assigns the sub-search request to the information processing device (2) based on the DB affinity.

4. The information retrieval system according to claim 3, wherein the DB affinity can be set in advance in the DB affinity setting unit (9).

5. The information retrieval system according to claim 3, wherein the DB affinity setting unit (9) sets the DB affinity based on a processing history of the sub-search request in the information processing device (2).

6. The information retrieval system according to any one of claims 1 to 5, further comprising a free space management unit (10) for managing the free space of the storage unit provided in the information processing device (2),

wherein the assignment management unit (8) assigns the sub-search request to the information processing device (2) based on the free space of the storage unit of the information processing device (2) managed by the free space management unit (10).

7. The information retrieval system according to any one of claims 1 to 6, further comprising a processing time prediction unit (11) capable of predicting the time required for the information processing device (2) to process the sub-search request,

wherein the assignment management unit (8), based on the time required for the processing predicted by the processing time prediction unit (11), preferentially assigns the sub-search request requiring the longer time to the information processing device (2).

8. The information retrieval system according to claim 7, wherein the sub-database creating unit (6) or the sub-search condition creating unit (7) divides a sub-search request that has not been assigned to the information processing device (2) by the assignment management unit (8) to create a plurality of sub-search requests.

9. The information retrieval system according to any one of claims 1 to 9, further comprising an evaluation unit for evaluating the sub-search request, using an evaluation function, with respect to at least one of the DB affinity for the information processing device (2) set by the DB affinity setting unit (9), the sub-database, and the time required for the processing predicted by the processing time prediction unit (11),

wherein the assignment management unit (8) assigns the sub-search request to the information processing device (2) based on the evaluation result from the evaluation unit.

10. An information retrieval method in which a search request consisting of a database (3) to be searched and search conditions for the database (3) is processed in parallel by a plurality of information processing devices (2), the method comprising:

a sub-search request creating step of creating, based on the search request, a plurality of sub-search requests each having a size equal to or smaller than the capacity of the storage unit provided in the information processing device (2);

an allocation management step of assigning the sub search requests created in the sub-search request creating step to the information processing devices (2) so that the information processing devices (2) process them; and

a combining step of acquiring and combining the processing results of the plurality of information processing devices (2) with respect to the sub-search requests.
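The three steps of the method in claim 10 (create sub-search requests, assign them, combine the results) can be sketched in Python as follows. This is a simplified illustration under several assumptions: the database is a list of records, the search condition is a predicate function, capacity is counted in records, and worker threads stand in for the information processing devices (2). The sketch divides only the database, not the search conditions, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def create_sub_requests(database, condition, capacity):
    """Split the database into sub-databases of at most `capacity` records,
    pairing each with the search condition to form a sub-search request."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [(database[i:i + capacity], condition)
            for i in range(0, len(database), capacity)]

def process_sub_request(sub_request):
    """Work done by one device: filter its sub-database by the condition."""
    sub_db, condition = sub_request
    return [rec for rec in sub_db if condition(rec)]

def search(database, condition, capacity, n_devices):
    """Create sub-requests, process them in parallel, combine the results."""
    sub_requests = create_sub_requests(database, condition, capacity)
    # Threads stand in for devices; map() returns results in request order.
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        partial_results = list(pool.map(process_sub_request, sub_requests))
    combined = []
    for part in partial_results:
        combined.extend(part)
    return combined
```

For example, `search(list(range(10)), lambda x: x % 3 == 0, capacity=4, n_devices=2)` splits ten records into sub-databases of 4, 4 and 2 records and returns the matching records in their original order.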

11. An information retrieval device that causes a plurality of information processing devices (2) to process in parallel a search request consisting of a database (3) to be searched and search conditions for the database (3), the device comprising:

a sub-search request creating unit that creates, based on the search request, a plurality of sub-search requests each having a size equal to or less than the capacity of a storage unit provided in the information processing device (2);

an allocation management unit (8) that assigns the sub search requests created by the sub-search request creating unit to the information processing devices (2) so that the information processing devices (2) process them; and

a coupling unit (12) that acquires and combines the processing results of the plurality of information processing devices (2) with respect to the sub-search requests.

12. An information retrieval program for causing a computer to execute an information retrieval function of having a plurality of information processing devices (2) process in parallel a search request consisting of a database (3) to be searched and search conditions for the database (3), the program causing the computer to function as:

a sub-search request creating unit that creates, based on the search request, a plurality of sub-search requests each having a size equal to or less than the capacity of the storage unit provided in the information processing device (2); and

a coupling unit (12) that acquires and combines the processing results of the plurality of information processing devices (2) with respect to the sub-search requests.

13. A computer-readable recording medium on which is recorded an information retrieval program for causing a computer to execute an information retrieval function of having a plurality of information processing devices (2) process in parallel a search request consisting of a database (3) to be searched and search conditions for the database (3),

wherein the information retrieval program causes the computer to function as a coupling unit (12) that acquires and combines the processing results of the plurality of information processing devices (2) with respect to the sub-search request.