WO2011118425A1 - Query optimization system, query optimization device, and query optimization method - Google Patents

Query optimization system, query optimization device, and query optimization method Download PDF

Info

Publication number
WO2011118425A1
WO2011118425A1 PCT/JP2011/055887 JP2011055887W WO2011118425A1 WO 2011118425 A1 WO2011118425 A1 WO 2011118425A1 JP 2011055887 W JP2011055887 W JP 2011055887W WO 2011118425 A1 WO2011118425 A1 WO 2011118425A1
Authority
WO
WIPO (PCT)
Prior art keywords
site
list
temporary
query
processing system
Prior art date
Application number
PCT/JP2011/055887
Other languages
French (fr)
Japanese (ja)
Inventor
伸治 菊地
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to JP2012506941A priority Critical patent/JPWO2011118425A1/en
Publication of WO2011118425A1 publication Critical patent/WO2011118425A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24539Query rewriting; Transformation using cached or materialised query results

Definitions

  • the present invention relates to a query optimization system, and more particularly to a query optimization system that evaluates an arrangement method of intermediate results of a query.
  • the following documents disclose typical existing techniques related to a query optimization method (query optimization system) that evaluates an arrangement method of intermediate results of a query (query).
  • Non-Patent Document 1 D. Kossmann, K.M. Stocker, “Iterative Dynamic Programming: A New Class of Queries Optimization Algorithms”, ACM Transactions on Database Systems, Vol. 25, no. 1, March 2000.
  • Non-Patent Document 2 E. J. et al. Shekita, H .; C. Young, K .; L. Tan, “Multi-Join Optimization for Symmetric Multiprocessors”, Proceedings of the 19th VLDB Conference, 1993.
  • Non-Patent Document 1> refers to a conventional query optimization method for evaluating a query intermediate result placement method, but does not explicitly refer to the intermediate result placement method. .
  • Non-Patent Document 2> specifically refers to writing intermediate results to a storage (storage: external storage device) or the like by a query optimization method.
  • storage storage: external storage device
  • this assumes functional modules in a single system, and does not deal with matters premised on wide-ranging computer resource interchange, procurement, and virtualization, which are the current issues in 2009.
  • ⁇ Patent Document 1> describes an existing technique based on the premise of external placement of intermediate results, not a technique related to query optimization.
  • ⁇ Conventional query optimization method> the query optimization method of the prior art is extracted from ⁇ Non-Patent Document 1> and shown as a specific procedure.
  • Step 1 the specified query is parsed and the set of tables ⁇ R 1 , R 2 ,. . . , R n ⁇ .
  • Step 3 one of the optimal access conditions is extracted from the temporary connection element list for each element table R m implemented in (Step 2.1) in (Step 3.1) under the subordinate (Step 3.1).
  • Step 4 the temporary combined element list is deleted.
  • Step 5 the initial version of the combined element list ⁇ R 1 o , R 2 o , R 3 o,. . . , R n o ⁇ and add it to the optimization plan element list.
  • Step 6 the subordinate processes are repeated until the number of elements in the combined element list becomes 1.
  • Step 6.1 the maximum block size k is set to the smaller of the specified value or the number of elements in the combined element list. Subsequently, in (Step 6.2), the number of bonds is specified as 2. Subsequently, in (Step .6.3), the subordinate processing is repeatedly performed as long as the condition of designated number of connections ⁇ maximum number of blocks k is satisfied.
  • Step .6.3.1 all scenes constituting combinations for the designated number of connections are displayed in the current version ⁇ R 1 o , R 2 o , R 3 o ,. . . , R n o ⁇ to create and add a temporary list. Subsequently, in (Step 6.3.2), the subordinate processing is repeated for all the provisional list elements.
  • Step .6.3.2 [(Step .6.3.2) Subordinate process] First, in (Step 6.3.2.1), one element is extracted from the temporary list. Subsequently, in (Step .6.3.3.2), the subordinate processes are repeated until the temporary list parts 1 and 2 can be defined.
  • Step 6.3.2.2 [Process under (Step 6.3.2.2)] First, in (Step 6.3.3.2.2.1), the element is divided into two, and two combinations of the elements ⁇ R 1 o ⁇ are created, and ⁇ temporary list part 1 ⁇ ⁇ temporary list part 2 ⁇ . Subsequently, in (Step 6.3.3.2.2.2), a combination of ⁇ temporary list part 1 ⁇ ⁇ temporary list part 2 ⁇ is added to the temporary operation list. Subsequently, in (Step .6.3.2.2.2.3), a combination of ⁇ temporary list part 2 ⁇ ⁇ temporary list part 1 ⁇ is added to the temporary operation list.
  • Step .6.3 Continuation of subordinate process 2
  • Step .6.3.5 After (Step .6.3.4) is finished, in (Step .6.3.5) under (Step .6.3), one optimal item is extracted from the optimization plan list and optimized. Add to the plan element list.
  • Step .6.3.6 under (Step .6.3)
  • non-optimal optimization plan lists are organized and held as a library.
  • Step .6.3.7 under (Step .6.3)
  • the temporary list and the temporary operation list are cleared (erased).
  • Step .6.3.8 under (Step .6.3)
  • 1 is added to the number of bonds (number of bonds ⁇ (number of bonds + 1)).
  • connection elements constituting the connection tree ⁇ T k o ⁇ are changed to the current version ⁇ R 1 o , R 2 o , R 3 o of the connection element list. ,. . . , R n o ⁇ .
  • Step 8 one optimization plan is selected.
  • Step 9 the current version of the combined element list ⁇ R 1 o , R 2 o , R 3 o,. . . , R n o ⁇ to re-express the optimization plan.
  • This optimization method largely combines two algorithms.
  • One is “Dynamic Programming Algorithm”. Specifically, the series of areas in (Step .6.3) corresponds. The other is the “greedy method (greedy method) algorithm”, which corresponds to (step 6.4).
  • greyedy method greyedy method
  • step 6.4 a size range of a combined area called a maximum block size is specified, and a combination operation is initially performed unless the maximum block size is exceeded by a dynamic programming algorithm.
  • the method with the lowest cost is selected from the selected calculation result by the algorithm of the greedy method (greedy method), and it is replaced as an intermediate generation result. By repeating the above procedure a plurality of times, the method with the lowest overall cost is selected.
  • FIG. 1 schematically shows a setup for actually executing query optimization processing by the optimization method.
  • the relationship list to be used is composed of five relationships.
  • the relation R 1 , the relation R 2 , the relation R 3 , the relation R 4 , and the relation R 5 are optimized as a result of (Step 5), and the relation R 1 o and the relation R 2 o are obtained.
  • a connection element list 3007 including the relationship R 3 o , the relationship R 4 o , and the relationship R 5 o is created.
  • an initial result 3000 is generated.
  • a join operation of two relations is performed.
  • the symbol of the join operation is denoted by 3006. This corresponds to the process up to (Step .6.3) in the optimization method.
  • a result 3001 of the first-stage two-table join operation including the join candidates 3008, 3009, and 3010 is created.
  • connection candidate 3012 is replaced with a connection tree ⁇ T k1 o ⁇ : 3014. Thereafter, the process returns to (Step 6) and returns to the same state as the initial stage for the second stage processing.
  • the connection tree ⁇ T k1 o ⁇ : 3014, the relationship R 3 o : 3015, and the relationship R 5 o : 3016 remain in the connection element list 3007, and are defined as the result 3003.
  • the join operation of the two relations in the second stage is performed. This corresponds to the process up to (Step .6.3) in the optimization method. As a result, a result 3004 of the second-stage two-table join operation including the join candidates 3017, 3018, and 3019 is created.
  • connection tree 3021 is reconstructed with the element relation R 1 o , the relation R 2 o , the relation R 3 o , the relation R 4 o , and the relation R 5 o of the original joining element list 3007 by (Step 9). Then, a desired optimization plan is created. Thereby, a result 3005 is generated.
  • ⁇ Patent Document 1> relates to a method for arranging replicas to be created, not a method related to query optimization.
  • 2A and 2B show the device configuration based on the description in ⁇ Patent Document 1>.
  • the device configuration largely includes three functional layers.
  • the first functional layer 4100 is a functional layer including application program-1: 4007 and application program-L: 4008.
  • the second functional layer 4200 is a functional layer in which a database is accessed and the data collection unit 4003 is arranged.
  • the third functional layer 4300 is a functional layer including resource management / transaction control units 4011 and 4012, distributed storage 4004, and the like.
  • Each functional layer accesses each other via various networks 4001 and 4002, such as LAN (Local Area Network) / WAN (Wide Area Network) / Internet.
  • networks 4001 and 4002 such as LAN (Local Area Network) / WAN (Wide Area Network) / Internet.
  • Application program-1: 4007 includes a database access client unit-1: 4009.
  • the application program-L: 4008 includes a database access client unit-L: 4010.
  • the database access client unit-1: 4009 and the database access client unit-L: 4010 request and transmit an execution query such as SQL (Structured Query Language) to the data collection unit 4003.
  • SQL Structured Query Language
  • the functional layer including the above-described resource management / transaction control unit 4011 and the above-described distributed storage 4004 is currently described on the assumption of two sites, but is not limited to this.
  • Each site includes resource management / transaction control units 4011 and 4012, respectively.
  • the distributed storage 4004 In the site having the resource management / transaction control unit 4011, the distributed storage 4004 is associated and arranged.
  • the distributed storage 4004 stores a database table area, index area, log (log), journal, and the like.
  • the number of distributed storages is not limited to one and may be two or more. That is, at least one distributed storage exists.
  • Database tablespaces of the distributed storage in 4004 the index area, the table Table corresponding to the relationship R 11. 11: 4014, Table Table corresponding to the relationship R 12. 12: 4015, table Table.corresponding to relation R 1 n . 1n: 4016, table 4018 of intermediate results is stored.
  • an index 4017, a log, and a journal 4019 are stored.
  • the site having the resource management / transaction control unit 4012 corresponds to a replica of the site having the resource management / transaction control unit 4011. Therefore, the same configuration as that of the site having the resource management / transaction control unit 4011 described above is adopted.
  • another distributed storage 4006 is arranged to arrange a partial replica.
  • a resource management / transaction control unit 4013 is arranged and has almost the same configuration as the site having the resource management / transaction control unit 4011 described above.
  • the data collection unit 4003 receives request messages s4030 to the resource management / transaction control units 4011 and 4012 based on the inquiry messages s4029 and s4035 from the database access client units 1 and 4009 and the database access client unit -L: 4010. As a result, it is transferred as s4031, and as a result, the data retrieved and acquired in the distributed storage 4004 and similarly in the distributed storage 4005 are received as response messages s4033 and s4032.
  • the data collection unit 4003 groups a group of queries including a plurality of conditions specified in a plurality of query messages s4029 and s4035, and obtains a query content and a set of conditions for each condition.
  • the data collection unit 4003 determines a condition based on the overlapping range of the condition.
  • the data collection unit 4003 sets an evaluation index related to the validity level according to the number of times of duplication of inquiries.
  • the data collection unit 4003 places a partial replica in the distributed storage 4006 of the resource management / transaction control unit 4013 according to the evaluation index related to the effectiveness.
  • the distributed storage that places the intermediate results of the query will also be widely distributed in the network environment. become.
  • ⁇ Patent Document 1> includes a process for grouping a group of queries and determining a query content and a set of conditions for each condition, determining a condition based on the overlapping range of the condition, and further depending on the number of times the query is duplicated It is disclosed that an evaluation index called “effectiveness” is set, and those having a certain degree of effectiveness are arranged as partial replicas. Although partial replicas are different from intermediate results, placement locations are not evaluated based on reasonable judgment among multiple options. As a result, it is not possible to meet the above problem.
  • Non-Patent Document 1 discloses an overview of the current query optimization method.
  • no proposal has been made on the premise of the placement of intermediate results. Therefore, it does not deal with query optimization based on wide-area computer resource interchange, procurement, and virtualization, which is a current issue in fiscal 2009, and cannot respond to the above-mentioned issues.
  • Non-Patent Document 2> discloses an outline of a query optimization method for writing intermediate results to a storage or the like. However, it is not considered in which distributed storage site the intermediate results of the query generated as a result of optimization should be placed based on reasonable judgment, and as a result, query optimization is performed at the implementation stage. There is a problem of not.
  • the object of the present invention is to optimize the query in the database management system, and in the environment using a large-scale virtualized storage, the intermediate result generated in the course of query optimization is more rational. To optimize the query even at the implementation stage. Further, as a result, the agreed service level decided by the user of the database access client is satisfied.
  • the present invention provides a query optimization system that evaluates an arrangement method of intermediate results of queries in a database management system.
  • this query optimization system numerical information regarding latency and reliability regarding a site holding data is extracted.
  • the latency is a delay time from when a request such as a data transfer request is issued until the request result is returned.
  • the extended cost value is obtained by superimposing the numerical information related to the latency and the numerical information related to the reliability on the cost value of the query.
  • a site having a designated value that is appropriate is selected at the top when ordered by the extended cost value.
  • prefetching is performed by creating a copy of the processing system itself in which some parameters in the algorithm are changed.
  • intermediate results are arranged in one or more sites.
  • the “query optimization method (query optimization system) for evaluating an arrangement method of intermediate results of a query” is based on the device configuration as shown in FIGS. 3A and 3B, and the processing is performed in the environment. .
  • the device configuration largely includes one or more networks and at least three functional layers.
  • various networks 1001 and 1002 such as LAN / WAN / Internet are defined as different ones, but can be considered to be the same as these. Moreover, there is no reason to distinguish here about the kind.
  • the various networks 1001 and 1002 include relay devices such as routers and switches.
  • the first functional layer 1100 is a functional layer including application program-1: 1013 and application program-L: 1014.
  • the second functional layer 1200 is a functional layer including a database access unit-1: 1101 and a database access unit-M: 1012.
  • the third functional layer 1300 is a functional layer including resource management / transaction control units 1015, 1016, 1017, 1018, a distributed storage 1003, and the like.
  • Each functional layer accesses each other via various networks 1001 and 1002 such as the aforementioned LAN / WAN / Internet.
  • each functional layer a computer such as a PC (personal computer), an appliance, a workstation, a mainframe, and a supercomputer is assumed. Further, not limited to computers, relay devices such as routers and switches, intermediate devices such as firewalls and bandwidth control devices, other communication devices, electronic devices, dedicated devices, and the like may be used.
  • the above-described computer or the like is realized by a processor that is driven based on a program and executes predetermined processing, a memory that stores the program and various data, and an interface used for communication with a network. .
  • a CPU Central Processing Unit
  • a microprocessor a microcontroller
  • a semiconductor integrated circuit Integrated Circuit (IC) having a dedicated function
  • semiconductor storage devices such as RAM (Random Access Memory), ROM (Read Only Memory), EEPROM (Electrically Erasable and Programmable Read Only Memory), and HDD Memory (SDHidK)
  • RAM Random Access Memory
  • ROM Read Only Memory
  • EEPROM Electrical Erasable and Programmable Read Only Memory
  • HDD Memory HDD Memory
  • An auxiliary storage device such as State Drive
  • a removable disk such as a DVD (Digital Versatile Disk)
  • a storage medium such as an SD memory card (Secure Digital memory card), or the like is conceivable.
  • a register may also be used.
  • processor and the memory may be integrated.
  • a single chip such as a microcomputer has been developed. Therefore, a case where a one-chip microcomputer mounted on a computer includes a processor and a memory can be considered.
  • Examples of the above interfaces include semiconductor integrated circuits such as boards (motherboards and I / O boards) and chips that support network communication, network adapters such as NIC (Network Interface Card), and communication devices such as expansion cards and antennas.
  • NIC Network Interface Card
  • a communication port such as a connection port (connector) is conceivable.
  • networks include the Internet, LAN (Local Area Network), wireless LAN (Wireless LAN), WAN (Wide Area Network), backbone (Backbone), cable TV (CATV) line, fixed telephone network, mobile phone network, WiMAX (IEEE 802.16a), 3G (3rd Generation), dedicated line (lease line), IrDA (Infrared Data Association), Bluetooth (registered trademark), serial communication line, data bus, and the like are conceivable.
  • each unit (internal configuration) in each functional layer is not limited to a dedicated device, but may be a module or a component.
  • Application program-1: 1013 includes a database access client unit-1: 1060.
  • the database access client unit-1: 1060 requests and transmits an execution query such as SQL to the database access unit-1: 1101.
  • the database access unit-1: 1011 includes a query reception unit 1048, a query analysis unit 1049, a dictionary access unit 1050, an optimization unit 1051, a process execution unit-1: 1053, and a process execution unit-N: 1054.
  • the query reception unit 1048 receives a request execution query from the database access client unit-1: 1060.
  • the dictionary access unit 1050 accesses a dictionary that manages definition information to be included in the request execution query.
  • the query analysis unit 1049 analyzes the request execution query based on the information of the dictionary access unit 1050.
  • the optimization unit 1051 performs optimization of the analyzed request execution query using the “query optimization method for evaluating the arrangement method of intermediate results of queries” according to the present invention.
  • the process execution unit-1: 1053 and the process execution unit-N: 1054 receive the result of the optimization unit 1051, decompose the request execution query, and execute individual execution units.
  • the database access unit-1: 1101 includes a statistical information management unit agent management unit 1052, a transaction management control unit 1055, and a log and journal 1056.
  • the transaction management control unit 1055 executes a system transaction in which the process execution unit-1: 1053 and the process execution unit-N: 1054 generate and manage intermediate results.
  • Log, journal 1056 manages the log and journal of the system transaction.
  • the statistical information management unit agent management unit 1052 collects the results of executing the actual request execution query elements and reflects them in the processing of the optimization unit 1051 described above.
  • the above-described query receiving unit 1048 holds a designated query 1057 that is a request execution query that is actually requested.
  • the above-described process execution unit-1: 1053 and the above-described process execution unit-N: 1054 include 1058 and 1059 which are decomposed execution units.
  • Each site includes resource management / transaction control units 1015, 1016, 1017, and 1018, respectively.
  • the distributed storage 1003 and the distributed storage 1004 are arranged in association with each other.
  • the distributed storage 1003 has a database table area and an index area.
  • Another distributed storage 1004 stores Logs, journals, and the like.
  • the number of distributed storages is not limited to two and may be one or three or more. That is, at least one distributed storage exists.
  • the site having the resource management / transaction control unit 1015 also includes a performance / error information / statistical information service 1019.
  • the above-mentioned distributed storage 1004 includes a log, journal 1036, performance information, error information 1037, and statistical information 1038 for each object.
  • the performance information and error information 1037 manages general performance information and error information. As information management methods, information integration / listing / grouping / databaseing, etc. can be considered. However, actually, it is not limited to these examples.
  • the statistical information 1038 for each object is stored in the table Table. 1: 1023 and the table Table. 2: 1024 and the above table Table. n: 1025 and the statistical information of the access performance for the index 1029 described above are managed.
  • the site having the resource management / transaction control unit 1016 corresponds to a replica of the site having the resource management / transaction control unit 1015. Therefore, the same configuration as that of the site having the resource management / transaction control unit 1015 described above is adopted.
  • the distributed storage 1005 and the distributed storage 1006 are arranged in association with each other.
  • the distributed storage 1005 has a database table area and an index area.
  • Another distributed storage 1006 stores Log, journal, and the like.
  • the number of distributed storages is not limited to two and may be one or three or more. That is, at least one distributed storage exists.
  • the site having the resource management / transaction control unit 1016 also includes a performance / error information / statistical information service 1020.
  • the index area corresponds to the relationship R 1, the previous table Table. 1: Table 102, which is a replica of 1023. 1: 1026, equivalent to relation R 2, above table Table. 2: Table 1024 which is a replica of 1024. 2: 1027, corresponding to the relationship R n , and the above table Table. n: A table Table. n: 1028, table 1032 of intermediate results is stored. An index 1031 is stored in the index area.
  • the above-mentioned distributed storage 1006 includes a log, journal 1039, performance information, error information 1040, and statistical information 1041 for each object.
  • the performance information and error information 1040 manages general performance information and error information.
  • the statistical information 1041 for each object is stored in the table Table. 1: 1026 and the table Table. 2: 1027 and the above table Table. n: 1028 and the statistical information of the access performance for the index 1031 is managed.
  • the site having the resource management / transaction control unit 1017 corresponds to a site specialized in index management.
  • the distributed storage 1007 and the distributed storage 1008 are arranged in association with each other.
  • the distributed storage 1007 has a database table area and an index area.
  • Another distributed storage 1008 stores Log, journal, and the like.
  • the number of distributed storages is not limited to two and may be one or three or more. That is, at least one distributed storage exists.
  • the site having the resource management / transaction control unit 1017 also includes a performance / error information / statistical information service 1021.
  • the intermediate result table 1034 is stored in the table area and index area of the database in the distributed storage 1007.
  • An index 1033 is stored in the index area.
  • the above-described distributed storage 1008 includes a log, journal 1042, performance information, error information 1043, and statistical information 1044 for each object.
  • the performance information and error information 1043 manages general performance information and error information.
  • the statistical information 1044 for each object manages the statistical information on the access performance for the index 1033 on the distributed storage 1007.
  • the site having the resource management / transaction control unit 1018 corresponds to a site specialized for managing intermediate results.
  • a distributed storage 1009 and a distributed storage 1010 are arranged in association with each other.
  • the distributed storage 1009 has a database table area and an index area.
  • Another distributed storage 1010 stores Logs, journals, and the like. In this case, the number of distributed storages is not limited to two and may be one or three or more. That is, at least one distributed storage exists.
  • the site having the resource management / transaction control unit 1018 also includes a performance / error information / statistical information service 1022.
  • the intermediate result table 1035 is stored in the table area and index area of the database in the distributed storage 1009.
  • the above-described distributed storage 1009 includes Log, journal information 1045, performance information, error information 1046, and statistical information 1047 for each object.
  • the performance information and error information 1046 manages general performance information and error information.
  • the statistical information 1047 for each object manages statistical information on access performance for an arbitrary object on the distributed storage 1010.
  • the aforementioned database access client unit-1: 1060 transmits a request execution query as a message s100 to the query reception unit 1048 in the database access unit-1: 1101.
  • the message s100, described in SQL are marked, it is assumed to include a description of request execution query was performed natural binding relationship R n from the relationship R 1.
  • the query reception unit 1048 passes the specified query expression s101 equivalent to the specified query 1057 to the query analysis unit 1049.
  • the query analysis unit 1049 obtains the definition information s102 from the dictionary access unit 1050, converts the designated query expression s101 as the intermediate expression s103, and passes it to the optimization unit 1051.
  • the optimization unit 1051 calculates an optimized execution plan, and passes a part of the execution plan descriptions s124 and s105 from the process execution unit-1: 1053 to the process execution unit-N: 1054. In the optimization, various latency information and reliability information s104 are obtained from the statistical information management unit agent management unit 1052.
  • the partial execution plan description s105 is passed in the processing execution unit-N: 1054 described above, and the query execution unit 1058 is held in the processing execution unit-N: 1054 described above.
  • the partial execution plan description s124 is passed to the above-described process execution unit-1: 1053, and the query execution unit 1059 is held in the above-described process execution unit-1: 1053.
  • the query execution unit 1059 uses the intermediate result table, whereas the query execution unit 1058 generates the intermediate result table. For this reason, system transaction processing described later is performed.
  • the above-described processing execution unit-N: 1054 executes a query on the site having the above-described resource management / transaction control unit 1015 selected as the optimum site in the replica in order to process the query execution unit 1058. Send a message s106 containing the request.
  • Above resource management / transaction control section 1015 accepts the message s106, in accordance with the contents described in the message s106, tables Table corresponding to the relationship R 1. 1: 1023 and the destination line s107, obtained from it as the drive side, the table Table corresponding relationship R 2. 2: Get the corresponding row s108 from 1024. Furthermore, the binding results in the drive side, by using the index 1029 of the above, after the obtained line ID information s110 of the primary key table Table corresponding to the relationship R n. n: Obtained from 1025 while combining the corresponding row s109.
  • the resource management / transaction control unit 1015 returns the response result, which is intermediate result information for the query execution unit 1058, as the message s111 once to the processing execution unit-N: 1054.
  • the response result described in the message s111 needs to be temporarily recorded in the distributed storage as an intermediate result table. For this reason, after the processing execution unit-N: 1054 determines a storage to be written using a part of the present invention, an intermediate result table including a response result in the site holding the resource management / transaction control unit 1018 described above. A message s112 related to the creation request is sent.
  • the resource management / transaction control unit 1018 When the resource management / transaction control unit 1018 receives the message s112, the resource management / transaction control unit 1018 sends a request s113 for recording the pre-update image in the Log and journal 1045 in order to generate an intermediate result table as a transaction. In this case, there is no real result because there is no interim result table.
  • the resource management / transaction control unit 1018 described above extracts the intermediate result s114 described in the message s112 and writes it in the intermediate result table 1035. Thereafter, the state transits to a transaction commit wait state, and the resource management / transaction control unit 1018 returns the commit wait message s116 to the process execution unit-N: 1054.
  • the aforementioned process execution unit-N: 1054 issues a transaction state management request s117 in advance to the transaction management control unit 1055 in the database access unit-1: 1101.
  • the transaction management control unit 1055 issues a registration request s118 to the Log and journal 1056 regarding the system transaction.
  • the transaction management control unit 1055 determines that there is no particular problem in executing the transaction, the transaction management control unit 1055 notifies the processing execution unit-N: 1054 that is a request source of the system transaction.
  • the process execution unit-N: 1054 transmits a commit request message s119 to the message management / transaction control unit 1018 in response to the message s112 related to the intermediate result table creation request.
  • the commit result writing s120 is performed on the above-described log and journal 1045.
  • the resource management / transaction control unit 1018 described above sends a response message s123 to the requesting process execution unit-N: 1054.
  • the process execution unit-1: 1053 is activated using the intermediate result table 1035.
  • a partial execution plan description s124 is passed, and the execution unit 1059 of the query is held in the above-described processing execution unit-1: 1053.
  • the processing execution unit-1: 1053 described above transmits a message s125 including a query execution request to the site having the resource management / transaction control unit 1018 described above.
  • a message s127 including a query execution request is transmitted to the site having the resource management / transaction control unit 1016 selected as the optimum site in the replica.
  • the resource management / transaction control unit 1018 When the resource management / transaction control unit 1018 receives the message s125 including the query execution request, the resource management / transaction control unit 1018 obtains the corresponding row s126 from the intermediate result table 1035 in accordance with the contents described in the message s125, and executes the requesting process execution unit. -1: Responds to 1059.
  • the resource management / transaction control unit 1016 returns the response result to the query execution unit 1059 as the message s128 to the process execution unit-1-1053.
  • the results of each of the above-described processing execution units-1: 1053 and the above-described processing execution unit-N: 1054 are returned to the query reception unit 1048 as result responses s129 and s130, respectively, and the results are integrated.
  • the response message s131 is returned to the calling database access client unit-1: 1060.
  • the aforementioned optimization unit 1051 obtains various latency information and reliability information s104 from the statistical information management unit agent management unit 1052 when calculating the optimized execution plan. These are information expressed numerically. Therefore, the statistical information management unit agent management unit 1052 acquires and manages the information via the performance / error information / statistical information service 1019: 1020: 1021: 1022 on each site.
  • the performance / error information / statistical information service 1019 described above is arranged at a site managed by the resource management / transaction control unit 1015 described above, and the performance / error information s134 is determined from the performance information / error information 1037 described above.
  • the statistical information s133 is obtained from the statistical information 1038 for each object.
  • the aforementioned performance / error information / statistical information service 1019 integrates them and reports them as a performance / error information / statistical information report s132.
  • the above-described statistical information management unit agent management unit 1052 obtains the performance / error information / statistical information report s132 from the performance / error information / statistical information service 1019: 1020: 1021: 1022 on all sites. As latency information and reliability information.
  • FIG. 5A and FIG. 5B are explanatory diagrams schematically showing a characteristic part extracted in the procedure of the “query optimization method for evaluating the arrangement method of intermediate results of queries” of the present invention.
  • the query optimization method is mainly implemented by the optimization unit 1051 in FIGS. 3A and 3B, and is used by the query analysis unit 1049, the transaction management control unit 1055, and the statistical information management unit agent management unit 1052 in the processing process. Is done. Therefore, the procedure is briefly described below.
  • Step 1 is performed by the query analysis unit 1049.
  • the query analysis unit 1049 performs syntax analysis of the specified query 1057 and refers to a set of tables ⁇ R 1 , R 2 ,. . . , R n ⁇ .
  • Step 2 is performed by the optimization unit 1051.
  • Step 3) (Step .3.1) under began, those optimum access condition from the temporary binding element list for each element table R m which was performed in (step .2.1)
  • the query elements that optimize the cost are selected by taking out and deleting only the extracted ones and deleting others from the temporary joining element list.
  • step .3.2 under, for each element of the temporary coupling element list, all replicas min ⁇ R 11, R 12, ... , R 1r, R 21, ..., R 2r,. . . , R nr ⁇ is created and added to the temporary combined element list.
  • Step 4) is also performed by the optimization unit 1051.
  • These (Step 4) and (Step 5) are one of the characteristic matters of the present invention, and (Step 4) has two procedures under it.
  • Step 5 is also performed by the optimization unit 1051.
  • Step 5.1 the optimizing unit 1051 takes out one entry from the above-described temporary combining element list including the replica portion.
  • Step 5.2.1 numerical information regarding the latency and reliability of each replica of the extracted entry is evaluated.
  • the cost value is described as an evaluation value in the evaluation of each entry, and the smallest value is evaluated as desirable.
  • an extended cost value is defined in which a normalized latency value is superimposed on the cost value and an inverse number of reliability is also superimposed. Then, evaluation is made so that the smallest expansion cost value is the best.
  • the normalized latency value becomes larger as the communication quality of various networks 1002 such as site performance and LAN / WAN / Internet deteriorates or the arrangement distance becomes longer. Also, the lower the site reliability, the greater the inverse. For this reason, the farther the arrangement position is and the worse the quality is, the worse the cost value is evaluated.
  • Step 5.3 the elements of the temporary combination element list to be actually used for the designated top p replicas whose expansion cost values are valid (conforming to predetermined conditions), Decide for each type. At this time, the top p replicas are left and others are deleted from the temporary connection element list.
  • Step .6 the initial version of ⁇ R 11 o coupling element list from the entry remaining on the temporary coupling element list above, ..., R 1p o, R 21 o, ..., R 2p o, . . . , R np o ⁇ is added to the optimization plan element list.
  • Step 8) also includes the characteristic features of the present invention. In particular, (Step .8.3), (Step .8.4), and (Step .8.7) to (Step .8.9).
  • the optimization unit 1051 repeatedly performs the subordinate processing until the number of elements in the combined element list becomes one type except for the replica.
  • Step 8.1 the maximum block size k is set to a small value by the designated value or the number of elements in the combined element list, as can be seen in the conventional method. Thereafter, in (Step .8.2), the initial value of the number of connections is designated as 2. In (Step 8.3), after the processing system of the optimization unit 1051 copies itself, the maximum block size on the copied side is rewritten to (k + 1). Thereafter, in (Step .8.4), the copied processing system is activated. This is to generate an intermediate result table when the maximum block size is k, but is performed in order to pre-determine and rationally determine which distributed storage site to place.
  • the processing system itself of the optimization unit 1051 is copied, the maximum block size is changed to (k + 1), and the evaluation is performed in parallel. It can be obtained.
  • Step 8.5 as long as the condition of designated number of connections ⁇ maximum number of blocks k (the copy side is (k + 1)) is satisfied, the subordinate processing is repeated.
  • step .8.5. 1 (step .8.5.2), (step .8.5.3), (step .8.5.4), (step .8.5.5) ), (Step. 8.5.6), (Step. 8.5.7), and (Step. 8.5.8) are methods that are also confirmed by the conventional method.
  • Step .8.5 Subordinate processing
  • step .8.5.1 all scenes constituting the combination of the specified binding few minutes, the initial version of ⁇ R 11 o coupling element list, ..., R 1p o, R 21 o, ..., R 2p o ,. . . , R np o ⁇ to create and add a temporary list.
  • Step 8.5.2 the subordinate processing is repeated for all the temporary list elements.
  • Step .8.5.2 [(Step .8.5.2) Subordinate processing] First, in (Step .8.5.2.1), one element is extracted from the temporary list. Subsequently, in (Step .8.5.2.2), the subordinate processing is repeated until the temporary list portions 1 and 2 can be defined together.
  • Step 8.5.2.2 [Processing under (Step.8.5.2.2)] First, in (Step 8.5.5.2.2.1), the element is divided into two, and two sets of combinations of the elements ⁇ R kp o ⁇ are created, and ⁇ temporary list part 1 ⁇ ⁇ temporary list part 2 ⁇ . Subsequently, in (Step .8.5.2.2.2.2), a combination of ⁇ temporary list part 1 ⁇ ⁇ temporary list part 2 ⁇ is added to the temporary operation list. Subsequently, in (Step .8.5.2.2.2.3), a combination of ⁇ temporary list part 2 ⁇ ⁇ temporary list part 1 ⁇ is added to the temporary operation list.
  • Step. 8.5 Continuation of subordinate process 1
  • the optimization plan list is initialized in (Step .8.5.3) under (Step .8.5).
  • Step .8.5.4 the subordinate processing is repeated for all the provisional operation list elements.
  • Step.8.5.4.1 one is extracted from the temporary operation list.
  • Step .8.5.4.4 the combined plan in the two optimization plans of (Step .8.5.4.2) and (Step 8.5.4.3) described above. And add it to the optimization plan list.
  • Step .8.5 Continuation of subordinate process 2
  • Step .8.5.5 After (Step .8.5.4) is finished, in (Step .8.5.5) under (Step .8.5), one optimal item is extracted from the optimization plan list and optimized. Add to the plan element list.
  • Step .8.5.6 under (Step .8.5)
  • the non-optimal optimization plan list is organized and held as a library.
  • Step .8.5.7 under (Step .8.5)
  • the temporary list and the temporary operation list are cleared.
  • Step .8.5.8 under (Step .8.5)
  • 1 is added to the number of bonds (number of bonds ⁇ (number of bonds + 1)).
  • Step 8.6 As can be seen in the conventional method, from the elements in the optimization plan element list created in (Step .8.5), the algorithm of greedy method (greedy method) is used. Select a join tree that minimizes the cost function. The algorithm of the greedy method (greedy method) is the same as the conventional one.
  • Step 8.7) is a characteristic matter of the present invention.
  • the processing system of the optimizing unit 1051 that is executing in parallel is the side that performed copying with the maximum block size k (non-copy processing system) or the maximum block size (k + 1).
  • the side that was copied (copy processing system) is evaluated. Depending on the result, the content of subsequent processing differs.
  • Step 8.7.1 When the processing system of the optimization unit 1051 is a copy processing system that has been copied with the maximum block size (k + 1), in (Step 8.7.1), it waits for an inquiry from the non-copy processing system. Thereafter, in (Step 8.7.2), the type of the join tree ⁇ T 1 o ⁇ of the intermediate result table specified at the time of inquiry from the non-copy processing system that has performed copying is determined. In this case, the left side (external side) / driving table or the right side (internal side) / reference table is evaluated, and which table of which site is to be joined with k joins is responded to the non-copy processing system. . Thereafter, in (Step 8.7.3), the process is terminated by itself. That is, the copy processing system is terminated.
  • Step 8.8.7.4 the temporary tree is temporarily placed in the new intermediate result table ⁇ T 1 o ⁇ . To do. Thereafter, in step 8.7.5, an inquiry is made to the processing system of the optimization unit 1051 copied with the maximum block size (k + 1) in order to inquire which site this result should be placed.
  • the processing system of the optimization unit 1051 copied with the maximum block size (k + 1) is normally assumed to end itself in (Step 8.8.7.3), but the case where it has not ended after use is assumed. Considering (step 8.7.5), it is also possible to forcibly terminate.
  • Step 8.8 Continuation of subordinate process 2
  • the placement evaluation function is called, and the top two proposals are extracted in accordance with the creation policy of the site where the intermediate result table is to be placed and its index.
  • an index or the like is created, an additional cost is generated for this purpose, but the search speed is greatly improved because of the index. Therefore, it is necessary to evaluate “creation of index etc.” and “search using index etc.” in a balanced manner.
  • step .f the type of connection tree ⁇ T 1 o ⁇ of the intermediate result table determined in the processing system of the optimization unit 1051 copied with the maximum block size (k + 1) is confirmed.
  • step .f.1.1 the type of the connection tree ⁇ T 1 o ⁇ is left side (external side) / driving table
  • step .f.1.1 the type of the connected tree ⁇ T 1 o ⁇ is the right side (internal side) / reference table
  • step .f.2.1 to (step .f.2.15) are continuously performed.
  • Step .f.1.1 the next join operation destination (in this case, internal side / reference table) in (Step .8.7.2) is confirmed. After that, in (step f. 1.2), all the site groups (including replicas) used in the next join operation destination (inside / reference table) are taken out and added to the temporary site list.
  • Step .f.1.1 In (Step .f.1.1) and (Step .f.1.2), in the processing system of the copied optimization unit 1051, which table and site group is the next join operation destination (in this case, It is determined whether it is used as an internal side / reference table), and it is included in the list. On the other hand, in (Step .f.1.3) to (Step .f.1.7), up to two candidates that can place the intermediate result table among the unused sites. Explore.
  • Step .f.1.3 a list of unused site groups is extracted.
  • a site not included in the temporary site list is taken out and added to the candidate temporary list.
  • Step f.1.4 candidate areas for two locations for search are prepared and initialized. Thereafter, in (Step .f.1.5), the subordinate processing is repeatedly performed for the candidate temporary list element.
  • step .f.1.5.1 the site information is read sequentially.
  • step .f. 1.5.2 numerical information related to latency and reliability related to the site is read from the measurement statistics DB.
  • the subordinate process is performed every time numerical information related to latency and reliability is read.
  • step .f.1.5.2.2 when (step .f.1.5.2) is repeated and the numerical information on latency and reliability is higher evaluation, Replace the site listed in the candidate area for search (replace). No other processing is performed.
  • Step f.1.5 If two sites that are new candidates are identified as a result of performing (Step f.1.5) on all sites in the candidate temporary list, (Step f.1.6)
  • the intermediate result table join tree ⁇ T 1 o ⁇ is added to a temporary site list that means site candidates to be temporarily arranged. Thereafter, in (step .f.1.7), the candidate temporary list is cleared. Subsequently, in (step .f.1.8), each of the site candidates for temporarily arranging the connection tree ⁇ T 1 o ⁇ of the intermediate result table by repeatedly performing the subordinate processing for the site temporary list element. Is specifically evaluated.
  • Step .f.1.8.1 the table creation cost is estimated from the number of creations of the join tree ⁇ T 1 o ⁇ (number of rows created and number of cases).
  • Step .f. 1.8.2 numerical information related to the latency and reliability related to the site is read from the measurement statistics DB.
  • Step .f. 1.8.3 the normalized latency related to the site holding the data obtained in (Step .f. 1.8.2) with respect to the estimation of the table creation cost described above. And the reciprocal number of the numerical information related to the reliability are superimposed to obtain the total cost.
  • Step .f. 1.8.4 a plan number is assigned to the entire cost, and it is added to the evaluation list as one of candidates.
  • step .f. 1.8.5 the possibility of creating an index is determined. If the index can be created, the subordinate process is executed.
  • the index creation cost is estimated from the total cost obtained in (Step.f.1.8.8.3), and the total cost including the index creation cost is calculated again. . After that, in (Step .f.1.8.5.2), a plan number is assigned to the newly calculated overall cost, and it is added to the evaluation list as another candidate.
  • Step .f.1.8 After evaluating all the site candidates that temporarily place the connection tree ⁇ T 1 o ⁇ of the intermediate result table in (Step .f.1.8), in (Step .f.1.9), the evaluation list described above is used. The data in the list is sorted and sorted according to the overall cost. Then, the one with the lowest overall cost is stored as the first plan.
  • a similar configuration means a configuration in which an index exists when an index is created, and a configuration in which no index exists when an index is not created.
  • Step f.1.8.11 the site temporary list and the evaluation list are cleared as post-processing. Thereafter, in (Step .f. 1.8.12), the above-mentioned first proposal and the above-mentioned second proposal are answered.
  • Step .f.2.1 the next join operation destination (in this case, external side / drive table) in (Step 8.7.2) is confirmed. Thereafter, in (step f.2.2), all site groups (including replicas) used at the next join operation destination (external side / drive table) are taken out and added to the temporary site list. After that, in (Step .f2.3), a list of sites that are not used is extracted. Here, a site not included in the temporary site list is taken out and added to the candidate temporary list. In (Step .f.2.4), two candidate areas for search are prepared and initialized. Thereafter, in (step .f.2.5), the subordinate processing is repeatedly performed for the candidate temporary list element.
  • step .f.2.5.1 the site information is read sequentially.
  • step .f.2.5.2 numerical information regarding latency and reliability related to the site is read from the measurement statistics DB.
  • the subordinate process is performed every time numerical information related to latency and reliability is read.
  • the first two sites are set as candidates in the candidate area for search described above.
  • Step .f.2.5 If two sites that are new candidates are identified as a result of performing (Step .f.2.5) on all sites in the candidate temporary list, (Step .f.2.6)
  • the intermediate result table join tree ⁇ T 1 o ⁇ is added to a temporary site list that means site candidates to be temporarily arranged. Thereafter, in (step .f.2.7), the candidate temporary list is cleared. Subsequently, in (step .f. 2.8), each of the site candidates for temporarily arranging the connection tree ⁇ T 1 o ⁇ of the intermediate result table by repeatedly executing the subordinate processing for the site temporary list element Is specifically evaluated.
  • Step .f.2.8.1 the table creation cost is estimated from the number of creations of the join tree ⁇ T 1 o ⁇ (the number of rows and the number of rows created).
  • Step .f. 2.8.2 numerical information related to the latency and reliability related to the site is read from the measurement statistics DB.
  • Step .f. 2.8.3 the normalized latency related to the site holding the data obtained in (Step .f. 2.8.2) with respect to the estimation of the table creation cost described above.
  • the reciprocal number of the numerical information related to the reliability are superimposed to obtain the entire table portion cost.
  • a plan number is assigned to the entire table part cost, and the plan number is added to the table part evaluation list as one of candidates.
  • Step .f.2.1 to (Step .f. 2.8.4) are the same as the above (Step .f.1.2) to the above-mentioned except for (Step .f.2.1.1.1).
  • Step .f. 1.8.4 is equivalent to the process, and the entire table portion cost is calculated.
  • Step .f2.9) to (Step .f.2.15) are dedicated processing contents when the type of the connection tree ⁇ T 1 o ⁇ is the right side (inside) / reference table. It is.
  • step .f. 2.9 when the type of join tree ⁇ T 1 o ⁇ is the right side (inside) / reference table, the total cost of the table portion is small for the purpose of suppressing the overall processing amount. Limited to the top three. Here, from the table part evaluation list, the top three items with the small total table part cost are selected and the others are deleted.
  • Step .f.2.10 the subordinate processing is repeatedly performed for the site temporary list element.
  • Step.f.2.10.1 the creation cost of the above-described hash configured on the memory is estimated.
  • Step .f.2.10.2 superimposes the numerical information related to the normalized latency related to the site obtained in (Step .f. 2.8.2) and the reciprocal of the numerical information related to the reliability described above. And obtain the reference cost.
  • step .f.2.10.3 a plan number is assigned to this reference cost, and it is added to the reference partial evaluation list as one of the candidates.
  • Step .f.2.10.4 the above-mentioned index creation cost is estimated.
  • Step .f.2.10.5 superimposes the numerical information related to the normalized latency related to the site obtained in (Step .f. 2.8.2) and the reciprocal number of the numerical information related to the reliability described above. And obtain the reference cost. Further, in (Step .f.2.10.6), a plan number is assigned to this overall cost, and it is added to the reference partial evaluation list as one of the candidates.
  • Step .f.2.11 the intermediate result table including the above-described join tree ⁇ T 1 o ⁇ and the above-described hash are obtained by combining the total cost of the table part and the reference cost. Evaluate when it is created on a different site.
  • Step .f.2.11 the subordinate processing is repeated for the table partial evaluation list element.
  • step .f.2.11.1 one item is taken out from the table partial evaluation list in the order of the plan number.
  • step .f.2.11.2 the table portion overall cost and the reference cost are each weighted by a constant weight, and the sum is taken to calculate the overall cost. Thereafter, in (Step .f.2.11.2), a plan number is assigned to the calculated overall cost and added to the evaluation list.
  • Step .f.2.12 sorts and ranks the data in the above-described evaluation list according to the overall cost. Then, the one with the lowest overall cost is stored as the first plan.
  • the similar configuration specifically means a configuration in which an index exists when an index is created, and a configuration in which no index exists when an index is not created.
  • step .f.2.14 as a post-process, the site temporary list, the table partial evaluation list, the reference partial evaluation list, and the evaluation list are cleared. Thereafter, in (Step .f.2.15), the above-mentioned first proposal and the above-mentioned second proposal are answered.
  • the placement evaluation function ends the process. After that, based on the result of the arrangement evaluation function, the intermediate generation result is reserved for arrangement in one place or speculatively two places in (Step .8.9).
  • the processing after (Step 8.10) is processing that can also be seen in the conventional method.
  • the elements of the connection tree ⁇ T k1 o , T k2 o ⁇ corresponding to the arranged intermediate result table are added to the current version of the connection element list.
  • the joining elements constituting the joining tree ⁇ T k1 o , T k2 o ⁇ are deleted from the current version of the joining element list.
  • the processing of (Step 8) is the core processing of optimization, and when this processing is omitted, the processing system of the optimization unit 1051 shifts to (Step 9). In (Step 9), the remaining optimization is performed from the current version ⁇ T m o ⁇ of the combined element list, and an optimization plan is added. Thereafter, the processing system of the optimization unit 1051 selects one optimization plan in (Step 10).
  • the processing system of the optimization unit 1051 performs the initial version ⁇ R 11 o ,..., R 1 p o , R 21 o ,..., R 2p o ,. . . , R np o ⁇ and the current version ⁇ T m o ⁇ to re-express the optimization plan. Thereafter (in step 12.12, the processing system of the optimizing unit 1051 transfers a part of the execution plan descriptions s124 and s105 from the above-described processing execution unit-1: 1053 to the above-mentioned processing execution unit- N: It passes to 1054 and a process is complete
  • Step 5A and 5B correspond to (Step 1) to (Step 7).
  • the repetition 2001 after the initial stage 2000 corresponds to (Step 8).
  • “Start of processing system and copy processing system” 2002 in FIGS. 5A and 5B corresponds to (Step .8.3) and (Step .8.4).
  • the side that activates the copy processing system in the processing system of the optimization unit 1051 is represented by a processing sequence 2003 in FIGS. 5A and 5B.
  • the side activated as the copy processing system is expressed by a processing sequence 2004.
  • the core part of the processing content is expressed by the processing sequence 2005 and the processing sequence 2006.
  • the processing sequence 2003 and the processing sequence 2004 proceed with exactly the same processing.
  • a join operation is performed on two relations including the relation ⁇ R 1 o ⁇ and the relation ⁇ R 2 o ⁇ .
  • the processing result of the first stage corresponds to the combined result 2007 in the processing sequence 2003 and corresponds to the combined result 2015 in the processing sequence 2004.
  • a join operation is further performed on the first stage join result.
  • a process of performing a join operation with the relationship ⁇ R 4 o ⁇ and the like.
  • the processing result of the second stage corresponds to the combined result 2008 in the processing sequence 2003 and corresponds to the combined result 2016 in the processing sequence 2004.
  • a plurality of replacement candidates 2009 for the connection tree ⁇ T 1 o ⁇ corresponding to the intermediate generation table are generated. Furthermore, the selection result 2010 of the join tree ⁇ T 1 o ⁇ , which is the optimum intermediate result table, is created from the replacement candidate 2009 by the algorithm of the greedy method (greedy method).
  • the series of processing contents correspond to (Step .8.5) and (Step .8.6).
  • a join operation is further performed on the join result 2016 in the second stage.
  • a join operation is further performed.
  • a join operation is performed between the relation ⁇ R 1 o ⁇ and the relation ⁇ R 2 o ⁇ , and the relation ⁇ R 3 o ⁇ 2022 and the relation ⁇ R 5 o are further obtained for the result of the relation ⁇ R 4 o ⁇ .
  • This corresponds to the process of additionally combining the combination result 2020 of 2021.
  • a plurality of combined result proposals are created, and here, the combined result proposal 2017 corresponds.
  • processing 2018 is performed to specify which replica table is used in k joins when the maximum block size is (k + 1). This corresponds to the first half of (Step 8.7.2).
  • the processing unit of the optimizing unit 1051 waits for an inquiry from the side that activates the copy processing system.
  • the process 2019 is executed, the response 2024 is executed, and the process ends. This corresponds to the latter half of (Step 8.7.2) and (Step 8.7.3).
  • a join tree ⁇ T 1 o ⁇ that is an intermediate result table is designated.
  • the connection tree ⁇ T 1 o ⁇ becomes the left side (external side) / driving table or the right side (internal side) / reference table. All the site groups and table groups are designated as to which replica table is used in the next join operation.
  • post-processing is performed in processing 2014. Specifically, this corresponds to (Step .8.10) and (Step .8.11).
  • the query optimization method of the present invention is a method for optimizing a query in a database management system, and occurs in the course of query optimization, particularly in an environment using a large-scale virtualized storage. This is a method for more rationally arranging intermediate results.
  • queries are premised on an arbitrary number of combined operations of a limit operation and a join operation in relational algebra.
  • This query optimization method extracts numerical information related to latency and reliability related to the site holding the data. Next, the numerical information related to the latency and the numerical information related to the reliability are superimposed on the cost value of the query to obtain an extended cost value. Next, a site having an appropriate designated value at the top when selecting an extension cost value is selected. Next, in order to select a placement destination site for intermediate results generated as a result of optimization, prefetching is performed by creating a copy of the processing system itself in which some parameters in the algorithm are changed. Next, as a result of the prefetching, a query intermediate result placement method characterized by including a step of placing intermediate results on one or more sites is evaluated.
  • the method includes a step of performing prefetching by creating a copy of the processing system itself in which a parameter in the algorithm is partially changed, and a step of placing intermediate results on one or more sites as a result of the prefetching.
  • the elemental technology of a database management system is distributed over a wide range of environments, from functional modules in a single system to a wide network, and is deployed on a large scale.
  • the query optimization technique assumed is also affected, and can be considered as a form of evolution to a technique that assumes an environment distributed throughout the network. Since it is handled over a wide area network, the related technical areas will span several.

Abstract

Provided is a query optimization system, wherein after a rational assessment of the interim results of a query generated as a result of optimization, query optimization is achieved at the implementation stage, taking into account the decentralized storage site in which the query should be positioned. First, numerical data concerning reliability and latency for sites that store data is extracted. Next, the reciprocals of the aforementioned data concerning latency and numerical data concerning reliability are overlaid onto cost values of queries, and the expanded cost is determined. Next, a replica of a high-order specified value appropriate for the cost value is selected. Next, 1 is added to the maximum block size, which is a parameter in the algorithm for selecting a positioning destination site for interim results generated as a result of optimization, and pre-reading is implemented by preparing copies of the processing system itself. Next, as a result of the pre-reading, the interim results are positioned in one or more sites. By doing so, the method of positioning the interim results for queries is evaluated.

Description

クエリ最適化システム、クエリ最適化装置、及びクエリ最適化方法Query optimization system, query optimization device, and query optimization method
 本発明は、クエリ最適化システムに関し、特にクエリの中間成果の配置方式を評価するクエリ最適化システムに関する。 The present invention relates to a query optimization system, and more particularly to a query optimization system that evaluates an arrangement method of intermediate results of a query.
 クエリ(query)の中間成果の配置方式を評価するクエリ最適化方式(クエリ最適化システム)に関連する代表的な既存技術を開示する文献として、以下のような文献がある。 The following documents disclose typical existing techniques related to a query optimization method (query optimization system) that evaluates an arrangement method of intermediate results of a query (query).
 <非特許文献1>D.Kossmann,K.Stocker,”Iterative Dynamic Programming:A New Class of Query Optimization Algorithms”, ACM Transactions on Database Systems, Vol.25, No.1, March 2000. <Non-Patent Document 1> D. Kossmann, K.M. Stocker, “Iterative Dynamic Programming: A New Class of Queries Optimization Algorithms”, ACM Transactions on Database Systems, Vol. 25, no. 1, March 2000.
 <非特許文献2>E.J.Shekita,H.C.Young,K.L.Tan,”Multi-Join Optimization for Symmetric Multiprocessors”,Proceedings of the 19th VLDB Conference, 1993. <Non-Patent Document 2> E. J. et al. Shekita, H .; C. Young, K .; L. Tan, “Multi-Join Optimization for Symmetric Multiprocessors”, Proceedings of the 19th VLDB Conference, 1993.
 <特許文献1>特開2002-222108号公報 <Patent Document 1> JP 2002-222108 A
 <非特許文献1>では、クエリの中間成果の配置方式を評価する従来のクエリ最適化方式について言及しているが、中間成果の配置方式に関して、その手法を明確に言及している訳ではない。 <Non-Patent Document 1> refers to a conventional query optimization method for evaluating a query intermediate result placement method, but does not explicitly refer to the intermediate result placement method. .
 <非特許文献2>では、クエリ最適化方式で中間成果をストレージ(Storage:外部記憶装置)等に書き出すことについて具体的に言及している。但し、これは単一のシステム内の機能モジュールを前提としており、2009年度の今日的な課題である広域な計算機資源の融通、調達、仮想化を前提とした事項を扱っている訳ではない。 <Non-Patent Document 2> specifically refers to writing intermediate results to a storage (storage: external storage device) or the like by a query optimization method. However, this assumes functional modules in a single system, and does not deal with matters premised on wide-ranging computer resource interchange, procurement, and virtualization, which are the current issues in 2009.
 これに対して、<特許文献1>では、クエリ最適化に関する手法ではなく、中間成果の外部配置を前提とした既存技術を説明している。 On the other hand, <Patent Document 1> describes an existing technique based on the premise of external placement of intermediate results, not a technique related to query optimization.
 <従来技術のクエリ最適化方式>
 ここで、<非特許文献1>から従来技術のクエリ最適化方式を抽出し、具体的な手順として示す。
<Conventional query optimization method>
Here, the query optimization method of the prior art is extracted from <Non-Patent Document 1> and shown as a specific procedure.
 (ステップ.1)では、指定クエリの構文解析を行い、参照する表集合{R,R,...,R}を取り出す。 In (Step 1), the specified query is parsed and the set of tables {R 1 , R 2 ,. . . , R n }.
 (ステップ.2)では、その配下の(ステップ.2.1)において、表集合の各要素表Rで索引を効果的に利用する等の想定される全てのアクセスパターンを抽出した上で、仮結合要素リストに追加する。これを全ての要素表Rに対して行う(m=1~n)。これにより、最適化の候補抽出を行うことになる。 In (Step 2), after extracting all the assumed access patterns such as effectively using an index in each element table R m of the table set in (Step 2.1) under it, Add to the temporary join element list. This is performed for all the element tables R m (m = 1 to n). Thereby, optimization candidates are extracted.
 (ステップ.3)では、その配下の(ステップ.3.1)において、(ステップ.2.1)で実施した各要素表Rに関して仮結合要素リストから最適なアクセス条件のものを一つ取り出し、取り出したもののみを残して他を仮結合要素リストから削除することでコストを最適化するクエリ要素の選別を行う。これを全ての要素表Rに対して行う(m=1~n)。 In (Step 3), one of the optimal access conditions is extracted from the temporary connection element list for each element table R m implemented in (Step 2.1) in (Step 3.1) under the subordinate (Step 3.1). The query elements that optimize the cost are selected by leaving only the extracted ones and deleting others from the temporary binding element list. This is performed for all the element tables R m (m = 1 to n).
 (ステップ.4)では、仮結合要素リストを削除する。 In (Step 4), the temporary combined element list is deleted.
 (ステップ.5)では、結合要素リストの初期版{R ,R ,R ,...,R }を作成し、最適化プラン要素リストにも追加する。 (Step 5), the initial version of the combined element list {R 1 o , R 2 o , R 3 o,. . . , R n o } and add it to the optimization plan element list.
 (ステップ.6)では、結合要素リスト内の要素数が1になるまで、配下の処理を繰り返して実施する。 In (Step 6), the subordinate processes are repeated until the number of elements in the combined element list becomes 1.
 [(ステップ.6)配下の処理]
 まず、(ステップ.6.1)では、最大ブロックサイズkを、指定値、もしくは結合要素リストの要素数のうち、小さいほうに設定する。続いて、(ステップ.6.2)では、結合数を2に指定する。続いて、(ステップ.6.3)では、指定結合数≦最大ブロック数kの条件を満足する限り、配下の処理を繰り返して実施する。
[(Step 6) Subordinate processing]
First, in (Step 6.1), the maximum block size k is set to the smaller of the specified value or the number of elements in the combined element list. Subsequently, in (Step 6.2), the number of bonds is specified as 2. Subsequently, in (Step .6.3), the subordinate processing is repeatedly performed as long as the condition of designated number of connections ≦ maximum number of blocks k is satisfied.
 [(ステップ.6.3)配下の処理]
 まず、(ステップ.6.3.1)では、指定結合数分の組合せを構成する全ての場面を、結合要素リストの現版{R ,R ,R ,...,R }から取り出し、仮リストを作成し、追加する。続いて、(ステップ.6.3.2)では、全ての仮リスト要素数分、配下の処理を繰り返して実施する。
[(Step .6.3) Subordinate processing]
First, in (Step .6.3.1), all scenes constituting combinations for the designated number of connections are displayed in the current version {R 1 o , R 2 o , R 3 o ,. . . , R n o } to create and add a temporary list. Subsequently, in (Step 6.3.2), the subordinate processing is repeated for all the provisional list elements.
 [(ステップ.6.3.2)配下の処理]
 まず、(ステップ.6.3.2.1)では、仮リストから1要素を取り出す。続いて、(ステップ.6.3.2.2)では、仮リスト部分1,2が伴に定義できるまで、配下の処理を繰り返して実施する。
[(Step .6.3.2) Subordinate process]
First, in (Step 6.3.2.1), one element is extracted from the temporary list. Subsequently, in (Step .6.3.3.2), the subordinate processes are repeated until the temporary list parts 1 and 2 can be defined.
 [(ステップ.6.3.2.2)配下の処理]
 まず、(ステップ.6.3.2.2.1)では、要素を2つに分割し、要素{R }の組合せ2組を作成し、{仮リスト部分1}{仮リスト部分2}を作成する。続いて、(ステップ.6.3.2.2.2)では、{仮リスト部分1}{仮リスト部分2}の組合せで、仮操作リストに追加する。続いて、(ステップ.6.3.2.2.3)では、{仮リスト部分2}{仮リスト部分1}の組合せで、仮操作リストに追加する。
[Process under (Step 6.3.2.2)]
First, in (Step 6.3.3.2.2.1), the element is divided into two, and two combinations of the elements {R 1 o } are created, and {temporary list part 1} {temporary list part 2 }. Subsequently, in (Step 6.3.3.2.2.2), a combination of {temporary list part 1} {temporary list part 2} is added to the temporary operation list. Subsequently, in (Step .6.3.2.2.2.3), a combination of {temporary list part 2} {temporary list part 1} is added to the temporary operation list.
 [(ステップ.6.3)配下の処理の続き1]
 (ステップ.6.3.2)の終了後、(ステップ.6.3)配下の(ステップ.6.3.3)では、最適化プランリストを初期化する。続いて、(ステップ.6.3.4)では、全ての仮操作リスト要素数分、配下の処理を繰り返して実施する。
[(Step .6.3) Continuation of subordinate processing 1]
After the completion of (Step 6.3.2), in (Step .6.3.3) under (Step .6.3), the optimization plan list is initialized. Subsequently, in (Step. 6.3.4), the subordinate processing is repeated for all the provisional operation list elements.
 [(ステップ.6.3.4)配下の処理]
 まず、(ステップ.6.3.4.1)では、仮操作リストから1つ取り出す。続いて、(ステップ.6.3.4.2)では、取り出した右側{仮リスト部分x(x=1,2)}の最適化プランを最適化プラン要素リストから取り出すか、もしくは作成する。続いて、(ステップ.6.3.4.3)では、取り出した左側{仮リスト部分x(x=1,2)}の最適化プランを最適化プラン要素リストから取り出すか、もしくは作成する。続いて、(ステップ.6.3.4.4)では、前述の(ステップ.6.3.4.2)(ステップ.6.3.4.3)の2つの最適化プランでの結合プランを作成し、最適化プランリストに追加する。
[Processing under (Step. 6.3.4)]
First, in (Step 6.3.4.1), one is extracted from the temporary operation list. Subsequently, in (Step 6.3.4.2), the optimization plan of the extracted right side {provisional list portion x (x = 1, 2)} is extracted from the optimization plan element list or created. Subsequently, in (Step 6.3.4.3), the optimization plan of the extracted left {temporary list portion x (x = 1, 2)} is extracted from the optimization plan element list or created. Subsequently, in (Step 6.3.4.4), the combined plan in the two optimization plans of (Step .6.3.4.2) and (Step 6.3.4.3) described above. And add it to the optimization plan list.
 [(ステップ.6.3)配下の処理の続き2]
 (ステップ.6.3.4)の終了後、(ステップ.6.3)配下の(ステップ.6.3.5)では、最適化プランリストから最適なものを1つ取り出し、それを最適化プラン要素リストに追加する。続いて、(ステップ.6.3)配下の(ステップ.6.3.6)では、最適化プランリストの最適でないものを、ライブラリとして整理保持する。続いて、(ステップ.6.3)配下の(ステップ.6.3.7)では、仮リスト、仮操作リストをクリア(消去)する。続いて、(ステップ.6.3)配下の(ステップ.6.3.8)では、結合数に1を加算(increment)する(結合数←(結合数+1))。
[(Step .6.3) Continuation of subordinate process 2]
After (Step .6.3.4) is finished, in (Step .6.3.5) under (Step .6.3), one optimal item is extracted from the optimization plan list and optimized. Add to the plan element list. Subsequently, in (Step .6.3.6) under (Step .6.3), non-optimal optimization plan lists are organized and held as a library. Subsequently, in (Step .6.3.7) under (Step .6.3), the temporary list and the temporary operation list are cleared (erased). Subsequently, in (Step .6.3.8) under (Step .6.3), 1 is added to the number of bonds (number of bonds ← (number of bonds + 1)).
 [(ステップ.6)配下の処理の続き]
 (ステップ.6.3)の終了後、(ステップ.6)配下の(ステップ.6.4)では、最適化プラン要素リスト内の要素から、貪欲法(欲張り法)のアルゴリズム(Greedy Algorithm)を用いてコストを計算する評価関数を最小にする結合木の選択を行う。続いて、(ステップ.6)配下の(ステップ.6.5)では、結合木を{T }とする。続いて、(ステップ.6)配下の(ステップ.6.6)では、結合木{T }の要素を結合要素リストの現版{R ,R ,R ,...,R }に追加する。続いて、(ステップ.6)配下の(ステップ.6.7)では、結合木{T }を構成する結合要素を結合要素リストの現版{R ,R ,R ,...,R }から削除する。
[(Step 6) Continuation of subordinate processing]
After the completion of (Step 6.3), in (Step .6.4) under (Step .6), the algorithm (Greedy Algorithm) of the greedy method (greedy method) is calculated from the elements in the optimization plan element list. Use to select a join tree that minimizes the cost function that calculates the cost. Subsequently, in (Step .6.5) under (Step .6), the connection tree is set to {T k o }. Subsequently, in (Step .6.6) under (Step .6), the elements of the connection tree {T k o } are replaced with the current version {R 1 o , R 2 o , R 3 o,. . . , R n o }. Subsequently, in (Step .6.7) under (Step 6), the connection elements constituting the connection tree {T k o } are changed to the current version {R 1 o , R 2 o , R 3 o of the connection element list. ,. . . , R n o }.
 (ステップ.6)の終了後、(ステップ.7)では、結合要素リストの現版{T }から、残った最適化を実施し、最適化プラン追加する。 After the completion of (Step 6), in (Step 7), the remaining optimization is performed from the current version {T m o } of the combined element list, and an optimization plan is added.
 続いて、(ステップ.8)では、最適化プランを1つ選択する。 Subsequently, in (Step 8), one optimization plan is selected.
 続いて、(ステップ.9)では、結合要素リストの現版{R ,R ,R ,...,R }を使い、最適化プランを表現し直す。 Subsequently, in (Step 9), the current version of the combined element list {R 1 o , R 2 o , R 3 o,. . . , R n o } to re-express the optimization plan.
 続いて、(ステップ.10)では、最適化プランを応答する。 Subsequently, in (Step 10), the optimization plan is returned.
 当該最適化手法では、大きく2つのアルゴリズムを組み合わせている。一つは「動的計画法(Dynamic Programming)のアルゴリズム」である。具体的には、(ステップ.6.3)の一連の領域が該当する。もう一つは「貪欲法(欲張り法)のアルゴリズム」であり、(ステップ.6.4)等が該当する。この方法では、最大ブロックサイズと呼ぶ結合領域の大きさ範囲を指定し、当初、動的計画法のアルゴリズムにより、その最大ブロックサイズを超えない限り、結合演算を実施する。その後、選択した演算結果から貪欲法(欲張り法)のアルゴリズムでコストの最も小さい方法を選択し、それを中間生成結果として置き換える。以上の手順を複数回、繰り返すことで、全体的にコストの最も小さい方法を選択することになる。 当 該 This optimization method largely combines two algorithms. One is “Dynamic Programming Algorithm”. Specifically, the series of areas in (Step .6.3) corresponds. The other is the “greedy method (greedy method) algorithm”, which corresponds to (step 6.4). In this method, a size range of a combined area called a maximum block size is specified, and a combination operation is initially performed unless the maximum block size is exceeded by a dynamic programming algorithm. After that, the method with the lowest cost is selected from the selected calculation result by the algorithm of the greedy method (greedy method), and it is replaced as an intermediate generation result. By repeating the above procedure a plurality of times, the method with the lowest overall cost is selected.
 当該最適化手法のアルゴリズムを平易に説明するため、それを使って実施された場合のモデルについて、図1を用いて説明する。図1には、当該最適化手法により実際にクエリ最適化の処理を実施する際の段取りを模式的に記したものを示す。ここで、例示のため、利用する関係リストは、5つの関係から構成されているとする。具体的には、関係R、関係R、関係R、関係R、関係Rであり、それぞれを(ステップ.5)までで最適化した結果、関係R 、関係R 、関係R 、関係R 、関係R を含む結合要素リスト3007を作成する。この結果、初期段階の結果3000が生成される。 In order to explain the algorithm of the optimization method in a simple manner, a model when implemented using the algorithm will be described with reference to FIG. FIG. 1 schematically shows a setup for actually executing query optimization processing by the optimization method. Here, for the sake of illustration, it is assumed that the relationship list to be used is composed of five relationships. Specifically, the relation R 1 , the relation R 2 , the relation R 3 , the relation R 4 , and the relation R 5 are optimized as a result of (Step 5), and the relation R 1 o and the relation R 2 o are obtained. , A connection element list 3007 including the relationship R 3 o , the relationship R 4 o , and the relationship R 5 o is created. As a result, an initial result 3000 is generated.
 第1段階では、2つの関係の結合演算を行う。図1では、結合演算のシンボルを3006で記す。これは、当該最適化手法では、(ステップ.6.3)までの処理に相当する。この結果、結合候補3008、3009、3010を含んだ第1段階の2表間結合演算の結果3001が作成される。 In the first stage, a join operation of two relations is performed. In FIG. 1, the symbol of the join operation is denoted by 3006. This corresponds to the process up to (Step .6.3) in the optimization method. As a result, a result 3001 of the first-stage two-table join operation including the join candidates 3008, 3009, and 3010 is created.
 同様に、第2段階では、前述の第1段階の2表間結合の結果3001を基に、3つの関係の結合演算を行う。これは、当該最適化手法では、(ステップ.6.3)までの処理に相当する。この結果、結合候補3011、3012、3013を含んだ第2段階の3表間結合演算の結果3002が作成される。更に、これを基に、貪欲法(欲張り法)のアルゴリズムでコスト評価を行う。当該最適化手法では、(ステップ.6.4)に相当する。この結果、コスト評価が最小の結合候補3012が選択される。なお、貪欲法(欲張り法)のアルゴリズムは一般的であるため、詳細記述はしない。 Similarly, in the second stage, three relational join operations are performed based on the result 3001 of the two-table join in the first stage described above. This corresponds to the process up to (Step .6.3) in the optimization method. As a result, a result 3002 of the second-stage three-table join operation including the join candidates 3011, 3012, and 3013 is created. Further, based on this, cost evaluation is performed by an algorithm of a greedy method (greedy method). This optimization method corresponds to (Step .6.4). As a result, the combination candidate 3012 having the smallest cost evaluation is selected. In addition, since the algorithm of the greedy method (greedy method) is common, detailed description is not carried out.
 第3段階では、最大ブロックサイズが指定の2を超えるため、一度、中間成果をストレージに書き出すことになる。この場合、結合候補3012を結合木{Tk1 }:3014に置き換える。その後、(ステップ.6)に戻り、第二段目の処理のため、初期段階と同様な状態に戻る。この場合、結合要素リスト3007には、結合木{Tk1 }:3014、並びに関係R :3015、関係R :3016が残り、その結果3003として定義される。 In the third stage, since the maximum block size exceeds the specified value 2, intermediate results are once written to the storage. In this case, the connection candidate 3012 is replaced with a connection tree {T k1 o }: 3014. Thereafter, the process returns to (Step 6) and returns to the same state as the initial stage for the second stage processing. In this case, the connection tree {T k1 o }: 3014, the relationship R 3 o : 3015, and the relationship R 5 o : 3016 remain in the connection element list 3007, and are defined as the result 3003.
 第4段階では、2段目の2つの関係の結合演算を行う。これは、当該最適化手法では、(ステップ.6.3)までの処理に相当する。この結果、結合候補3017、3018、3019を含んだ第2段階の2表間結合演算の結果3004が作成される。 In the fourth stage, the join operation of the two relations in the second stage is performed. This corresponds to the process up to (Step .6.3) in the optimization method. As a result, a result 3004 of the second-stage two-table join operation including the join candidates 3017, 3018, and 3019 is created.
 第5段階では、第3段階と同様の処理を行い、3表間の結合演算を行う。この場合、関係R :3015、関係R :3016を先に結合演算した結果と結合木{Tk1 }:3014の結合演算を3表間で行い、結合候補3020が選択され、その上で結合要素リスト3007に反映される。この時点で、(ステップ.6)で記載した条件を満たさないため、(ステップ.7)に移行することになる。その後、(ステップ.9)により結合木を当初の結合要素リスト3007の要素関係R 、関係R 、関係R 、関係R 、関係R で結合木3021を再構成し、所望の最適化案を作成する。それにより、結果3005として生成される。 In the fifth stage, processing similar to that in the third stage is performed, and a join operation between the three tables is performed. In this case, a join operation of the join tree {T k1 o }: 3014 with the result of the join operation of the relation R 3 o : 3015 and the relation R 5 o : 3016 is performed between the three tables, and the join candidate 3020 is selected. After that, it is reflected in the combined element list 3007. At this time, since the condition described in (Step 6) is not satisfied, the process proceeds to (Step 7). After that, the connection tree 3021 is reconstructed with the element relation R 1 o , the relation R 2 o , the relation R 3 o , the relation R 4 o , and the relation R 5 o of the original joining element list 3007 by (Step 9). Then, a desired optimization plan is created. Thereby, a result 3005 is generated.
 <特許文献1>では、クエリ最適化に関する手法ではなく、作成されるレプリカの配置方法に関係するものである。図2A、図2Bに、<特許文献1>の記載を基に、装置構成を記したものを示す。装置構成は、大きく3つの機能層を含む。第1の機能層4100は、アプリケーションプログラム-1:4007、アプリケーションプログラム-L:4008が含まれる機能層である。第2の機能層4200は、データベースをアクセスしデータ収集部4003が配置される機能層である。第3の機能層4300は、リソース管理/トランザクション制御部4011、4012や分散ストレージ4004等が含まれる機能層である。各々の機能層は、LAN(Local Area Network)/WAN(Wide Area Network)/Internet等、各種ネットワーク4001、4002を介して相互にアクセスする。 <Patent Document 1> relates to a method for arranging replicas to be created, not a method related to query optimization. 2A and 2B show the device configuration based on the description in <Patent Document 1>. The device configuration largely includes three functional layers. The first functional layer 4100 is a functional layer including application program-1: 4007 and application program-L: 4008. The second functional layer 4200 is a functional layer in which a database is accessed and the data collection unit 4003 is arranged. The third functional layer 4300 is a functional layer including resource management / transaction control units 4011 and 4012, distributed storage 4004, and the like. Each functional layer accesses each other via various networks 4001 and 4002, such as LAN (Local Area Network) / WAN (Wide Area Network) / Internet.
 アプリケーションプログラム-1:4007は、データベースアクセスクライアント部-1:4009を含む。同様に、アプリケーションプログラム-L:4008は、データベースアクセスクライアント部-L:4010を含む。データベースアクセスクライアント部-1:4009及びデータベースアクセスクライアント部-L:4010は、データ収集部4003にSQL(Structured Query Language)等の実行クエリを要求及び送信する。 Application program-1: 4007 includes a database access client unit-1: 4009. Similarly, the application program-L: 4008 includes a database access client unit-L: 4010. The database access client unit-1: 4009 and the database access client unit-L: 4010 request and transmit an execution query such as SQL (Structured Query Language) to the data collection unit 4003.
 前述のリソース管理/トランザクション制御部4011や前述の分散ストレージ4004等が含まれる機能層には、現在、2つのサイトを前提に記しているが、これに限定される訳ではない。各サイトは、各々、リソース管理/トランザクション制御部4011、4012を含む。 The functional layer including the above-described resource management / transaction control unit 4011 and the above-described distributed storage 4004 is currently described on the assumption of two sites, but is not limited to this. Each site includes resource management / transaction control units 4011 and 4012, respectively.
 リソース管理/トランザクション制御部4011を持つサイトには、分散ストレージ4004が関係付けられて配置されている。当該分散ストレージ4004には、データベースの表領域、索引領域、Log(ログ),ジャーナル等が格納される。この場合、分散ストレージは1つとは限らず、2つ以上でも良い。すなわち、分散ストレージは、少なくとも1つ存在する。 In the site having the resource management / transaction control unit 4011, the distributed storage 4004 is associated and arranged. The distributed storage 4004 stores a database table area, index area, log (log), journal, and the like. In this case, the number of distributed storages is not limited to one and may be two or more. That is, at least one distributed storage exists.
 当該分散ストレージ4004内のデータベースの表領域、索引領域には、関係R11に相当する表Table.11:4014、関係R12に相当する表Table.12:4015、関係R に相当する表Table.1n:4016、中間成果の表4018が格納される。また、索引領域には、索引4017、Log,ジャーナル4019が格納される。 Database tablespaces of the distributed storage in 4004, the index area, the table Table corresponding to the relationship R 11. 11: 4014, Table Table corresponding to the relationship R 12. 12: 4015, table Table.corresponding to relation R 1 n . 1n: 4016, table 4018 of intermediate results is stored. In the index area, an index 4017, a log, and a journal 4019 are stored.
 リソース管理/トランザクション制御部4012を持つサイトは、リソース管理/トランザクション制御部4011を持つサイトのレプリカに相当する。このため、前述のリソース管理/トランザクション制御部4011を持つサイトと同じ構成を取る。 The site having the resource management / transaction control unit 4012 corresponds to a replica of the site having the resource management / transaction control unit 4011. Therefore, the same configuration as that of the site having the resource management / transaction control unit 4011 described above is adopted.
 データ収集部4003に対して、部分レプリカを配置するため、他の分散ストレージ4006が配置されている。ここには、リソース管理/トランザクション制御部4013が配置され、前述のリソース管理/トランザクション制御部4011を持つサイトと概ね同じ構成を取る。 In the data collection unit 4003, another distributed storage 4006 is arranged to arrange a partial replica. Here, a resource management / transaction control unit 4013 is arranged and has almost the same configuration as the site having the resource management / transaction control unit 4011 described above.
 ここで、データ収集部4003は、データベースアクセスクライアント部-1、4009、データベースアクセスクライアント部-L:4010からの問い合わせメッセージs4029、s4035に基づき、リソース管理/トランザクション制御部4011、4012に対する要求メッセージs4030、s4031として転送し、その結果、当該分散ストレージ4004、同様に、分散ストレージ4005内で検索及び取得されたデータを応答メッセージs4033、s4032として受信する。 Here, the data collection unit 4003 receives request messages s4030 to the resource management / transaction control units 4011 and 4012 based on the inquiry messages s4029 and s4035 from the database access client units 1 and 4009 and the database access client unit -L: 4010. As a result, it is transferred as s4031, and as a result, the data retrieved and acquired in the distributed storage 4004 and similarly in the distributed storage 4005 are received as response messages s4033 and s4032.
 続けて、データ収集部4003は、複数の問い合わせメッセージs4029、s4035内で指定された複数の条件を含む問い合わせ群をグループ化し、この条件毎に問い合わせの内容及び条件の集合を求める。また、データ収集部4003は、当該条件の重複範囲に基づいて条件を判定する。更に、データ収集部4003は、問い合わせの重複回数に応じて有効度に関する評価指標を設定する。 Subsequently, the data collection unit 4003 groups a group of queries including a plurality of conditions specified in a plurality of query messages s4029 and s4035, and obtains a query content and a set of conditions for each condition. In addition, the data collection unit 4003 determines a condition based on the overlapping range of the condition. Furthermore, the data collection unit 4003 sets an evaluation index related to the validity level according to the number of times of duplication of inquiries.
 データ収集部4003は、この有効度に関する評価指標に応じて、リソース管理/トランザクション制御部4013の分散ストレージ4006に部分的なレプリカを配置することになる。 The data collection unit 4003 places a partial replica in the distributed storage 4006 of the resource management / transaction control unit 4013 according to the evaluation index related to the effectiveness.
 前述のように、過去の個別技術(従来技術)は、大きくはクエリ最適化方式に関するもの、中間成果の外部配置を前提としたレプリカ技術に関するものに分類される。そして、課題については、以下のようになる。 As described above, past individual technologies (conventional technologies) are roughly classified into those related to the query optimization method and those related to the replica technology based on the external arrangement of intermediate results. The issues are as follows.
 例えば、データベース管理システムの要素技術が、ネットワーク全体に広く、緩やかに連携する環境に分散し、大規模に展開された場合、クエリの中間成果を配置する分散ストレージも広くネットワーク環境に分散されることになる。 For example, if the elemental technology of the database management system is widely distributed across the entire network and distributed in a loosely linked environment, and deployed on a large scale, the distributed storage that places the intermediate results of the query will also be widely distributed in the network environment. become.
 しかし、従来のクエリの最適化手法では、最適化の結果発生するクエリの中間成果を合理的な判断の下で、どの分散ストレージサイトに配置するべきかが考慮されておらず、結果としてクエリの最適化が実施段階で為されていないという課題が存在する。 However, conventional query optimization methods do not consider which distributed storage sites should be placed with reasonable judgment on the intermediate results of queries that occur as a result of optimization. There is a problem that optimization has not been done in the implementation stage.
 この結果、データベースアクセスクライアントの利用者が取り決めた合意サービス水準を満足し得ないという課題も発生する。これを前述の事例毎に記すと以下のようになる。 As a result, there is a problem that the agreed service level decided by the user of the database access client cannot be satisfied. This is described as follows for each case described above.
 <特許文献1>では、問い合わせ群をグループ化し、この条件毎に問い合わせの内容及び条件の集合を求める処理を含み、当該条件の重複範囲に基づいて条件を判定、更に、問い合わせの重複回数に応じて有効度と言う評価指標を設定した上で、その有効度がある程度のものを部分レプリカとして配置することを開示している。部分レプリカは中間成果とは異なるが、配置箇所を複数の選択肢の中から合理的な判断の下で評価している訳ではない。この結果、上記の課題に対して応えることができない。 <Patent Document 1> includes a process for grouping a group of queries and determining a query content and a set of conditions for each condition, determining a condition based on the overlapping range of the condition, and further depending on the number of times the query is duplicated It is disclosed that an evaluation index called “effectiveness” is set, and those having a certain degree of effectiveness are arranged as partial replicas. Although partial replicas are different from intermediate results, placement locations are not evaluated based on reasonable judgment among multiple options. As a result, it is not possible to meet the above problem.
 <非特許文献1>では、現在のクエリ最適化方式の概要について開示している。しかし、レプリカ等の利用は言及されているものの、中間成果の配置を前提とした提案が成されていない。このため、2009年度の今日的な課題である広域な計算機資源の融通、調達、仮想化を前提としたクエリの最適化を扱っている訳ではなく、上記の課題に対して応えることができない。 <Non-Patent Document 1> discloses an overview of the current query optimization method. However, although the use of replicas is mentioned, no proposal has been made on the premise of the placement of intermediate results. Therefore, it does not deal with query optimization based on wide-area computer resource interchange, procurement, and virtualization, which is a current issue in fiscal 2009, and cannot respond to the above-mentioned issues.
 <非特許文献2>では、中間成果をストレージ等に書き出すクエリ最適化方式の概要について開示している。しかし、最適化の結果発生するクエリの中間成果を合理的な判断の下で、どの分散ストレージサイトに配置するべきかが考慮されておらず、結果としてクエリの最適化が実施段階で為されていないという課題が存在する。 <Non-Patent Document 2> discloses an outline of a query optimization method for writing intermediate results to a storage or the like. However, it is not considered in which distributed storage site the intermediate results of the query generated as a result of optimization should be placed based on reasonable judgment, and as a result, query optimization is performed at the implementation stage. There is a problem of not.
特開2002-222108号公報JP 2002-222108 A
 本発明の目的は、データベース管理システム内部のクエリを最適化することであり、特に大規模に仮想化されたストレージを利用する環境において、クエリの最適化の途上で発生する中間成果をより合理的に配置し、実施段階でもクエリの最適化を実現することである。更に、この結果、データベースアクセスクライアントの利用者が取り決めた合意サービス水準を満足させるようにすることである。 The object of the present invention is to optimize the query in the database management system, and in the environment using a large-scale virtualized storage, the intermediate result generated in the course of query optimization is more rational. To optimize the query even at the implementation stage. Further, as a result, the agreed service level decided by the user of the database access client is satisfied.
 上記の目的を達成するため、本発明では、データベース管理システムのクエリの中間成果の配置方式を評価するクエリ最適化システムを提供する。このクエリ最適化システムでは、データを保持するサイトに関するレイテンシ(Latency)及び信頼性に関する数値情報を取り出す。なお、レイテンシとは、データの転送要求などのリクエストを発してから、リクエストの結果が返ってくるまでにかかる遅延時間のことである。また、当該レイテンシに関する数値情報並びに当該信頼性に関する数値情報をクエリのコスト値に重畳し拡張コスト値を求める。また、拡張コスト値で序列化した場合に上位で妥当な(所定の条件に適合する)指定値を持つサイトを選定する。また、最適化の結果発生する中間成果の配置先サイトを選定するため、アルゴリズム中のパラメータを一部変更した処理系自身のコピーを作成することで先読みを実施する。また、当該先読みの結果、中間成果を1つ以上のサイトに配置する。 In order to achieve the above object, the present invention provides a query optimization system that evaluates an arrangement method of intermediate results of queries in a database management system. In this query optimization system, numerical information regarding latency and reliability regarding a site holding data is extracted. Note that the latency is a delay time from when a request such as a data transfer request is issued until the request result is returned. The extended cost value is obtained by superimposing the numerical information related to the latency and the numerical information related to the reliability on the cost value of the query. In addition, a site having a designated value that is appropriate (adapts to a predetermined condition) is selected at the top when ordered by the extended cost value. In addition, in order to select a placement destination site for intermediate results generated as a result of optimization, prefetching is performed by creating a copy of the processing system itself in which some parameters in the algorithm are changed. In addition, as a result of the prefetching, intermediate results are arranged in one or more sites.
 これにより、配置の選定を合理的に実施できるようになる。 This makes it possible to rationally select the placement.
従来技術の「クエリ最適化方式」により生成されるクエリ最適化結果の過程を例示した説明図である。It is explanatory drawing which illustrated the process of the query optimization result produced | generated by the "query optimization system" of a prior art. 従来技術の「クエリ最適化方式」の一部を実現する装置構成を記したブロック図である。It is the block diagram which described the apparatus structure which implement | achieves a part of "query optimization system" of a prior art. 従来技術の「クエリ最適化方式」の一部を実現する装置構成を記したブロック図である。It is the block diagram which described the apparatus structure which implement | achieves a part of "query optimization system" of a prior art. 本発明の「クエリの中間成果の配置方式を評価するクエリ最適化方式」を実現する装置構成を記したブロック図である。It is the block diagram which described the apparatus structure which implement | achieves the "query optimization system which evaluates the arrangement | positioning system of the intermediate result of a query" of this invention. 本発明の「クエリの中間成果の配置方式を評価するクエリ最適化方式」を実現する装置構成を記したブロック図である。It is the block diagram which described the apparatus structure which implement | achieves the "query optimization system which evaluates the arrangement | positioning system of the intermediate result of a query" of this invention. 本発明の「クエリの中間成果の配置方式を評価するクエリ最適化方式」を実現する装置構成の上で、当該クエリ最適化方式を実施する際の全体の処理内容を記したブロック図である。It is the block diagram which described the whole processing content at the time of implementing the said query optimization system on the apparatus structure which implement | achieves the "query optimization system which evaluates the arrangement | positioning system of the intermediate result of a query" of this invention. 本発明の「クエリの中間成果の配置方式を評価するクエリ最適化方式」を実現する装置構成の上で、当該クエリ最適化方式を実施する際の全体の処理内容を記したブロック図である。It is the block diagram which described the whole processing content at the time of implementing the said query optimization system on the apparatus structure which implement | achieves the "query optimization system which evaluates the arrangement | positioning system of the intermediate result of a query" of this invention. 本発明の「クエリの中間成果の配置方式を評価するクエリ最適化方式」の手順で、特徴的な部分を抜き出して、模式的に記した説明図である。It is explanatory drawing which extracted and extracted the characteristic part by the procedure of the "query optimization system which evaluates the arrangement | positioning system of the intermediate result of a query" of this invention. 本発明の「クエリの中間成果の配置方式を評価するクエリ最適化方式」の手順で、特徴的な部分を抜き出して、模式的に記した説明図である。It is explanatory drawing which extracted and extracted the characteristic part by the procedure of the "query optimization system which evaluates the arrangement | positioning system of the intermediate result of a query" of this invention.
 以下に、本発明の第1実施形態について添付図面を参照して説明する。 Hereinafter, a first embodiment of the present invention will be described with reference to the accompanying drawings.
 <基本構成に関する説明>
 本発明である「クエリの中間成果の配置方式を評価するクエリ最適化方式(クエリ最適化システム)」は、図3A、図3Bのような装置構成を前提とし、その環境において処理が実施される。
<Explanation on basic configuration>
The “query optimization method (query optimization system) for evaluating an arrangement method of intermediate results of a query” according to the present invention is based on the device configuration as shown in FIGS. 3A and 3B, and the processing is performed in the environment. .
 そこで、本発明である「クエリの中間成果の配置方式を評価するクエリ最適化方式」の説明の前段として、前提とする図3A、図3Bの装置構成、図4A、図4Bで記される全体の処理内容について説明する。 Therefore, as a pre-stage of the description of the “query optimization method for evaluating the arrangement method of the intermediate result of the query” according to the present invention, the presumed apparatus configuration of FIG. 3A and FIG. 3B and the whole described in FIG. 4A and FIG. 4B The processing contents of will be described.
 図3A、図3Bの装置構成は、代表的であり、最も本質的な構成要素のみを取り出して定義している。このため、これ以外の構成のものを含んでも構わない。装置構成は、大きく1つ以上のネットワークと、少なくとも3つの機能層を含む。ネットワークとして、LAN/WAN/Internet等、各種ネットワーク1001、1002を別のものとして定義しているが、これらと同じものと考えることも可能である。また、その種類に関して、ここでは区別する理由は存在しない。なお、各種ネットワーク1001、1002には、ルータやスイッチ等の中継装置も含まれるものとする。 3A and 3B are representative, and only the most essential components are extracted and defined. For this reason, the thing of a structure other than this may be included. The device configuration largely includes one or more networks and at least three functional layers. As the network, various networks 1001 and 1002 such as LAN / WAN / Internet are defined as different ones, but can be considered to be the same as these. Moreover, there is no reason to distinguish here about the kind. The various networks 1001 and 1002 include relay devices such as routers and switches.
 3つの機能層のうち、第1の機能層1100は、アプリケーションプログラム-1:1013、アプリケーションプログラム-L:1014が含まれる機能層である。第2の機能層1200は、データベースアクセス部-1:1011、データベースアクセス部-M:1012が含まれる機能層である。第3の機能層1300は、リソース管理/トランザクション制御部1015、1016、1017、1018や分散ストレージ1003等が含まれる機能層である。 Among the three functional layers, the first functional layer 1100 is a functional layer including application program-1: 1013 and application program-L: 1014. The second functional layer 1200 is a functional layer including a database access unit-1: 1101 and a database access unit-M: 1012. The third functional layer 1300 is a functional layer including resource management / transaction control units 1015, 1016, 1017, 1018, a distributed storage 1003, and the like.
 各々の機能層は、前述のLAN/WAN/Internet等、各種ネットワーク1001、1002を介して相互にアクセスする。 Each functional layer accesses each other via various networks 1001 and 1002 such as the aforementioned LAN / WAN / Internet.
 [ハードウェアの例示]
 各々の機能層の例として、PC(パソコン)、アプライアンス(appliance)、ワークステーション、メインフレーム、スーパーコンピュータ等の計算機を想定している。また、計算機に限らず、ルータやスイッチ等の中継装置、ファイアウォールや帯域制御装置等の中間装置、その他の通信機器、電子機器、専用デバイス等でも良い。
[Hardware example]
As an example of each functional layer, a computer such as a PC (personal computer), an appliance, a workstation, a mainframe, and a supercomputer is assumed. Further, not limited to computers, relay devices such as routers and switches, intermediate devices such as firewalls and bandwidth control devices, other communication devices, electronic devices, dedicated devices, and the like may be used.
 図示しないが、上記のような計算機等は、プログラムに基づいて駆動し所定の処理を実行するプロセッサと、当該プログラムや各種データを記憶するメモリと、ネットワークとの通信に用いられるインタフェースによって実現される。 Although not shown, the above-described computer or the like is realized by a processor that is driven based on a program and executes predetermined processing, a memory that stores the program and various data, and an interface used for communication with a network. .
 上記のプロセッサの例として、CPU(Central Processing Unit)、マイクロプロセッサ(microprocessor)、マイクロコントローラ、或いは、専用の機能を有する半導体集積回路(Integrated Circuit(IC))等が考えられる。 As examples of the above processor, a CPU (Central Processing Unit), a microprocessor, a microcontroller, or a semiconductor integrated circuit (Integrated Circuit (IC)) having a dedicated function can be considered.
 上記のメモリの例として、RAM(Random Access Memory)、ROM(Read Only Memory)、EEPROM(Electrically Erasable and Programmable Read Only Memory)やフラッシュメモリ等の半導体記憶装置、HDD(Hard Disk Drive)やSSD(Solid State Drive)等の補助記憶装置、又は、DVD(Digital Versatile Disk)等のリムーバブルディスクや、SDメモリカード(Secure Digital memory card)等の記憶媒体(メディア)等が考えられる。また、レジスタ(register)でも良い。 As examples of the above memory, semiconductor storage devices such as RAM (Random Access Memory), ROM (Read Only Memory), EEPROM (Electrically Erasable and Programmable Read Only Memory), and HDD Memory (SDHidK) An auxiliary storage device such as State Drive), a removable disk such as a DVD (Digital Versatile Disk), a storage medium such as an SD memory card (Secure Digital memory card), or the like is conceivable. A register may also be used.
 なお、上記のプロセッサ及び上記のメモリは、一体化していても良い。例えば、近年では、マイコン等の1チップ化が進んでいる。従って、計算機に搭載される1チップマイコンが、プロセッサ及びメモリを備えている事例が考えられる。 Note that the processor and the memory may be integrated. For example, in recent years, a single chip such as a microcomputer has been developed. Therefore, a case where a one-chip microcomputer mounted on a computer includes a processor and a memory can be considered.
 上記のインタフェースの例として、ネットワーク通信に対応した基板(マザーボード、I/Oボード)やチップ等の半導体集積回路、NIC(Network Interface Card)等のネットワークアダプタや同様の拡張カード、アンテナ等の通信装置、接続口(コネクタ)等の通信ポート等が考えられる。 Examples of the above interfaces include semiconductor integrated circuits such as boards (motherboards and I / O boards) and chips that support network communication, network adapters such as NIC (Network Interface Card), and communication devices such as expansion cards and antennas. A communication port such as a connection port (connector) is conceivable.
 また、ネットワークの例として、インターネット、LAN(Local Area Network)、無線LAN(Wireless LAN)、WAN(Wide Area Network)、バックボーン(Backbone)、ケーブルテレビ(CATV)回線、固定電話網、携帯電話網、WiMAX(IEEE 802.16a)、3G(3rd Generation)、専用線(lease line)、IrDA(Infrared Data Association)、Bluetooth(登録商標)、シリアル通信回線、データバス等が考えられる。 Examples of networks include the Internet, LAN (Local Area Network), wireless LAN (Wireless LAN), WAN (Wide Area Network), backbone (Backbone), cable TV (CATV) line, fixed telephone network, mobile phone network, WiMAX (IEEE 802.16a), 3G (3rd Generation), dedicated line (lease line), IrDA (Infrared Data Association), Bluetooth (registered trademark), serial communication line, data bus, and the like are conceivable.
 すなわち、各々の機能層内の各部(内部構成)は、専用デバイスに限らず、モジュール(module)又はコンポーネント(component)でも良い。 That is, each unit (internal configuration) in each functional layer is not limited to a dedicated device, but may be a module or a component.
 但し、実際には、これらの例に限定されない。 However, actually, it is not limited to these examples.
 アプリケーションプログラム-1:1013は、データベースアクセスクライアント部-1:1060を含む。データベースアクセスクライアント部-1:1060は、データベースアクセス部-1:1011にSQL等の実行クエリを要求及び送信する。 Application program-1: 1013 includes a database access client unit-1: 1060. The database access client unit-1: 1060 requests and transmits an execution query such as SQL to the database access unit-1: 1101.
 データベースアクセス部-1:1011は、クエリ受付部1048と、クエリ解析部1049と、ディクショナリアクセス部1050と、最適化部1051と、処理実行部-1:1053と、処理実行部-N:1054を含む。クエリ受付部1048は、当該データベースアクセスクライアント部-1:1060からの要求実行クエリを受け付ける。ディクショナリアクセス部1050は、その要求実行クエリに含まれるべき定義情報を管理するディクショナリに対するアクセスを行う。クエリ解析部1049は、当該ディクショナリアクセス部1050の情報を基に要求実行クエリを分析する。最適化部1051は、解析した要求実行クエリの最適化を、本発明である「クエリの中間成果の配置方式を評価するクエリ最適化方式」を用いて行う。処理実行部-1:1053及び処理実行部-N:1054は、当該最適化部1051の結果を受けて、前述の要求実行クエリを分解し、個別の実行単位を実施する。 The database access unit-1: 1011 includes a query reception unit 1048, a query analysis unit 1049, a dictionary access unit 1050, an optimization unit 1051, a process execution unit-1: 1053, and a process execution unit-N: 1054. Including. The query reception unit 1048 receives a request execution query from the database access client unit-1: 1060. The dictionary access unit 1050 accesses a dictionary that manages definition information to be included in the request execution query. The query analysis unit 1049 analyzes the request execution query based on the information of the dictionary access unit 1050. The optimization unit 1051 performs optimization of the analyzed request execution query using the “query optimization method for evaluating the arrangement method of intermediate results of queries” according to the present invention. The process execution unit-1: 1053 and the process execution unit-N: 1054 receive the result of the optimization unit 1051, decompose the request execution query, and execute individual execution units.
 更に、データベースアクセス部-1:1011は、統計情報管理部エージェント管理部1052と、トランザクション管理制御部1055と、Log,ジャーナル1056も含む。トランザクション管理制御部1055は、処理実行部-1:1053並びに処理実行部-N:1054が、中間成果を生成管理するシステムトランザクションを実施する。Log,ジャーナル1056は、システムトランザクションのログとジャーナルを管理する。統計情報管理部エージェント管理部1052は、実際の要求実行クエリの要素を実施した結果を収集し、前述の最適化部1051の処理に反映する。 Further, the database access unit-1: 1101 includes a statistical information management unit agent management unit 1052, a transaction management control unit 1055, and a log and journal 1056. The transaction management control unit 1055 executes a system transaction in which the process execution unit-1: 1053 and the process execution unit-N: 1054 generate and manage intermediate results. Log, journal 1056 manages the log and journal of the system transaction. The statistical information management unit agent management unit 1052 collects the results of executing the actual request execution query elements and reflects them in the processing of the optimization unit 1051 described above.
 前述のクエリ受付部1048は、実際に要求された要求実行クエリである指定クエリ1057を保持する。前述の処理実行部-1:1053並びに前述の処理実行部-N:1054は、分解された実行単位である1058、並びに1059を含む。 The above-described query receiving unit 1048 holds a designated query 1057 that is a request execution query that is actually requested. The above-described process execution unit-1: 1053 and the above-described process execution unit-N: 1054 include 1058 and 1059 which are decomposed execution units.
 図3A、図3Bでは、前述のリソース管理/トランザクション制御部1015や前述の分散ストレージ1003等が含まれる機能層に4つのサイトを記しているが、これに限定される訳ではない。各サイトは、各々、リソース管理/トランザクション制御部1015、1016、1017、1018を含む。 3A and 3B show four sites in the functional layer including the above-described resource management / transaction control unit 1015 and the above-described distributed storage 1003. However, the present invention is not limited to this. Each site includes resource management / transaction control units 1015, 1016, 1017, and 1018, respectively.
 リソース管理/トランザクション制御部1015を持つサイトには、分散ストレージ1003、並びに分散ストレージ1004が関係付けられて配置されている。当該分散ストレージ1003は、データベースの表領域、索引領域を持つ。別の分散ストレージ1004には、Log,ジャーナル等が格納される。この場合、分散ストレージは2つとは限らず、1つもしくは3つ以上でも良い。すなわち、分散ストレージは、少なくとも1つ存在する。 In the site having the resource management / transaction control unit 1015, the distributed storage 1003 and the distributed storage 1004 are arranged in association with each other. The distributed storage 1003 has a database table area and an index area. Another distributed storage 1004 stores Logs, journals, and the like. In this case, the number of distributed storages is not limited to two and may be one or three or more. That is, at least one distributed storage exists.
 更に、リソース管理/トランザクション制御部1015を持つサイトは、性能/エラー情報/統計情報サービス1019も含む。 Further, the site having the resource management / transaction control unit 1015 also includes a performance / error information / statistical information service 1019.
 当該分散ストレージ1003内のデータベースの表領域、索引領域には、関係Rに相当する表Table.1:1023、関係Rに相当する表Table.2:1024、関係Rに相当する表Table.n:1025、中間成果の表1030が格納される。また、索引領域には、索引1029が格納される。 Database tablespaces in the distributed storage 1003, the index area, the table Table corresponding to the relationship R 1. 1: 1023, Table Table corresponding relationship R 2. 2: 1024, Table Table corresponding to the relationship R n. n: 1025, table 1030 of intermediate results is stored. An index 1029 is stored in the index area.
 前述の分散ストレージ1004は、Log,ジャーナル1036と、性能情報,エラー情報1037と、オブジェクト毎の統計情報1038を含む。性能情報,エラー情報1037は、全般的な性能情報とエラー情報を管理する。なお、情報の管理手法としては、情報の統合/リスト化/グループ化/データベース化等が考えられる。但し、実際には、これらの例に限定されない。オブジェクト毎の統計情報1038は、分散ストレージ1003上の前述の表Table.1:1023と、前述の表Table.2:1024と、前述の表Table.n:1025と、前述の索引1029に対するアクセス性能の統計情報を管理する。 The above-mentioned distributed storage 1004 includes a log, journal 1036, performance information, error information 1037, and statistical information 1038 for each object. The performance information and error information 1037 manages general performance information and error information. As information management methods, information integration / listing / grouping / databaseing, etc. can be considered. However, actually, it is not limited to these examples. The statistical information 1038 for each object is stored in the table Table. 1: 1023 and the table Table. 2: 1024 and the above table Table. n: 1025 and the statistical information of the access performance for the index 1029 described above are managed.
 リソース管理/トランザクション制御部1016を持つサイトは、リソース管理/トランザクション制御部1015を持つサイトのレプリカに相当する。このため、前述のリソース管理/トランザクション制御部1015を持つサイトと同じ構成を取る。 The site having the resource management / transaction control unit 1016 corresponds to a replica of the site having the resource management / transaction control unit 1015. Therefore, the same configuration as that of the site having the resource management / transaction control unit 1015 described above is adopted.
 リソース管理/トランザクション制御部1016を持つサイトには、分散ストレージ1005、並びに分散ストレージ1006が関係付けられて配置されている。当該分散ストレージ1005は、データベースの表領域、索引領域を持つ。別の分散ストレージ1006には、Log,ジャーナル等が格納される。この場合、分散ストレージは2つとは限らず、1つもしくは3つ以上でも良い。すなわち、分散ストレージは、少なくとも1つ存在する。 In the site having the resource management / transaction control unit 1016, the distributed storage 1005 and the distributed storage 1006 are arranged in association with each other. The distributed storage 1005 has a database table area and an index area. Another distributed storage 1006 stores Log, journal, and the like. In this case, the number of distributed storages is not limited to two and may be one or three or more. That is, at least one distributed storage exists.
 更に、リソース管理/トランザクション制御部1016を持つサイトは、性能/エラー情報/統計情報サービス1020も含む。 Further, the site having the resource management / transaction control unit 1016 also includes a performance / error information / statistical information service 1020.
 当該分散ストレージ1005内のデータベースの表領域、索引領域には、関係Rに相当し、前述の表Table.1:1023のレプリカである表Table.1:1026、関係Rに相当し、前述の表Table.2:1024のレプリカである表Table.2:1027、関係Rに相当し、前述の表Table.n:1025のレプリカである表Table.n:1028、中間成果の表1032が格納される。また、索引領域には、索引1031が格納される。 Database tablespaces in the distributed storage 1005, the index area corresponds to the relationship R 1, the previous table Table. 1: Table 102, which is a replica of 1023. 1: 1026, equivalent to relation R 2, above table Table. 2: Table 1024 which is a replica of 1024. 2: 1027, corresponding to the relationship R n , and the above table Table. n: A table Table. n: 1028, table 1032 of intermediate results is stored. An index 1031 is stored in the index area.
 前述の分散ストレージ1006は、Log,ジャーナル1039と、性能情報,エラー情報1040と、オブジェクト毎の統計情報1041を含む。性能情報,エラー情報1040は、全般的な性能情報とエラー情報を管理する。オブジェクト毎の統計情報1041は、分散ストレージ1005上の前述の表Table.1:1026と、前述の表Table.2:1027と、前述の表Table.n:1028と、前述の索引1031に対するアクセス性能の統計情報を管理する。 The above-mentioned distributed storage 1006 includes a log, journal 1039, performance information, error information 1040, and statistical information 1041 for each object. The performance information and error information 1040 manages general performance information and error information. The statistical information 1041 for each object is stored in the table Table. 1: 1026 and the table Table. 2: 1027 and the above table Table. n: 1028 and the statistical information of the access performance for the index 1031 is managed.
 リソース管理/トランザクション制御部1017を持つサイトは、索引管理に特化したサイトに相当する。リソース管理/トランザクション制御部1017を持つサイトには、分散ストレージ1007、並びに分散ストレージ1008が関係付けられて配置されている。当該分散ストレージ1007は、データベースの表領域、索引領域を持つ。別の分散ストレージ1008には、Log,ジャーナル等が格納される。この場合、分散ストレージは2つとは限らず、1つもしくは3つ以上でも良い。すなわち、分散ストレージは、少なくとも1つ存在する。 The site having the resource management / transaction control unit 1017 corresponds to a site specialized in index management. At the site having the resource management / transaction control unit 1017, the distributed storage 1007 and the distributed storage 1008 are arranged in association with each other. The distributed storage 1007 has a database table area and an index area. Another distributed storage 1008 stores Log, journal, and the like. In this case, the number of distributed storages is not limited to two and may be one or three or more. That is, at least one distributed storage exists.
 更に、リソース管理/トランザクション制御部1017を持つサイトは、性能/エラー情報/統計情報サービス1021も含む。 Furthermore, the site having the resource management / transaction control unit 1017 also includes a performance / error information / statistical information service 1021.
 当該分散ストレージ1007内のデータベースの表領域、索引領域には、中間成果の表1034が格納される。また、索引領域には、索引1033が格納される。 The intermediate result table 1034 is stored in the table area and index area of the database in the distributed storage 1007. An index 1033 is stored in the index area.
 前述の分散ストレージ1008は、Log,ジャーナル1042と、性能情報,エラー情報1043と、オブジェクト毎の統計情報1044を含む。性能情報,エラー情報1043は、全般的な性能情報とエラー情報を管理する。オブジェクト毎の統計情報1044は、分散ストレージ1007上の前述の索引1033に対するアクセス性能の統計情報を管理する。 The above-described distributed storage 1008 includes a log, journal 1042, performance information, error information 1043, and statistical information 1044 for each object. The performance information and error information 1043 manages general performance information and error information. The statistical information 1044 for each object manages the statistical information on the access performance for the index 1033 on the distributed storage 1007.
 リソース管理/トランザクション制御部1018を持つサイトは、中間成果の管理に特化したサイトに相当する。リソース管理/トランザクション制御部1018を持つサイトには、分散ストレージ1009、並びに分散ストレージ1010が関係付けられて配置されている。当該分散ストレージ1009は、データベースの表領域、索引領域を持つ。別の分散ストレージ1010には、Log,ジャーナル等が格納される。この場合、分散ストレージは2つとは限らず、1つもしくは3つ以上でも良い。すなわち、分散ストレージは、少なくとも1つ存在する。 The site having the resource management / transaction control unit 1018 corresponds to a site specialized for managing intermediate results. At a site having the resource management / transaction control unit 1018, a distributed storage 1009 and a distributed storage 1010 are arranged in association with each other. The distributed storage 1009 has a database table area and an index area. Another distributed storage 1010 stores Logs, journals, and the like. In this case, the number of distributed storages is not limited to two and may be one or three or more. That is, at least one distributed storage exists.
 更に、リソース管理/トランザクション制御部1018を持つサイトは、性能/エラー情報/統計情報サービス1022も含む。 Furthermore, the site having the resource management / transaction control unit 1018 also includes a performance / error information / statistical information service 1022.
 当該分散ストレージ1009内のデータベースの表領域、索引領域には、中間成果の表1035が格納される。 The intermediate result table 1035 is stored in the table area and index area of the database in the distributed storage 1009.
 前述の分散ストレージ1009は、Log,ジャーナル情報1045と、性能情報,エラー情報1046と、オブジェクト毎の統計情報1047を含む。性能情報,エラー情報1046は、全般的な性能情報とエラー情報を管理する。オブジェクト毎の統計情報1047は、分散ストレージ1010上の任意オブジェクトに対するアクセス性能の統計情報を管理する。 The above-described distributed storage 1009 includes Log, journal information 1045, performance information, error information 1046, and statistical information 1047 for each object. The performance information and error information 1046 manages general performance information and error information. The statistical information 1047 for each object manages statistical information on access performance for an arbitrary object on the distributed storage 1010.
 <全体動作の説明>
 ここでは、本発明である「クエリの中間成果の配置方式を評価するクエリ最適化方式」の説明の前段として、図4A、図4Bで記される全体の処理内容について説明する。
<Description of overall operation>
Here, the entire processing contents shown in FIGS. 4A and 4B will be described as the first stage of the description of the “query optimization method for evaluating the arrangement method of intermediate results of queries” according to the present invention.
 前述のデータベースアクセスクライアント部-1:1060は、データベースをアクセスする場合、データベースアクセス部-1:1011内のクエリ受付部1048に、要求実行クエリをメッセージs100として送信する。メッセージs100には、SQLでの記述が記されており、関係Rから関係Rの自然結合を実施した要求実行クエリの記述が含まれているものとする。 When accessing the database, the aforementioned database access client unit-1: 1060 transmits a request execution query as a message s100 to the query reception unit 1048 in the database access unit-1: 1101. The message s100, described in SQL are marked, it is assumed to include a description of request execution query was performed natural binding relationship R n from the relationship R 1.
 当該クエリ受付部1048は、指定クエリ1057と同等な指定クエリ表現s101をクエリ解析部1049に渡す。クエリ解析部1049は、ディクショナリアクセス部1050から定義情報s102を入手し、指定クエリ表現s101を中間表現s103として変換し、最適化部1051に渡す。 The query reception unit 1048 passes the specified query expression s101 equivalent to the specified query 1057 to the query analysis unit 1049. The query analysis unit 1049 obtains the definition information s102 from the dictionary access unit 1050, converts the designated query expression s101 as the intermediate expression s103, and passes it to the optimization unit 1051.
 最適化部1051は、最適化された実行プランを算出し、その一部の実行プラン記述s124、s105を前述の処理実行部-1:1053から前述の処理実行部-N:1054に渡す。最適化に当たっては、統計情報管理部エージェント管理部1052から種々のレイテンシ(Latency)情報、信頼性情報s104を入手する。 The optimization unit 1051 calculates an optimized execution plan, and passes a part of the execution plan descriptions s124 and s105 from the process execution unit-1: 1053 to the process execution unit-N: 1054. In the optimization, various latency information and reliability information s104 are obtained from the statistical information management unit agent management unit 1052.
 図4A、図4Bの例では、前述の処理実行部-N:1054において、一部実行プラン記述s105が渡され、クエリの実行単位1058が前述の処理実行部-N:1054内に保持される。また、前述の処理実行部-1:1053において、一部実行プラン記述s124が渡され、クエリの実行単位1059が前述の処理実行部-1:1053内に保持される。特にクエリの実行単位1059では、中間成果表を利用するのに対して、クエリの実行単位1058では当該中間成果表を生成することになる。このため、後に説明するシステムトランザクション処理が実施されることになる。 In the example of FIGS. 4A and 4B, the partial execution plan description s105 is passed in the processing execution unit-N: 1054 described above, and the query execution unit 1058 is held in the processing execution unit-N: 1054 described above. . Further, the partial execution plan description s124 is passed to the above-described process execution unit-1: 1053, and the query execution unit 1059 is held in the above-described process execution unit-1: 1053. In particular, the query execution unit 1059 uses the intermediate result table, whereas the query execution unit 1058 generates the intermediate result table. For this reason, system transaction processing described later is performed.
 前述の処理実行部-N:1054は、クエリの実行単位1058を処理するため、レプリカの中で最適なサイトとして選択された前述のリソース管理/トランザクション制御部1015を持つサイトに対して、クエリ実行要求を含むメッセージs106を送信する。 The above-described processing execution unit-N: 1054 executes a query on the site having the above-described resource management / transaction control unit 1015 selected as the optimum site in the replica in order to process the query execution unit 1058. Send a message s106 containing the request.
 前述のリソース管理/トランザクション制御部1015は、メッセージs106を受け付けると、当該メッセージs106に記載された内容に従い、関係Rに相当する表Table.1:1023から該当行s107を入手し、これを駆動側として、関係Rに相当する表Table.2:1024から該当行s108を結合しながら入手する。更に、その結合結果を駆動側として、前述の索引1029を利用し、主キーの行ID情報s110を入手の後、関係Rに相当する表Table.n:1025から該当行s109を結合しながら入手する。 Above resource management / transaction control section 1015 accepts the message s106, in accordance with the contents described in the message s106, tables Table corresponding to the relationship R 1. 1: 1023 and the destination line s107, obtained from it as the drive side, the table Table corresponding relationship R 2. 2: Get the corresponding row s108 from 1024. Furthermore, the binding results in the drive side, by using the index 1029 of the above, after the obtained line ID information s110 of the primary key table Table corresponding to the relationship R n. n: Obtained from 1025 while combining the corresponding row s109.
 その後、前述のリソース管理/トランザクション制御部1015は、クエリの実行単位1058に対する中間成果情報である応答結果をメッセージs111として、一度、前述の処理実行部-N:1054に戻す。 Thereafter, the resource management / transaction control unit 1015 returns the response result, which is intermediate result information for the query execution unit 1058, as the message s111 once to the processing execution unit-N: 1054.
 当該メッセージs111に記載された応答結果は、中間成果表として一時的に分散ストレージ等に記録される必要がある。このため、前述の処理実行部-N:1054は本発明の一部を使って書き出すストレージを決定した後、前述のリソース管理/トランザクション制御部1018を保持するサイトに応答結果を含んだ中間成果表の作成要求に関するメッセージs112を送付する。 The response result described in the message s111 needs to be temporarily recorded in the distributed storage as an intermediate result table. For this reason, after the processing execution unit-N: 1054 determines a storage to be written using a part of the present invention, an intermediate result table including a response result in the site holding the resource management / transaction control unit 1018 described above. A message s112 related to the creation request is sent.
 前述のリソース管理/トランザクション制御部1018は、当該メッセージs112を受信するとトランザクションとして中間成果表を生成するため、更新前イメージをLog,ジャーナル1045に記録するための要求s113を送付する。この場合は、中間成果表が存在していないため、実質は存在しない。 When the resource management / transaction control unit 1018 receives the message s112, the resource management / transaction control unit 1018 sends a request s113 for recording the pre-update image in the Log and journal 1045 in order to generate an intermediate result table as a transaction. In this case, there is no real result because there is no interim result table.
 その後、前述のリソース管理/トランザクション制御部1018は、当該メッセージs112に記載された中間成果s114を取り出し、中間成果表1035に書き込む。その後、トランザクションコミット待ちの状態に遷移し、前述のリソース管理/トランザクション制御部1018はコミット待ちメッセージs116を前述の処理実行部-N:1054に戻す。 Thereafter, the resource management / transaction control unit 1018 described above extracts the intermediate result s114 described in the message s112 and writes it in the intermediate result table 1035. Thereafter, the state transits to a transaction commit wait state, and the resource management / transaction control unit 1018 returns the commit wait message s116 to the process execution unit-N: 1054.
 前述の処理実行部-N:1054は、データベースアクセス部-1:1011内のトランザクション管理制御部1055に対して、トランザクションの状態管理要求s117を事前に出している。当該トランザクション管理制御部1055は、システムトランザクションに関してLog,ジャーナル1056に登録要求s118を発行している。 The aforementioned process execution unit-N: 1054 issues a transaction state management request s117 in advance to the transaction management control unit 1055 in the database access unit-1: 1101. The transaction management control unit 1055 issues a registration request s118 to the Log and journal 1056 regarding the system transaction.
 以上の結果、当該トランザクション管理制御部1055は、トランザクションを実施する上で特に課題にならないと判断した場合、当該システムトランザクションの要求元である処理実行部-N:1054にその旨を通知する。処理実行部-N:1054は、中間成果表の作成要求に関するメッセージs112に対するコミット要求メッセージs119を前述のリソース管理/トランザクション制御部1018に送信する。 As a result, when the transaction management control unit 1055 determines that there is no particular problem in executing the transaction, the transaction management control unit 1055 notifies the processing execution unit-N: 1054 that is a request source of the system transaction. The process execution unit-N: 1054 transmits a commit request message s119 to the message management / transaction control unit 1018 in response to the message s112 related to the intermediate result table creation request.
 前述のリソース管理/トランザクション制御部1018が前述のコミット要求メッセージs119を受信すると、前述のLog,ジャーナル1045に対してコミット結果書き込みs120を実施する。 When the above-described resource management / transaction control unit 1018 receives the above-described commit request message s119, the commit result writing s120 is performed on the above-described log and journal 1045.
 その後、当該中間成果表1035の作成に関するトランザクションの実施結果に関係して前述のリソース管理/トランザクション制御部1018のサイト内でオブジェクト毎の統計情報1047に実績データを書き込む要求s121、並びに性能情報,エラー情報1046を書き込む要求s122を発行する。 Thereafter, the request s121 for writing the actual data to the statistical information 1047 for each object in the site of the resource management / transaction control unit 1018 in relation to the execution result of the transaction related to the creation of the intermediate result table 1035, the performance information, and the error A request s122 for writing information 1046 is issued.
 一連の動作を終了したならば、前述のリソース管理/トランザクション制御部1018は、応答メッセージs123を要求元である処理実行部-N:1054に送付する。 When the series of operations is completed, the resource management / transaction control unit 1018 described above sends a response message s123 to the requesting process execution unit-N: 1054.
 その後、当該中間成果表1035を利用する形で、処理実行部-1:1053が起動する。当該処理実行部-1:1053において、一部実行プラン記述s124が渡され、クエリの実行単位1059が前述の処理実行部-1:1053内に保持されている。 After that, the process execution unit-1: 1053 is activated using the intermediate result table 1035. In the processing execution unit-1: 1053, a partial execution plan description s124 is passed, and the execution unit 1059 of the query is held in the above-described processing execution unit-1: 1053.
 前述の処理実行部-1:1053はクエリの実行単位1059を処理するため、前述のリソース管理/トランザクション制御部1018を持つサイトに対してクエリ実行要求を含むメッセージs125を送信する。それと伴に、レプリカの中で最適なサイトとして選択された前述のリソース管理/トランザクション制御部1016を持つサイトに対してクエリ実行要求を含むメッセージs127を送信する。 In order to process the query execution unit 1059, the processing execution unit-1: 1053 described above transmits a message s125 including a query execution request to the site having the resource management / transaction control unit 1018 described above. At the same time, a message s127 including a query execution request is transmitted to the site having the resource management / transaction control unit 1016 selected as the optimum site in the replica.
 前述のリソース管理/トランザクション制御部1018が、クエリ実行要求を含むメッセージs125を受け付けると、当該メッセージs125に記載された内容に従い、中間成果表1035から該当行s126を入手し、要求元の処理実行部-1:1059に応答する。 When the resource management / transaction control unit 1018 receives the message s125 including the query execution request, the resource management / transaction control unit 1018 obtains the corresponding row s126 from the intermediate result table 1035 in accordance with the contents described in the message s125, and executes the requesting process execution unit. -1: Responds to 1059.
 また、前述のリソース管理/トランザクション制御部1016が、クエリ実行要求を含むメッセージs127を受け付けると、当該メッセージs127に記載された内容に従い、関係R(k+1)に相当する表Table.(k+1)等から該当行を入手する。 When the above-described resource management / transaction control unit 1016 receives the message s127 including the query execution request, the table Table.1 corresponding to the relationship R (k + 1) according to the content described in the message s127. Get the corresponding line from (k + 1) etc.
 その後、前述のリソース管理/トランザクション制御部1016は、クエリの実行単位1059に対する応答結果をメッセージs128として、前述の処理実行部-1:1053に戻す。 After that, the resource management / transaction control unit 1016 returns the response result to the query execution unit 1059 as the message s128 to the process execution unit-1-1053.
 各々の前述の処理実行部-1:1053、並びに前述の処理実行部-N:1054の結果は、それぞれ結果応答s129、s130として前述のクエリ受付部1048に戻され、その結果を統合した上で、呼び出し元のデータベースアクセスクライアント部-1:1060へ応答メッセージs131として戻される。 The results of each of the above-described processing execution units-1: 1053 and the above-described processing execution unit-N: 1054 are returned to the query reception unit 1048 as result responses s129 and s130, respectively, and the results are integrated. The response message s131 is returned to the calling database access client unit-1: 1060.
 前述の最適化部1051は、最適化された実行プランを算出する際に統計情報管理部エージェント管理部1052から種々のレイテンシ情報、信頼性情報s104を入手する。これらは数値で表現された情報である。このため、当該統計情報管理部エージェント管理部1052は、各サイト上の性能/エラー情報/統計情報サービス1019:1020:1021:1022を介して、それらの情報を入手、管理する。 The aforementioned optimization unit 1051 obtains various latency information and reliability information s104 from the statistical information management unit agent management unit 1052 when calculating the optimized execution plan. These are information expressed numerically. Therefore, the statistical information management unit agent management unit 1052 acquires and manages the information via the performance / error information / statistical information service 1019: 1020: 1021: 1022 on each site.
 例えば、前述のリソース管理/トランザクション制御部1015が管理するサイトには、前述の性能/エラー情報/統計情報サービス1019が配置されており、前述の性能情報,エラー情報1037からは性能/エラー情報s134を、前述のオブジェクト毎の統計情報1038からは統計情報s133を入手する。 For example, the performance / error information / statistical information service 1019 described above is arranged at a site managed by the resource management / transaction control unit 1015 described above, and the performance / error information s134 is determined from the performance information / error information 1037 described above. The statistical information s133 is obtained from the statistical information 1038 for each object.
 その後、前述の性能/エラー情報/統計情報サービス1019は、それらを統合して性能/エラー情報/統計情報報告s132として報告する。前述の統計情報管理部エージェント管理部1052は、全てのサイト上の性能/エラー情報/統計情報サービス1019:1020:1021:1022から、当該性能/エラー情報/統計情報報告s132を入手し、各サイトのレイテンシ情報、信頼性情報として管理する。 Thereafter, the aforementioned performance / error information / statistical information service 1019 integrates them and reports them as a performance / error information / statistical information report s132. The above-described statistical information management unit agent management unit 1052 obtains the performance / error information / statistical information report s132 from the performance / error information / statistical information service 1019: 1020: 1021: 1022 on all sites. As latency information and reliability information.
 <クエリの中間成果の配置方式を評価するクエリ最適化方式>
 図5A、図5Bは、本発明の「クエリの中間成果の配置方式を評価するクエリ最適化方式」の手順で、特徴的な部分を抜き出して、模式的に記した説明図である。当該クエリ最適化方式は、図3A、図3Bにおける最適化部1051を中心に実装され、その処理過程の中でクエリ解析部1049、トランザクション管理制御部1055、統計情報管理部エージェント管理部1052が利用される。そこで、以下簡単にその手順を記す。
<Query optimization method that evaluates the layout of query intermediate results>
FIG. 5A and FIG. 5B are explanatory diagrams schematically showing a characteristic part extracted in the procedure of the “query optimization method for evaluating the arrangement method of intermediate results of queries” of the present invention. The query optimization method is mainly implemented by the optimization unit 1051 in FIGS. 3A and 3B, and is used by the query analysis unit 1049, the transaction management control unit 1055, and the statistical information management unit agent management unit 1052 in the processing process. Is done. Therefore, the procedure is briefly described below.
 (ステップ.1)は、クエリ解析部1049により実施される。クエリ解析部1049は、指定クエリ1057の構文解析を行い、参照する表集合{R,R,...,R}を取り出す。 (Step 1) is performed by the query analysis unit 1049. The query analysis unit 1049 performs syntax analysis of the specified query 1057 and refers to a set of tables {R 1 , R 2 ,. . . , R n }.
 (ステップ.2)は、最適化部1051により実施される。このステップでは、その配下の(ステップ.2.1)において、最適化部1051は、表集合の各要素表Rで索引を効果的に利用する等の想定される全てのアクセスパターンを抽出した上で、仮結合要素リストに追加する。これを全ての要素表Rに対して行う(m=1~n)。これにより、最適化の候補抽出を行うことになる。 (Step 2) is performed by the optimization unit 1051. In this step, in the subordinate (step 2.1), the optimization unit 1051 extracts all possible access patterns such as effectively using an index in each element table R m of the table set. Above, add to the temporary binding element list. This is performed for all the element tables R m (m = 1 to n). Thereby, optimization candidates are extracted.
 (ステップ.3)も、最適化部1051により実施される。このステップでは、大きく2つの事項を表集合の全ての各要素表Rで行う(m=1~n)。 (Step 3) is also performed by the optimization unit 1051. In this step, two major items are performed for each element table R m in the table set (m = 1 to n).
 [(ステップ.3)配下の処理]
 (ステップ.3)では、始めは配下の(ステップ.3.1)において、(ステップ.2.1)で実施した各要素表Rに関して仮結合要素リストから最適なアクセス条件のものを一つ取り出し、取り出したもののみを残して他を仮結合要素リストから削除することでコストを最適化するクエリ要素の選別を行う。続けて、配下の(ステップ.3.2)において、仮結合要素リストの各要素に対して、全レプリカ分{R11,R12,…,R1r,R21,…,R2r,...,Rnr}のエントリを作成し、仮結合要素リストに追加する。
[(Step 3) Subordinate processing]
One (Step .3) (Step .3.1) under began, those optimum access condition from the temporary binding element list for each element table R m which was performed in (step .2.1) The query elements that optimize the cost are selected by taking out and deleting only the extracted ones and deleting others from the temporary joining element list. Subsequently, in (step .3.2) under, for each element of the temporary coupling element list, all replicas min {R 11, R 12, ... , R 1r, R 21, ..., R 2r,. . . , R nr } is created and added to the temporary combined element list.
 (ステップ.4)も、最適化部1051により実施される。この(ステップ.4)と(ステップ.5)は、本発明の特徴的な事項の一つであり、(ステップ.4)では配下に2つの手続きを持つ。(ステップ.4)では、仮結合要素リスト中のエントリRmkに対して、配下の処理を繰り返して実施する(k=1~r)。 (Step 4) is also performed by the optimization unit 1051. These (Step 4) and (Step 5) are one of the characteristic matters of the present invention, and (Step 4) has two procedures under it. In (Step 4), the subordinate processing is repeated for the entry R mk in the temporary coupling element list (k = 1 to r).
 [(ステップ.4)配下の処理]
 (ステップ.4.1)において、当該最適化部1051はレプリカ部分も含めて前述の仮結合要素リストから1エントリを取り出し、そのエントリを保持する分散ストレージサイトに相当するレイテンシ及び信頼性に関する数値情報を前述の統計情報管理部エージェント管理部1052から取り出す。その後、(ステップ.4.2)において、前述のレイテンシ及び信頼性に関する数値情報を、仮結合要素リストから取り出した前述のエントリに付与する形で、前述の仮結合要素リストを更新する(m=1~n)。
[(Step 4) Subordinate processing]
In (Step 4.1), the optimizing unit 1051 takes out one entry from the temporary coupling element list including the replica part, and numerical information on latency and reliability corresponding to the distributed storage site holding the entry. Is extracted from the above-described statistical information management unit agent management unit 1052. Thereafter, in (Step 4.2), the above-described temporary coupling element list is updated in such a manner that the numerical information related to the latency and reliability is added to the above-described entry extracted from the temporary coupling element list (m = 1-n).
 (ステップ.5)も、最適化部1051により実施される。(ステップ.5)では、仮結合要素リスト中のエントリRmkに対して、配下の処理を繰り返して実施する(m=1~n)。 (Step 5) is also performed by the optimization unit 1051. In (Step 5), the subordinate processing is repeated for the entry R mk in the temporary coupling element list (m = 1 to n).
 [(ステップ.5)配下の処理]
 ここでは、(ステップ.5.1)において、該最適化部1051は、レプリカ部分も含めて前述の仮結合要素リストから1エントリを取り出す。また、(ステップ.5.2)において、取り出したエントリRmkに対して、配下の処理を繰り返して実施する(k=1~r)。(ステップ.5.2)配下の(ステップ.5.2.1)において、取り出したエントリの各レプリカのレイテンシ及び信頼性に関する数値情報を評価する。
[(Step 5) Subordinate processing]
Here, in (Step 5.1), the optimizing unit 1051 takes out one entry from the above-described temporary combining element list including the replica portion. In (Step 5.2), the subordinate process is repeated for the extracted entry R mk (k = 1 to r). (Step 5.2) In the subordinate (Step 5.2.1), numerical information regarding the latency and reliability of each replica of the extracted entry is evaluated.
 当該評価について、通常、各エントリの評価には、そのコスト値が評価値として記載されており、この最も小さいものが望ましいものとして評価される。本発明では、このコスト値に対して正規化したレイテンシ値を重畳した上で、信頼性の逆数も重畳した、拡張コスト値を定義する。そして、当該拡張コスト値の最小のものを最良とするように評価する。 Regarding this evaluation, normally, the cost value is described as an evaluation value in the evaluation of each entry, and the smallest value is evaluated as desirable. In the present invention, an extended cost value is defined in which a normalized latency value is superimposed on the cost value and an inverse number of reliability is also superimposed. Then, evaluation is made so that the smallest expansion cost value is the best.
 正規化されたレイテンシ値は、サイトの性能やLAN/WAN/Internet等、各種ネットワーク1002の通信品質が悪化、もしくは配置距離が長い程、大きな値となる。また、サイトの信頼性が低い程、この逆数も大きな値となる。このため、配置位置が遠く、劣悪な品質なものほど、コスト値は悪く評価されることになる。 The normalized latency value becomes larger as the communication quality of various networks 1002 such as site performance and LAN / WAN / Internet deteriorates or the arrangement distance becomes longer. Also, the lower the site reliability, the greater the inverse. For this reason, the farther the arrangement position is and the worse the quality is, the worse the cost value is evaluated.
 その後、(ステップ.5.3)において、拡張コスト値が妥当な(所定の条件に適合する)指定された上位p個のレプリカを対象に、実際に利用するべき仮結合要素リストの要素、その種類毎に決める。このとき、上位p個のレプリカを残し、他を仮結合要素リストから削除する。 Then, in (Step 5.3), the elements of the temporary combination element list to be actually used for the designated top p replicas whose expansion cost values are valid (conforming to predetermined conditions), Decide for each type. At this time, the top p replicas are left and others are deleted from the temporary connection element list.
 (ステップ.6)では、前述の仮結合要素リスト上で残ったエントリから結合要素リストの初期版{R11 ,…,R1p ,R21 ,…,R2p ,...,Rnp }を作成し、最適化プラン要素リストにも追加する。 (Step .6), the initial version of {R 11 o coupling element list from the entry remaining on the temporary coupling element list above, ..., R 1p o, R 21 o, ..., R 2p o,. . . , R np o } is added to the optimization plan element list.
 その後、(ステップ.7)では、仮結合要素リストを削除する。 After that, in (Step 7), the temporary combined element list is deleted.
 (ステップ.8)でも、本発明の特徴的な事項を含む。特に(ステップ.8.3)、(ステップ.8.4)、並びに(ステップ.8.7)~(ステップ.8.9)である。 (Step 8) also includes the characteristic features of the present invention. In particular, (Step .8.3), (Step .8.4), and (Step .8.7) to (Step .8.9).
 (ステップ.8)において、最適化部1051は、結合要素リスト内の要素数がレプリカを除き1種類になるまで、配下の処理を繰り返し実施する。 In (Step 8), the optimization unit 1051 repeatedly performs the subordinate processing until the number of elements in the combined element list becomes one type except for the replica.
 [(ステップ.8)配下の処理]
 (ステップ.8.1)では、従来の方式でも見られるように、最大ブロックサイズkを指定値、もしくは結合要素リストの要素数で、小さいものに設定する。その後、(ステップ.8.2)では、結合数の初期値を2に指定する。(ステップ.8.3)では、最適化部1051の処理系が自分自身をコピーした後、コピーした側の最大ブロックサイズを(k+1)に書き換える。その後、(ステップ.8.4)では、コピーした処理系を起動する。これは、最大ブロックサイズがkの場合に中間成果表を生成することになるが、どの分散ストレージサイトに配置するか、を先読みして合理的に決めるために実施する。
[(Step 8) Subordinate processing]
In (Step 8.1), the maximum block size k is set to a small value by the designated value or the number of elements in the combined element list, as can be seen in the conventional method. Thereafter, in (Step .8.2), the initial value of the number of connections is designated as 2. In (Step 8.3), after the processing system of the optimization unit 1051 copies itself, the maximum block size on the copied side is rewritten to (k + 1). Thereafter, in (Step .8.4), the copied processing system is activated. This is to generate an intermediate result table when the maximum block size is k, but is performed in order to pre-determine and rationally determine which distributed storage site to place.
 当該中間成果表を配置するには、前述のレイテンシ及び信頼性に関する数値情報を含めて評価するだけではなく、その中間成果表を使って再度結合演算処理を行う場合にコスト値全体が最小化するように取り決める必要がある。このためには、どのレプリカを使ってアクセスされるかを把握する必要がある。 In order to place the intermediate result table, not only the above-mentioned numerical information on latency and reliability is evaluated, but also the total cost value is minimized when the intermediate result table is used to perform the joint operation process again. It is necessary to arrange as follows. For this purpose, it is necessary to know which replica is used for access.
 このためには、実際には、最大ブロックサイズがkの場合に前述の中間成果表に書き込みを行うが、これを(k+1)とし、k個の結合でどのレプリカ表を用いるか、を判定の上で、そのレプリカ表が配置される分散ストレージサイト、もしくはその近傍の分散ストレージサイトに中間成果表を配置することで前述の拡張コスト値を小さくする必要がある。 For this purpose, in actuality, when the maximum block size is k, the above-mentioned intermediate result table is written, but this is set to (k + 1), and it is determined which replica table is used for k joins. In the above, it is necessary to reduce the above-mentioned expansion cost value by arranging the intermediate result table at the distributed storage site where the replica table is arranged or at a distributed storage site in the vicinity thereof.
 このためには、最適化部1051の処理系自身をコピーして最大ブロックサイズを(k+1)に変更し並列に評価させることで、評価の前提条件を一致したものに指定でき、本来望ましいものを求めることが可能となる。 For this purpose, the processing system itself of the optimization unit 1051 is copied, the maximum block size is changed to (k + 1), and the evaluation is performed in parallel. It can be obtained.
 (ステップ.8.5)では、指定結合数≦最大ブロック数k(コピー側は(k+1))の条件を満足する限り、配下の処理を繰り返して実施する。 In (Step 8.5), as long as the condition of designated number of connections ≦ maximum number of blocks k (the copy side is (k + 1)) is satisfied, the subordinate processing is repeated.
 なお、(ステップ.8.5.1)、(ステップ.8.5.2)、(ステップ.8.5.3)、(ステップ.8.5.4)、(ステップ.8.5.5)、(ステップ.8.5.6)、(ステップ.8.5.7)、(ステップ.8.5.8)は、従来の方法でも確認される方式である。 In addition, (step .8.5. 1), (step .8.5.2), (step .8.5.3), (step .8.5.4), (step .8.5.5) ), (Step. 8.5.6), (Step. 8.5.7), and (Step. 8.5.8) are methods that are also confirmed by the conventional method.
 [(ステップ.8.5)配下の処理]
 まず、(ステップ.8.5.1)では、指定結合数分の組合せを構成する全ての場面を、結合要素リストの初期版{R11 ,…,R1p ,R21 ,…,R2p ,...,Rnp }から取り出し、仮リストを作成し、追加する。続いて、(ステップ.8.5.2)では、全ての仮リスト要素数分、配下の処理を繰り返して実施する。
[(Step .8.5) Subordinate processing]
First, in (step .8.5.1), all scenes constituting the combination of the specified binding few minutes, the initial version of {R 11 o coupling element list, ..., R 1p o, R 21 o, ..., R 2p o ,. . . , R np o } to create and add a temporary list. Subsequently, in (Step 8.5.2), the subordinate processing is repeated for all the temporary list elements.
 [(ステップ.8.5.2)配下の処理]
 まず、(ステップ.8.5.2.1)では、仮リストから1要素を取り出す。続いて、(ステップ.8.5.2.2)では、仮リスト部分1,2が伴に定義できるまで、配下の処理を繰り返して実施する。
[(Step .8.5.2) Subordinate processing]
First, in (Step .8.5.2.1), one element is extracted from the temporary list. Subsequently, in (Step .8.5.2.2), the subordinate processing is repeated until the temporary list portions 1 and 2 can be defined together.
 [(ステップ.8.5.2.2)配下の処理]
 まず、(ステップ.8.5.2.2.1)では、要素を2つに分割し、要素{Rkp }の組合せ2組を作成し、{仮リスト部分1}{仮リスト部分2}を作成する。続いて、(ステップ.8.5.2.2.2)では、{仮リスト部分1}{仮リスト部分2}の組合せで、仮操作リストに追加する。続いて、(ステップ.8.5.2.2.3)では、{仮リスト部分2}{仮リスト部分1}の組合せで、仮操作リストに追加する。
[Processing under (Step.8.5.2.2)]
First, in (Step 8.5.5.2.2.1), the element is divided into two, and two sets of combinations of the elements {R kp o } are created, and {temporary list part 1} {temporary list part 2 }. Subsequently, in (Step .8.5.2.2.2.2), a combination of {temporary list part 1} {temporary list part 2} is added to the temporary operation list. Subsequently, in (Step .8.5.2.2.2.3), a combination of {temporary list part 2} {temporary list part 1} is added to the temporary operation list.
 [(ステップ.8.5)配下の処理の続き1]
 (ステップ.8.5.2)の終了後、(ステップ.8.5)配下の(ステップ.8.5.3)では、最適化プランリストを初期化する。続いて、(ステップ.8.5.4)では、全ての仮操作リスト要素数分、配下の処理を繰り返して実施する。
[(Step. 8.5) Continuation of subordinate process 1]
After the completion of (Step .8.5.2), the optimization plan list is initialized in (Step .8.5.3) under (Step .8.5). Subsequently, in (Step .8.5.4), the subordinate processing is repeated for all the provisional operation list elements.
 [(ステップ.8.5.4)配下の処理]
 まず、(ステップ.8.5.4.1)では、仮操作リストから1つ取り出す。続いて、(ステップ.8.5.4.2)では、取り出した右側{仮リスト部分x(x=1,2)}の最適化プランを最適化プラン要素リストから取り出すか、もしくは作成する。続いて、(ステップ.8.5.4.3)では、取り出した左側{仮リスト部分x(x=1,2)}の最適化プランを最適化プラン要素リストから取り出すか、もしくは作成する。続いて、(ステップ.8.5.4.4)では、前述の(ステップ.8.5.4.2)(ステップ.8.5.4.3)の2つの最適化プランでの結合プランを作成し、最適化プランリストに追加する。
[(Step. 8.5.4) Subordinate processing]
First, in (Step .8.5.4.1), one is extracted from the temporary operation list. Subsequently, in (Step .8.5.4.2), the optimization plan of the extracted right side {temporary list portion x (x = 1, 2)} is extracted from the optimization plan element list or created. Subsequently, in (Step .8.5.4.3), the optimization plan of the extracted left {temporary list portion x (x = 1, 2)} is extracted from the optimization plan element list or created. Subsequently, in (Step .8.5.4.4), the combined plan in the two optimization plans of (Step .8.5.4.2) and (Step 8.5.4.3) described above. And add it to the optimization plan list.
 [(ステップ.8.5)配下の処理の続き2]
 (ステップ.8.5.4)の終了後、(ステップ.8.5)配下の(ステップ.8.5.5)では、最適化プランリストから最適なものを1つ取り出し、それを最適化プラン要素リストに追加する。続いて、(ステップ.8.5)配下の(ステップ.8.5.6)では、最適化プランリストの最適でないものを、ライブラリとして整理保持する。続いて、(ステップ.8.5)配下の(ステップ.8.5.7)では、仮リスト、仮操作リストをクリアする。続いて、(ステップ.8.5)配下の(ステップ.8.5.8)では、結合数に1を加算(increment)する(結合数←(結合数+1))。
[(Step .8.5) Continuation of subordinate process 2]
After (Step .8.5.4) is finished, in (Step .8.5.5) under (Step .8.5), one optimal item is extracted from the optimization plan list and optimized. Add to the plan element list. Subsequently, in (Step .8.5.6) under (Step .8.5), the non-optimal optimization plan list is organized and held as a library. Subsequently, in (Step .8.5.7) under (Step .8.5), the temporary list and the temporary operation list are cleared. Subsequently, in (Step .8.5.8) under (Step .8.5), 1 is added to the number of bonds (number of bonds ← (number of bonds + 1)).
 [(ステップ.8)配下の処理の続き1]
 (ステップ.8.6)では、従来の方式でも見られるように、(ステップ.8.5)で作成された最適化プラン要素リスト内の要素から、貪欲法(欲張り法)のアルゴリズムを用いてコストを計算する評価関数を最小にする結合木の選択を行う。貪欲法(欲張り法)のアルゴリズムは、従来のものと同様である。
[(Step 8) Continuation of subordinate processing 1]
In (Step 8.6), as can be seen in the conventional method, from the elements in the optimization plan element list created in (Step .8.5), the algorithm of greedy method (greedy method) is used. Select a join tree that minimizes the cost function. The algorithm of the greedy method (greedy method) is the same as the conventional one.
 (ステップ.8.7)は、本発明の特徴的な事項である。(ステップ.8.7)では、並行して実施している最適化部1051の処理系が最大ブロックサイズkにてコピーを実施した側(非コピー処理系)か、最大ブロックサイズ(k+1)にてコピーされた側(コピー処理系)か、を評価する。その結果に応じて、後の処理内容が異なる。 (Step 8.7) is a characteristic matter of the present invention. In (Step 8.7), the processing system of the optimizing unit 1051 that is executing in parallel is the side that performed copying with the maximum block size k (non-copy processing system) or the maximum block size (k + 1). The side that was copied (copy processing system) is evaluated. Depending on the result, the content of subsequent processing differs.
 [コピー処理系の場合]
 最適化部1051の処理系が最大ブロックサイズ(k+1)にてコピーされたコピー処理系である場合、(ステップ.8.7.1)では、非コピー処理系からの問い合わせ待ちとなる。その後、(ステップ.8.7.2)では、コピーを実施した非コピー処理系からの問い合わせの際に指定される中間成果表の結合木{T }種類を判定する。この場合、左側(外部側)/駆動表か、右側(内部側)/参照表を評価し、k個の結合でどのサイトのどの表と結合演算を行うか、を非コピー処理系に応答する。その後、(ステップ.8.7.3)では、自ら終了する。すなわち、コピー処理系を終了する。
[For copy processing system]
When the processing system of the optimization unit 1051 is a copy processing system that has been copied with the maximum block size (k + 1), in (Step 8.7.1), it waits for an inquiry from the non-copy processing system. Thereafter, in (Step 8.7.2), the type of the join tree {T 1 o } of the intermediate result table specified at the time of inquiry from the non-copy processing system that has performed copying is determined. In this case, the left side (external side) / driving table or the right side (internal side) / reference table is evaluated, and which table of which site is to be joined with k joins is responded to the non-copy processing system. . Thereafter, in (Step 8.7.3), the process is terminated by itself. That is, the copy processing system is terminated.
 [非コピー処理系の場合]
 最適化部1051の処理系が最大ブロックサイズkにてコピーを実施した側の場合、(ステップ.8.7.4)では、結合木を新たな中間成果表{T }に仮置きをする。その後、(ステップ.8.7.5)で、この結果をいずれのサイトに配置するべきかを問い合わせるため、最大ブロックサイズ(k+1)にてコピーされた最適化部1051の処理系に問い合わせる。
[Non-copy processing system]
In the case where the processing system of the optimization unit 1051 performs copying with the maximum block size k, in (Step 8.8.7.4), the temporary tree is temporarily placed in the new intermediate result table {T 1 o }. To do. Thereafter, in step 8.7.5, an inquiry is made to the processing system of the optimization unit 1051 copied with the maximum block size (k + 1) in order to inquire which site this result should be placed.
 最大ブロックサイズ(k+1)にてコピーされた最適化部1051の処理系は、通常、(ステップ.8.7.3)で自ら終了することを前提とするが、利用後に終了されていない場合を考慮して(ステップ.8.7.5)で、強制終了させることも可能である。 The processing system of the optimization unit 1051 copied with the maximum block size (k + 1) is normally assumed to end itself in (Step 8.8.7.3), but the case where it has not ended after use is assumed. Considering (step 8.7.5), it is also possible to forcibly terminate.
 [(ステップ.8)配下の処理の続き2]
 その後、(ステップ.8.8)では、配置評価関数を呼び出し、前述の中間成果表を配置するべきサイトとその索引等の作成方針で最適な上位2案を取り出す。索引等を作成する場合、そのために追加コストが生じることになるが、索引故に検索速度は大きく改善することになる。そのため、「索引等作成」と「索引等を利用した検索」をバランス良く評価する必要がある。
[(Step 8) Continuation of subordinate process 2]
Thereafter, in (Step 8.8), the placement evaluation function is called, and the top two proposals are extracted in accordance with the creation policy of the site where the intermediate result table is to be placed and its index. When an index or the like is created, an additional cost is generated for this purpose, but the search speed is greatly improved because of the index. Therefore, it is necessary to evaluate “creation of index etc.” and “search using index etc.” in a balanced manner.
 [配置評価関数]
 当該配置評価関数は、以下の手順で実施される。
 最初の(ステップ.f)では、最大ブロックサイズ(k+1)にてコピーされた前述の最適化部1051の処理系内で判断された中間成果表の結合木{T }種類を確認する。結合木{T }種類が左側(外部側)/駆動表になる場合、(ステップ.f.1.1)~(ステップ.f.1.12)を連続して実施する。また、結合木{T }種類が右側(内部側)/参照表になる場合、(ステップ.f.2.1)~(ステップ.f.2.15)を連続して実施する。
[Placement evaluation function]
The said arrangement | positioning evaluation function is implemented in the following procedures.
In the first step (step .f), the type of connection tree {T 1 o } of the intermediate result table determined in the processing system of the optimization unit 1051 copied with the maximum block size (k + 1) is confirmed. When the type of the connection tree {T 1 o } is left side (external side) / driving table, (step .f.1.1) to (step .f.1.12) are continuously performed. Further, when the type of the connected tree {T 1 o } is the right side (internal side) / reference table, (step .f.2.1) to (step .f.2.15) are continuously performed.
 [結合木{T }種類が左側(外部側)/駆動表になる場合]
 (ステップ.f.1.1)では、(ステップ.8.7.2)での次の結合演算先(ここでは内部側/参照表)を確認する。その後、(ステップ.f.1.2)では、次の結合演算先(内部側/参照表)で利用されるサイト群全てを(レプリカも含めて)取り出して、サイト仮リストに追加する。
[When the join tree {T 1 o } type is left side (external side) / driving table]
In (Step .f.1.1), the next join operation destination (in this case, internal side / reference table) in (Step .8.7.2) is confirmed. After that, in (step f. 1.2), all the site groups (including replicas) used in the next join operation destination (inside / reference table) are taken out and added to the temporary site list.
 (ステップ.f.1.1)、(ステップ.f.1.2)では、コピーされた前述の最適化部1051の処理系内で、どの表とサイト群が次結合演算先(この場合は内部側/参照表)として利用されているかを判定し、それをリストに含める。これに対して(ステップ.f.1.3)~(ステップ.f.1.7)では、利用されていないサイト群の中で、中間成果表を配置することのできる候補を最大2つまで探索する。 In (Step .f.1.1) and (Step .f.1.2), in the processing system of the copied optimization unit 1051, which table and site group is the next join operation destination (in this case, It is determined whether it is used as an internal side / reference table), and it is included in the list. On the other hand, in (Step .f.1.3) to (Step .f.1.7), up to two candidates that can place the intermediate result table among the unused sites. Explore.
 具体的には、(ステップ.f.1.3)では、利用されないサイト群の一覧を抽出する。ここでは、サイト仮リストに含まれないサイトを取り出し、候補仮リストに追加する。(ステップ.f.1.4)では、探索用の2箇所分の候補エリアを準備し、初期化する。その後、(ステップ.f.1.5)では、候補仮リスト要素分、配下の処理を繰り返し実施する。 Specifically, in (Step .f.1.3), a list of unused site groups is extracted. Here, a site not included in the temporary site list is taken out and added to the candidate temporary list. In (Step f.1.4), candidate areas for two locations for search are prepared and initialized. Thereafter, in (Step .f.1.5), the subordinate processing is repeatedly performed for the candidate temporary list element.
 [(ステップ.f.1.5)配下の処理]
 (ステップ.f.1.5.1)では、順次、サイト情報を読み出す。(ステップ.f.1.5.2)では、計測統計DBから、当該サイトに関するレイテンシ及び信頼性に関する数値情報を読み出す。また、レイテンシ及び信頼性に関する数値情報を読み出す毎に、配下の処理を実施する。
[Processing under (Step.f.1.5)]
In (step .f.1.5.1), the site information is read sequentially. In (step .f. 1.5.2), numerical information related to latency and reliability related to the site is read from the measurement statistics DB. In addition, the subordinate process is performed every time numerical information related to latency and reliability is read.
 [候補エリアで初期化されたものが残っている場合]
 (ステップ.f.1.5.2.1)において、最初の2サイトを候補として、前述の探索用の候補エリアにセットする。
[When there is something initialized in the candidate area]
In (Step .f.1.5.2.1), the first two sites are set as candidates in the above-described search candidate area.
 [候補エリアが既にセットされ、サイト仮リストに登録されていない場合]
 (ステップ.f.1.5.2.2)において、(ステップ.f.1.5.2)を繰り返して実施した結果、レイテンシ及び信頼性に関する数値情報が、より高評価である場合、当該探索用の候補エリアに記載されているサイトを置き換える(リプレイス)。それ以外は処理しない。
[If the candidate area is already set and not registered in the temporary site list]
In (step .f.1.5.2.2), when (step .f.1.5.2) is repeated and the numerical information on latency and reliability is higher evaluation, Replace the site listed in the candidate area for search (replace). No other processing is performed.
 候補仮リスト内の全てのサイトに対して(ステップ.f.1.5)を実施した結果、新たな候補となる2つのサイトが特定されたならば、(ステップ.f.1.6)では、中間成果表の結合木{T }を一時的に配置するサイト候補を意味するサイト仮リストに追加する。その後、(ステップ.f.1.7)では、候補仮リストをクリアする。続けて、(ステップ.f.1.8)では、サイト仮リスト要素分、配下の処理を繰り返し実施して、中間成果表の結合木{T }を一時的に配置するサイト候補の各々を具体的に評価する。 If two sites that are new candidates are identified as a result of performing (Step f.1.5) on all sites in the candidate temporary list, (Step f.1.6) The intermediate result table join tree {T 1 o } is added to a temporary site list that means site candidates to be temporarily arranged. Thereafter, in (step .f.1.7), the candidate temporary list is cleared. Subsequently, in (step .f.1.8), each of the site candidates for temporarily arranging the connection tree {T 1 o } of the intermediate result table by repeatedly performing the subordinate processing for the site temporary list element. Is specifically evaluated.
 [(ステップ.f.1.8)配下の処理]
 (ステップ.f.1.8.1)では、結合木{T }の作成件数(作成される行数及び件数)から表作成コストを見積もる。その後、(ステップ.f.1.8.2)では、計測統計DBから、そのサイトに関するレイテンシ及び信頼性に関する数値情報を読み出す。その後、(ステップ.f.1.8.3)では、前述の表作成コストの見積りに対し、(ステップ.f.1.8.2)で得たデータを保持するサイトに関する正規化されたレイテンシに関する数値情報並びに当該信頼性に関する数値情報の逆数を重畳し、全体コストを求める。更に、(ステップ.f.1.8.4)では、全体コストに対して計画番号を付け、候補の一つとして評価リストに追加する。そして、(ステップ.f.1.8.5)では、索引の作成可能性を判定する。索引が作成可能である場合は、配下の処理を実施する。
[Process under (Step.f.1.8)]
In (Step .f.1.8.1), the table creation cost is estimated from the number of creations of the join tree {T 1 o } (number of rows created and number of cases). After that, in (Step .f. 1.8.2), numerical information related to the latency and reliability related to the site is read from the measurement statistics DB. Thereafter, in (Step .f. 1.8.3), the normalized latency related to the site holding the data obtained in (Step .f. 1.8.2) with respect to the estimation of the table creation cost described above. And the reciprocal number of the numerical information related to the reliability are superimposed to obtain the total cost. Further, in (Step .f. 1.8.4), a plan number is assigned to the entire cost, and it is added to the evaluation list as one of candidates. In (step .f. 1.8.5), the possibility of creating an index is determined. If the index can be created, the subordinate process is executed.
 [索引が作成可能である場合]
 (ステップ.f.1.8.5.1)では、(ステップ.f.1.8.3)で得た全体コストから索引作成コストを見積り、索引作成コストも組み込んだ全体コストを改めて計算する。その後、(ステップ.f.1.8.5.2)では、改めて計算した全体コストに対して計画番号を付け、別候補として評価リストに追加する。
[If the index can be created]
In (Step.f.1.8.5.1), the index creation cost is estimated from the total cost obtained in (Step.f.1.8.8.3), and the total cost including the index creation cost is calculated again. . After that, in (Step .f.1.8.5.2), a plan number is assigned to the newly calculated overall cost, and it is added to the evaluation list as another candidate.
 (ステップ.f.1.8)で中間成果表の結合木{T }を一時的に配置するサイト候補全てを評価した後、(ステップ.f.1.9)において、前述の評価リスト内のデータに関して、全体コストでソートし序列化を行う。そして、全体コストが最小のものを第一案として記憶する。 After evaluating all the site candidates that temporarily place the connection tree {T 1 o } of the intermediate result table in (Step .f.1.8), in (Step .f.1.9), the evaluation list described above is used. The data in the list is sorted and sorted according to the overall cost. Then, the one with the lowest overall cost is stored as the first plan.
 その後、(ステップ.f.1.8.10)において、第一案と同類の構成で配置先が異なり、全体コストが次に小さいものを第二案として記憶する。ここで、同類の構成とは、具体的には、索引が作成される場合は索引が存在する構成、索引が作成されない場合は索引が存在しない構成を意味する。 After that, in (step .f.1.8.10), the one with the same arrangement as the first plan, the arrangement destination is different, and the next lowest overall cost is stored as the second plan. Here, a similar configuration means a configuration in which an index exists when an index is created, and a configuration in which no index exists when an index is not created.
 その後、(ステップ.f.1.8.11)において、後処理として、サイト仮リスト、評価リストをクリアする。その後、(ステップ.f.1.8.12)において、前述の第一案、並びに前述の第二案を回答する。 After that, in (Step f.1.8.11), the site temporary list and the evaluation list are cleared as post-processing. Thereafter, in (Step .f. 1.8.12), the above-mentioned first proposal and the above-mentioned second proposal are answered.
 ここまでが、結合木{T }種類が左側(外部側)/駆動表になる場合の処理である。 Up to this point, the processing is performed when the type of the connection tree {T 1 o } is left (external side) / driving table.
 [結合木{T }種類が右側(内部側)/参照表になる場合]
 結合木{T }種類が右側(内部側)/参照表になる場合は、結合木{T }種類が左側(外部側)/駆動表になる場合と異なり、処理が複雑である。これはハッシュ結合を利用する場合が存在するためである。特に中間生成表である結合木{T }をストレージに書き込んだ後、ハッシュが作成されることになる。しかし、ハッシュはメモリ上に作成されるデータ構造であるため、ストレージに書き込むサイトとハッシュを保持するサイトが別サイト及び装置であることも可能である。
[When the join tree {T 1 o } type is right (inside) / reference table]
When the type of the join tree {T 1 o } becomes the right side (inside) / reference table, the process is complicated, unlike the case where the type of the join tree {T 1 o } becomes the left side (outside) / drive table. . This is because there are cases where hash join is used. In particular, a hash is created after writing the join tree {T 1 o }, which is an intermediate generation table, to the storage. However, since the hash is a data structure created on the memory, it is possible that the site that writes to the storage and the site that holds the hash are different sites and devices.
 (ステップ.f.2.1)では、(ステップ.8.7.2)での次の結合演算先(ここでは外部側/駆動表)を確認する。その後、(ステップ.f.2.2)では、次の結合演算先(外部側/駆動表)で利用されるサイト群全てを(レプリカも含めて)取り出して、サイト仮リストに追加する。その後、(ステップ.f.2.3)では、利用されないサイト群の一覧を抽出する。ここでは、サイト仮リストに含まれないサイトを取り出し、候補仮リストに追加する。(ステップ.f.2.4)では、探索用の2箇所分の候補エリアを準備し、初期化する。その後、(ステップ.f.2.5)では、候補仮リスト要素分、配下の処理を繰り返し実施する。 In (Step .f.2.1), the next join operation destination (in this case, external side / drive table) in (Step 8.7.2) is confirmed. Thereafter, in (step f.2.2), all site groups (including replicas) used at the next join operation destination (external side / drive table) are taken out and added to the temporary site list. After that, in (Step .f2.3), a list of sites that are not used is extracted. Here, a site not included in the temporary site list is taken out and added to the candidate temporary list. In (Step .f.2.4), two candidate areas for search are prepared and initialized. Thereafter, in (step .f.2.5), the subordinate processing is repeatedly performed for the candidate temporary list element.
 [(ステップ.f.2.5)配下の処理]
 (ステップ.f.2.5.1)では、順次、サイト情報を読み出す。(ステップ.f.2.5.2)では、計測統計DBから、当該サイトに関するレイテンシ及び信頼性に関する数値情報を読み出す。また、レイテンシ及び信頼性に関する数値情報を読み出す毎に、配下の処理を実施する。
[Processing under (Step.f.2.5)]
In (step .f.2.5.1), the site information is read sequentially. In (step .f.2.5.2), numerical information regarding latency and reliability related to the site is read from the measurement statistics DB. In addition, the subordinate process is performed every time numerical information related to latency and reliability is read.
 [候補エリアで初期化されたものが残っている場合]
 (ステップ.f.2.5.2.1)において、最初の2サイトを候補として、前述の探索用の候補エリアにセットする。
[When there is something initialized in the candidate area]
In (step .f.2.5.2.1), the first two sites are set as candidates in the candidate area for search described above.
 [候補エリアが既にセットされ、サイト仮リストに登録されていない場合]
 (ステップ.f.2.5.2.2)において、(ステップ.f.2.5.2)を繰り返して実施した結果、レイテンシ及び信頼性に関する数値情報が、より高評価である場合、当該探索用の候補エリアに記載されているサイトを置き換える(リプレイス)。それ以外は処理しない。
[If the candidate area is already set and not registered in the temporary site list]
In (Step .f.2.5.2.2), if (Step .f.2.5.2) is repeated and the numerical information on latency and reliability is higher, Replace the site listed in the candidate area for search (replace). No other processing is performed.
 候補仮リスト内の全てのサイトに対して(ステップ.f.2.5)を実施した結果、新たな候補となる2つのサイトが特定されたならば、(ステップ.f.2.6)では、中間成果表の結合木{T }を一時的に配置するサイト候補を意味するサイト仮リストに追加する。その後、(ステップ.f.2.7)では、候補仮リストをクリアする。続けて、(ステップ.f.2.8)では、サイト仮リスト要素分、配下の処理を繰り返し実施して、中間成果表の結合木{T }を一時的に配置するサイト候補の各々を具体的に評価する。 If two sites that are new candidates are identified as a result of performing (Step .f.2.5) on all sites in the candidate temporary list, (Step .f.2.6) The intermediate result table join tree {T 1 o } is added to a temporary site list that means site candidates to be temporarily arranged. Thereafter, in (step .f.2.7), the candidate temporary list is cleared. Subsequently, in (step .f. 2.8), each of the site candidates for temporarily arranging the connection tree {T 1 o } of the intermediate result table by repeatedly executing the subordinate processing for the site temporary list element Is specifically evaluated.
 [(ステップ.f.2.8)配下の処理]
 (ステップ.f.2.8.1)では、結合木{T }の作成件数(作成される行数及び件数)から表作成コストを見積もる。その後、(ステップ.f.2.8.2)では、計測統計DBから、そのサイトに関するレイテンシ及び信頼性に関する数値情報を読み出す。その後、(ステップ.f.2.8.3)では、前述の表作成コストの見積りに対し、(ステップ.f.2.8.2)で得たデータを保持するサイトに関する正規化されたレイテンシに関する数値情報並びに当該信頼性に関する数値情報の逆数を重畳し、表部分全体コストを求める。更に、(ステップ.f.2.8.4)では、表部分全体コストに対して計画番号を付け、候補の一つとして表部分評価リストに追加する。
[Process under (Step.f.2.8)]
In (Step .f.2.8.1), the table creation cost is estimated from the number of creations of the join tree {T 1 o } (the number of rows and the number of rows created). Thereafter, in (Step .f. 2.8.2), numerical information related to the latency and reliability related to the site is read from the measurement statistics DB. Thereafter, in (Step .f. 2.8.3), the normalized latency related to the site holding the data obtained in (Step .f. 2.8.2) with respect to the estimation of the table creation cost described above. And the reciprocal number of the numerical information related to the reliability are superimposed to obtain the entire table portion cost. Further, in (Step .f. 2.8.4), a plan number is assigned to the entire table part cost, and the plan number is added to the table part evaluation list as one of candidates.
 (ステップ.f.2.1)~(ステップ.f.2.8.4)は、(ステップ.f.2.1.1)を除き、前述の(ステップ.f.1.2)~前述の(ステップ.f.1.8.4)と等価な処理であり、表部分全体コストが算出される。これに対して(ステップ.f.2.9)~(ステップ.f.2.15)は、結合木{T }種類が右側(内部側)/参照表になる場合の専用の処理内容である。 (Step .f.2.1) to (Step .f. 2.8.4) are the same as the above (Step .f.1.2) to the above-mentioned except for (Step .f.2.1.1.1). (Step .f. 1.8.4) is equivalent to the process, and the entire table portion cost is calculated. On the other hand, (Step .f2.9) to (Step .f.2.15) are dedicated processing contents when the type of the connection tree {T 1 o } is the right side (inside) / reference table. It is.
 (ステップ.f.2.9)では、結合木{T }種類が右側(内部側)/参照表になる場合、全体の処理量を抑制させる目的で前述の表部分全体コストが小さいものの上位3つに限定する。ここでは、表部分評価リストから、表部分全体コストが小さいものの上位3つを選択し、他を削除する。 In (step .f. 2.9), when the type of join tree {T 1 o } is the right side (inside) / reference table, the total cost of the table portion is small for the purpose of suppressing the overall processing amount. Limited to the top three. Here, from the table part evaluation list, the top three items with the small total table part cost are selected and the others are deleted.
 続けて、(ステップ.f.2.10)では、サイト仮リスト要素分、配下の処理を繰り返し実施する。 Subsequently, in (Step .f.2.10), the subordinate processing is repeatedly performed for the site temporary list element.
 [(ステップ.f.2.10)配下の処理]
 (ステップ.f.2.10.1)では、メモリ上に構成される前述のハッシュの作成コストを見積もる。(ステップ.f.2.10.2)では、特に(ステップ.f.2.8.2)で得たサイトに関する正規化されたレイテンシに関する数値情報並びに前述の信頼性に関する数値情報の逆数を重畳し、参照コストを求める。更に、(ステップ.f.2.10.3)では、この参照コストに対して計画番号を付け、候補の一つとして参照部分評価リストに追加する。
[Process under (Step.f.2.10)]
In (Step .f.2.10.1), the creation cost of the above-described hash configured on the memory is estimated. (Step .f.2.10.2) superimposes the numerical information related to the normalized latency related to the site obtained in (Step .f. 2.8.2) and the reciprocal of the numerical information related to the reliability described above. And obtain the reference cost. Furthermore, in (step .f.2.10.3), a plan number is assigned to this reference cost, and it is added to the reference partial evaluation list as one of the candidates.
 また、ハッシュ結合を利用しない場合も考えられる。その場合、索引が作成される可能性もある。そこで、(ステップ.f.2.10.4)では、前述の索引の作成コストを見積もる。(ステップ.f.2.10.5)では、特に(ステップ.f.2.8.2)で得たサイトに関する正規化されたレイテンシに関する数値情報並びに前述の信頼性に関する数値情報の逆数を重畳し、参照コストを求める。更に、(ステップ.f.2.10.6)では、この全体コストに対して計画番号を付け、候補の一つとして参照部分評価リストに追加する。 Also, there may be cases where hash join is not used. In that case, an index may be created. Therefore, in (Step .f.2.10.4), the above-mentioned index creation cost is estimated. (Step .f.2.10.5) superimposes the numerical information related to the normalized latency related to the site obtained in (Step .f. 2.8.2) and the reciprocal number of the numerical information related to the reliability described above. And obtain the reference cost. Further, in (Step .f.2.10.6), a plan number is assigned to this overall cost, and it is added to the reference partial evaluation list as one of the candidates.
 続いて、(ステップ.f.2.11)では、前述の表部分全体コストと参照コストの組合せを行うことで、前述の結合木{T }を含む中間成果表と、前述のハッシュが異なるサイトに作成される場合を評価する。ここでは、(ステップ.f.2.11)において、表部分評価リスト要素分、配下の処理を繰り返し実施する。 Subsequently, in (Step .f.2.11), the intermediate result table including the above-described join tree {T 1 o } and the above-described hash are obtained by combining the total cost of the table part and the reference cost. Evaluate when it is created on a different site. Here, in (Step .f.2.11), the subordinate processing is repeated for the table partial evaluation list element.
 [(ステップ.f.2.11)配下の処理]
 (ステップ.f.2.11.1)では、表部分評価リストから、計画番号順に1件取り出す。(ステップ.f.2.11.2)では、表部分全体コストと参照コストを、それぞれ定数重み付けをした上で総和を取り、全体コストを算出する。その後、(ステップ.f.2.11.2)では、算出した全体コストに対して計画番号を付け、評価リストに追加する。
[Process under (Step.f.2.11)]
In (step .f.2.11.1), one item is taken out from the table partial evaluation list in the order of the plan number. In (Step .f.2.11.2), the table portion overall cost and the reference cost are each weighted by a constant weight, and the sum is taken to calculate the overall cost. Thereafter, in (Step .f.2.11.2), a plan number is assigned to the calculated overall cost and added to the evaluation list.
 (ステップ.f.2.12)では、前述の評価リスト内のデータに関して、全体コストでソートし序列化を行う。そして、全体コストが最小のものを第一案として記憶する。 (Step .f.2.12) sorts and ranks the data in the above-described evaluation list according to the overall cost. Then, the one with the lowest overall cost is stored as the first plan.
 その後、(ステップ.f.2.13)において、第一案と同類の構成で配置先が異なり、全体コストが次に小さいものを第二案として記憶する。ここで、同類の構成とは具体的には、索引が作成される場合は索引が存在する構成、索引が作成されない場合は索引が存在しない構成を意味する。 After that, in (Step .f.2.13), the configuration with the same configuration as the first plan, the placement destination is different, and the next lowest overall cost is stored as the second plan. Here, the similar configuration specifically means a configuration in which an index exists when an index is created, and a configuration in which no index exists when an index is not created.
 その後、(ステップ.f.2.14)において、後処理として、サイト仮リスト、表部分評価リスト、参照部分評価リスト、評価リストをクリアする。その後、(ステップ.f.2.15)において、前述の第一案、並びに前述の第二案を回答する。 Thereafter, in (step .f.2.14), as a post-process, the site temporary list, the table partial evaluation list, the reference partial evaluation list, and the evaluation list are cleared. Thereafter, in (Step .f.2.15), the above-mentioned first proposal and the above-mentioned second proposal are answered.
 以上の手順に基づき、当該配置評価関数は処理を終了する。その後、当該配置評価関数の結果に基づき、(ステップ.8.9)で1箇所もしくは投機的に2箇所に中間生成結果を配置予約する。 Based on the above procedure, the placement evaluation function ends the process. After that, based on the result of the arrangement evaluation function, the intermediate generation result is reserved for arrangement in one place or speculatively two places in (Step .8.9).
 (ステップ.8.10)以降の処理は、従来の方式でも見られる処理である。(ステップ.8.10)では、配置した中間成果表に相当する結合木{Tk1 ,Tk2 }の要素を結合要素リストの現版に追加する。これを受けて、(ステップ.8.11)では、当該結合木{Tk1 ,Tk2 }を構成する結合要素を結合要素リストの現版から削除する。 The processing after (Step 8.10) is processing that can also be seen in the conventional method. In (Step .8.10), the elements of the connection tree {T k1 o , T k2 o } corresponding to the arranged intermediate result table are added to the current version of the connection element list. In response to this, in (step 8.11), the joining elements constituting the joining tree {T k1 o , T k2 o } are deleted from the current version of the joining element list.
 (ステップ.8)の処理が最適化の中核部分の処理であり、この部分を抜けた場合、最適化部1051の処理系は(ステップ.9)に移行することになる。(ステップ.9)では、結合要素リストの現版{T }から、残った最適化を実施し、最適化プラン追加する。その後、最適化部1051の処理系は(ステップ.10)で最適化プランを1つ選択する。 The processing of (Step 8) is the core processing of optimization, and when this processing is omitted, the processing system of the optimization unit 1051 shifts to (Step 9). In (Step 9), the remaining optimization is performed from the current version {T m o } of the combined element list, and an optimization plan is added. Thereafter, the processing system of the optimization unit 1051 selects one optimization plan in (Step 10).
 その後、最適化部1051の処理系は、(ステップ.11)でレプリカを含めた結合要素リストの初期版{R11 ,…,R1p ,R21 ,…,R2p ,...,Rnp }、並びに現版{T }を使い、最適化プランを表現し直す。その後、(ステップ.12において最適化部1051の処理系は、算出した最適化プランから、その一部の実行プラン記述s124、s105を前述の処理実行部-1:1053から前述の処理実行部-N:1054に渡して処理を終了する。 Thereafter, the processing system of the optimization unit 1051 performs the initial version {R 11 o ,..., R 1 p o , R 21 o ,..., R 2p o ,. . . , R np o } and the current version {T m o } to re-express the optimization plan. Thereafter (in step 12.12, the processing system of the optimizing unit 1051 transfers a part of the execution plan descriptions s124 and s105 from the above-described processing execution unit-1: 1053 to the above-mentioned processing execution unit- N: It passes to 1054 and a process is complete | finished.
 図5A、図5Bの初期段階2000は、(ステップ.1)から(ステップ.7)に相当する。また、初期段階2000の後の繰り返し2001は、(ステップ.8)に相当する。ここでは、既にアクセス方法を最適化した関係{R ,R ,...,R }を想定する。 5A and 5B correspond to (Step 1) to (Step 7). The repetition 2001 after the initial stage 2000 corresponds to (Step 8). Here, the relationship {R 1 o , R 2 o,. . . , R n o }.
 図5A、図5Bの“処理系、並びにコピー処理系の起動”2002は、(ステップ.8.3)、(ステップ.8.4)に相当する。最適化部1051の処理系でコピー処理系を起動する側は、図5A、図5Bでは処理シーケンス2003で表現される。これに対して、コピー処理系として起動される側は、処理シーケンス2004で表現される。特に処理内容の中核部分である(ステップ.8.5)は、処理シーケンス2005、並びに処理シーケンス2006で表現されることになる。 “Start of processing system and copy processing system” 2002 in FIGS. 5A and 5B corresponds to (Step .8.3) and (Step .8.4). The side that activates the copy processing system in the processing system of the optimization unit 1051 is represented by a processing sequence 2003 in FIGS. 5A and 5B. On the other hand, the side activated as the copy processing system is expressed by a processing sequence 2004. In particular, the core part of the processing content (step .8.5) is expressed by the processing sequence 2005 and the processing sequence 2006.
 このため、処理シーケンス2003においては、最大ブロックサイズkを2と設定した場合、その2段階目で中間成果表を分散ストレージに書き込むため、結合処理を中断するのに対して、処理シーケンス2004では最大ブロックサイズが(k+1)の3段階目まで、処理が進むことになる。 For this reason, in the processing sequence 2003, when the maximum block size k is set to 2, the intermediate result table is written to the distributed storage in the second stage, so the join processing is interrupted, whereas in the processing sequence 2004, the maximum Processing proceeds to the third stage of the block size (k + 1).
 当初、処理シーケンス2003、並びに処理シーケンス2004は、全く同じ処理で進むことになる。第1段階の処理ではコスト計算を最小化するため、関係{R }、並びに関係{R }を始めとする2つの関係について結合演算が実施される。この結果、複数の結合結果案が作成されるが、第1段階の処理結果は、処理シーケンス2003では結合結果2007に相当し、処理シーケンス2004では結合結果2015に相当する。 Initially, the processing sequence 2003 and the processing sequence 2004 proceed with exactly the same processing. In the first stage of processing, in order to minimize the cost calculation, a join operation is performed on two relations including the relation {R 1 o } and the relation {R 2 o }. As a result, a plurality of combined result proposals are created. The processing result of the first stage corresponds to the combined result 2007 in the processing sequence 2003 and corresponds to the combined result 2015 in the processing sequence 2004.
 第2段階の処理では、第1段階の結合結果に対して、更に、結合演算を実施する。例えば関係{R }、並びに関係{R }間で結合演算をした結果に対して、関係{R }と結合演算させる処理等が相当する。この結果、複数の結合結果案が作成されるが、第2段階の処理結果は、処理シーケンス2003では結合結果2008に相当し、処理シーケンス2004では結合結果2016に相当する。 In the second stage process, a join operation is further performed on the first stage join result. For example, for the result of performing a join operation between the relationship {R 1 o } and the relationship {R 2 o }, a process of performing a join operation with the relationship {R 4 o }, and the like. As a result, a plurality of combined result proposals are created. The processing result of the second stage corresponds to the combined result 2008 in the processing sequence 2003 and corresponds to the combined result 2016 in the processing sequence 2004.
 その後、処理シーケンス2003では、中間生成表に相当する結合木{T }への置き換え候補2009が複数生成される。更に、貪欲法(欲張り法)のアルゴリズムにより当該置き換え候補2009から最適な中間成果表である結合木{T }の選択結果2010が作成されることになる。一連の処理内容は、(ステップ.8.5)、(ステップ.8.6)に相当する。 Thereafter, in the processing sequence 2003, a plurality of replacement candidates 2009 for the connection tree {T 1 o } corresponding to the intermediate generation table are generated. Furthermore, the selection result 2010 of the join tree {T 1 o }, which is the optimum intermediate result table, is created from the replacement candidate 2009 by the algorithm of the greedy method (greedy method). The series of processing contents correspond to (Step .8.5) and (Step .8.6).
 これに対して処理シーケンス2004では、3段階目まで処理が進むため、第2段階の結合結果2016に対して更に、結合演算を実施する。第1段階の結合演算に対して、更に、結合演算を実施する。例えば関係{R }、並びに関係{R }間で結合演算をし、更に、関係{R }の結果に対して、関係{R }2022、並びに関係{R }2021の結合結果2020を追加結合させる処理等が相当する。この結果、複数の結合結果案が作成されるが、ここでは結合結果案2017が相当する。 On the other hand, in the processing sequence 2004, since the processing proceeds to the third stage, a join operation is further performed on the join result 2016 in the second stage. For the first stage join operation, a join operation is further performed. For example, a join operation is performed between the relation {R 1 o } and the relation {R 2 o }, and the relation {R 3 o } 2022 and the relation {R 5 o are further obtained for the result of the relation {R 4 o }. } This corresponds to the process of additionally combining the combination result 2020 of 2021. As a result, a plurality of combined result proposals are created, and here, the combined result proposal 2017 corresponds.
 その後、処理2018を実施し、最大ブロックサイズが(k+1)の場合にk個の結合でどのレプリカ表を用いるか、を特定する。これは、(ステップ8.7.2)の前半部分に相当する。 Thereafter, processing 2018 is performed to specify which replica table is used in k joins when the maximum block size is (k + 1). This corresponds to the first half of (Step 8.7.2).
 更に、最適化部1051の処理系でコピー処理系を起動する側からの問い合わせを待ち合わせ、問い合わせ2023の結果、処理2019を実施し、応答2024を実施した後、終了する。これは、(ステップ8.7.2)の後半部分、並びに(ステップ.8.7.3)に相当する。 Further, the processing unit of the optimizing unit 1051 waits for an inquiry from the side that activates the copy processing system. As a result of the inquiry 2023, the process 2019 is executed, the response 2024 is executed, and the process ends. This corresponds to the latter half of (Step 8.7.2) and (Step 8.7.3).
 ここで、前述の問い合わせ2023には、中間成果表である結合木{T }が指定される。これに対して前述の応答2024には、当該結合木{T }が左側(外部側)/駆動表になるか、右側(内部側)/参照表になるかが指定されると伴に、次の結合演算でどのレプリカ表を用いるか、その全てのサイト群と表群が指定されることになる。 Here, in the above-described query 2023, a join tree {T 1 o } that is an intermediate result table is designated. On the other hand, in the response 2024 described above, it is specified whether the connection tree {T 1 o } becomes the left side (external side) / driving table or the right side (internal side) / reference table. All the site groups and table groups are designated as to which replica table is used in the next join operation.
 これに対して最適化部1051の処理系でコピー処理系を起動する側の処理シーケンス2003では、処理2012で第(k+1)段階の想定結合先の問い合わせ2023を行う。その後、該当する応答2024を受信する。これは、(ステップ8.7.5)により実施される。 On the other hand, in the processing sequence 2003 on the side in which the copy processing system is started in the processing system of the optimization unit 1051, in the processing 2012, the (k + 1) -th stage assumed connection destination query 2023 is performed. Thereafter, the corresponding response 2024 is received. This is performed by (step 8.7.5).
 その後、処理2013において中間成果表の配置を行う分散ストレージサイト等の特定を行う。これは、(ステップ8.8)の配置評価関数内で評価を行った後、(ステップ8.9)を通して実施される。当該配置評価関数は、前述の[配置評価関数]のアルゴリズムに従う。 After that, in processing 2013, a distributed storage site or the like for arranging the intermediate result table is specified. This is performed through (Step 8.9) after performing the evaluation in the arrangement evaluation function of (Step 8.8). The arrangement evaluation function follows the algorithm of the [placement evaluation function] described above.
 更に、処理2014で後処理を行う。具体的には、(ステップ.8.10)、(ステップ.8.11)に相当する。 Further, post-processing is performed in processing 2014. Specifically, this corresponds to (Step .8.10) and (Step .8.11).
 以上のように、本発明クエリ最適化方式は、データベース管理システム内部のクエリを最適化する手法であり、特に大規模に仮想化されたストレージを利用する環境において、クエリの最適化の途上で発生する中間成果をより合理的に配置するための手法である。特にクエリでは関係代数に於ける制限演算と結合演算の任意個数の複合演算を前提とする。 As described above, the query optimization method of the present invention is a method for optimizing a query in a database management system, and occurs in the course of query optimization, particularly in an environment using a large-scale virtualized storage. This is a method for more rationally arranging intermediate results. In particular, queries are premised on an arbitrary number of combined operations of a limit operation and a join operation in relational algebra.
 このクエリ最適化方式では、データを保持するサイトに関するレイテンシ及び信頼性に関する数値情報を取り出す。次に、当該レイテンシに関する数値情報並びに当該信頼性に関する数値情報をクエリのコスト値に重畳し拡張コスト値を求める。次に、拡張コスト値で序列化した場合に上位で妥当な指定値を持つサイトを選定する。次に、最適化の結果発生する中間成果の配置先サイトを選定するため、アルゴリズム中のパラメータを一部変更した処理系自身のコピーを作成することで先読みを実施する。次に、当該先読みの結果、中間成果を1つ以上のサイトに配置するステップを含むことを特徴とするクエリの中間成果の配置方式を評価する。 This query optimization method extracts numerical information related to latency and reliability related to the site holding the data. Next, the numerical information related to the latency and the numerical information related to the reliability are superimposed on the cost value of the query to obtain an extended cost value. Next, a site having an appropriate designated value at the top when selecting an extension cost value is selected. Next, in order to select a placement destination site for intermediate results generated as a result of optimization, prefetching is performed by creating a copy of the processing system itself in which some parameters in the algorithm are changed. Next, as a result of the prefetching, a query intermediate result placement method characterized by including a step of placing intermediate results on one or more sites is evaluated.
 従来のクエリの最適化手法では、最適化の結果発生するクエリの中間成果を合理的な判断の下で、どの分散ストレージサイトに配置するべきかが考慮されておらず、結果としてクエリの最適化が実施段階で為されていない課題が存在し、この結果、データベースアクセスクライアントの利用者が取り決めた合意サービス水準を満足し得ない課題も発生し得る。 Traditional query optimization techniques do not take into account which distributed storage sites should be placed with reasonable judgment on the intermediate results of queries that result from optimization, resulting in query optimization However, there is a problem that has not been made in the implementation stage, and as a result, a problem that cannot satisfy the agreed service level decided by the user of the database access client may occur.
 しかし、本発明では、データベース管理システムのクエリの最適化手法として、データを保持するサイトに関するレイテンシ及び信頼性に関する数値情報を取り出す(ステップ、当該レイテンシに関する数値情報並びに当該信頼性に関する数値情報をクエリのコスト値に重畳し拡張コスト値を求めるステップ、拡張コスト値で序列化した場合に上位で妥当な指定値を持つサイトを選定するステップ、最適化の結果発生する中間成果の配置先サイトを選定するため、アルゴリズム中のパラメータを一部変更した処理系自身のコピーを作成することで先読みを実施するステップ、当該先読みの結果、中間成果を1つ以上のサイトに配置するステップを含むことを特徴とするクエリの中間成果の配置方式を評価するクエリ最適化方式を提供するため、配置の選定を合理的に実施できるようになり、従来課題を解決できる。 However, in the present invention, as a query optimization method of the database management system, numerical information related to latency and reliability related to a site holding data is extracted (step, numerical information related to the latency and numerical information related to the reliability are extracted from the query. The step of obtaining the extended cost value by superimposing it on the cost value, the step of selecting a site having a reasonable designated value when ordered by the extended cost value, and selecting the site where the intermediate result generated as a result of the optimization is placed Therefore, the method includes a step of performing prefetching by creating a copy of the processing system itself in which a parameter in the algorithm is partially changed, and a step of placing intermediate results on one or more sites as a result of the prefetching. To provide a query optimization method that evaluates the placement method of intermediate results of queries Reasonably be able to implement the selection of the arrangement, it can solve the conventional problems.
 本発明は、データベース管理システムの要素技術が、単一のシステム内の機能モジュールからネットワーク全体に広く、緩やかに連携する環境に分散し、大規模に展開された結果、従来のデータベース管理システム環境を前提としているクエリの最適化手法も影響を受け、ネットワーク全体に分散した環境を前提とした手法に進化する際の一形態と考えることができる。広域なネットワーク全体に渡り扱われることから、関連する技術領域は幾つかに跨ることに成る。 In the present invention, the elemental technology of a database management system is distributed over a wide range of environments, from functional modules in a single system to a wide network, and is deployed on a large scale. The query optimization technique assumed is also affected, and can be considered as a form of evolution to a technique that assumes an environment distributed throughout the network. Since it is handled over a wide area network, the related technical areas will span several.
 以上、本発明の実施形態を詳述してきたが、実際には、上記の実施形態に限られるものではなく、本発明の要旨を逸脱しない範囲の変更があっても本発明に含まれる。 The embodiments of the present invention have been described in detail above. However, actually, the present invention is not limited to the above-described embodiments, and modifications within a scope not departing from the gist of the present invention are included in the present invention.
 <備考>
 なお、本出願は、日本出願番号2010-068822に基づく優先権を主張するものであり、日本出願番号2010-068822における開示内容は引用により本出願に組み込まれる。
<Remarks>
In addition, this application claims the priority based on the Japanese application number 2010-068822, and the disclosed content in the Japanese application number 2010-068822 is incorporated in this application by reference.

Claims (13)

  1.  データベース管理システムにおいて、クエリの中間成果の配置方式を評価するクエリ最適化システムであって、
     データを保持するサイトに関するレイテンシ及び信頼性に関する数値情報を取り出す装置と、
     取り出したレイテンシ及び信頼性に関する数値情報をクエリのコスト値に重畳し拡張コスト値を求める装置と、
     拡張コスト値で序列化した場合に上位で所定の条件に適合する指定値を持つサイトを選定する装置と、
     アルゴリズム中のパラメータを一部変更した処理系自身のコピーを作成して先読みを実施する装置と、
     前記先読みの結果、前記所定の条件に適合する指定値を持つサイトの中から、最適化の結果発生する中間成果の配置先サイトを選定し、前記中間成果を、前記選定された配置先サイトのうち少なくとも1つのサイトに配置する装置と
    を含む
     クエリ最適化システム。
    In a database management system, a query optimization system for evaluating an arrangement method of intermediate results of a query,
    An apparatus for extracting numerical information on latency and reliability related to a site holding data;
    An apparatus for obtaining an extended cost value by superimposing the extracted numerical information on latency and reliability on the cost value of the query;
    A device that selects a site having a specified value that meets a predetermined condition at the top when ordered by an extended cost value;
    A device that performs a read-ahead by creating a copy of the processing system itself in which some parameters in the algorithm are changed,
    As a result of the prefetching, a site to which an intermediate result generated as a result of optimization is selected from sites having specified values that meet the predetermined condition, and the intermediate result is selected from the selected destination site. A query optimization system including a device installed at at least one site.
  2.  請求項1に記載のクエリ最適化システムであって、
     クエリの構文を解析し、参照する表集合の各要素表で想定されるアクセスパターンを抽出し、仮結合要素リストに追加する装置と、
     前記各要素表に関して前記仮結合要素リストから最適なアクセス条件のものを取り出して残し、他を前記仮結合要素リストから削除する装置と、
     前記仮結合要素リストの各要素に対して、全レプリカ分のエントリを作成し、前記仮結合要素リストに追加する装置と、
     前記仮結合要素リストからエントリを取り出し、当該エントリにレイテンシ及び信頼性に関する数値情報を付与し、前記仮結合要素リストを更新する装置と、
     前記仮結合要素リストのエントリのレイテンシ及び信頼性に関する数値情報を評価する装置と、
     評価の結果、上位の所定数のレプリカを残し、他を前記仮結合要素リストから削除する装置と、
     前記仮結合要素リストのエントリに基づいて、結合要素リストの初期版を作成し、最適化プラン要素リストに追加する装置と、
     前記仮結合要素リストを削除する装置と
    を更に含む
     クエリ最適化システム。
    The query optimization system of claim 1,
    A device that parses the query syntax, extracts an access pattern assumed in each element table of the reference table set, and adds it to the temporary join element list;
    An apparatus that takes out the optimal access condition from the temporary connection element list for each element table and leaves it, and deletes the other from the temporary connection element list;
    An apparatus for creating entries for all replicas for each element of the temporary combining element list and adding the entries to the temporary combining element list;
    An apparatus for retrieving an entry from the temporary combining element list, giving numerical information about latency and reliability to the entry, and updating the temporary combining element list;
    An apparatus for evaluating numerical information related to latency and reliability of the entry of the temporary coupling element list;
    As a result of the evaluation, a device that leaves a predetermined number of high-order replicas and deletes others from the temporary coupling element list;
    An apparatus for creating an initial version of the combined element list based on the entry of the temporary combined element list and adding it to the optimized plan element list;
    A query optimizing system further comprising: an apparatus for deleting the temporary joining element list.
  3.  請求項2に記載のクエリ最適化システムであって、
     同じ処理系をコピーし、コピーした処理系を起動する装置と、
     前記最適化プラン要素リスト内の要素から、貪欲法(欲張り法)のアルゴリズムを用いてコストを計算する評価関数を最小にする結合木の選択を行う装置と、
     自身がコピー処理系か非コピー処理系かを判断する装置と、
     自身がコピー処理系である場合、非コピー処理系からの問い合わせ待ちを行い、コピーを実施した非コピー処理系からの問い合わせの際に指定される中間成果表の結合木の種類を判定し、外部側/駆動表か、内部側/参照表かを評価し、どのサイトのどの表と結合演算を行うか、を非コピー処理系に応答し、コピー処理系を終了する装置と、
     自身が非コピー処理系である場合、結合木を新たな中間成果表に仮置きし、コピー処理系に問い合わせる装置と、
     配置評価関数を呼び出し、前記中間成果表を配置するべきサイトと当該サイトの索引の作成方針で最適な案を取り出す装置と、
     前記配置評価関数の結果に基づき、所定の箇所に前記中間生成結果を配置する装置と
    を更に含む
     クエリ最適化システム。
    The query optimization system according to claim 2,
    A device that copies the same processing system and starts the copied processing system;
    An apparatus for selecting a connection tree that minimizes an evaluation function for calculating a cost using an algorithm of a greedy method (greedy method) from elements in the optimization plan element list;
    A device that determines whether it is a copy processing system or a non-copy processing system;
    If it is a copy processing system, it waits for an inquiry from the non-copy processing system, determines the type of the intermediate tree for the intermediate result table specified when making an inquiry from the non-copy processing system that performed the copy, A device that evaluates which side / driving table or internal side / reference table, and which table at which site to perform a join operation in response to the non-copy processing system, and ends the copy processing system,
    If the device itself is a non-copy processing system, a device that temporarily places the join tree in a new intermediate result table and inquires of the copy processing system,
    A device that calls a placement evaluation function and retrieves an optimal plan based on a site on which the intermediate result table is to be placed and an index creation policy of the site;
    A query optimization system further comprising: an apparatus that arranges the intermediate generation result at a predetermined location based on the result of the arrangement evaluation function.
  4.  請求項3に記載のクエリ最適化システムであって、
     前記配置評価関数の処理において、次の結合演算先を確認し、次の結合演算先で利用されるサイト群を取り出して、サイト仮リストに追加する装置と、
     前記サイト仮リストに含まれないサイトを取り出し、候補仮リストに追加する装置と、
     探索用の候補エリアを準備し、初期化する装置と、
     順次、サイト情報を読み出し、当該サイトに関するレイテンシ及び信頼性に関する数値情報を読み出す装置と、
     候補エリアで初期化されたものが残っている場合、最初の所定数のサイトを候補として、前記探索用の候補エリアにセットする装置と、
     候補エリアが既にセットされ、前記サイト仮リストに登録されていない場合、レイテンシ及び信頼性に関する数値情報がより高評価であれば、前記探索用の候補エリアに記載されているサイトを置き換える装置と、
     新たな候補となるサイトが特定された場合、当該サイトを前記サイト仮リストに追加する装置と、
     中間成果表の結合木の作成件数から表作成コストを見積もる装置と、
     前記表作成コストの見積りに対し、正規化されたレイテンシに関する数値情報並びに信頼性に関する数値情報の逆数を重畳し、全体コストを求め、全体コストに対して計画番号を付け、候補の一つとして評価リストに追加する装置と、
     索引の作成可能性を判定し、索引が作成可能である場合、全体コストから索引作成コストを見積り、索引作成コストも組み込んだ全体コストを改めて計算し、改めて計算した全体コストに対して計画番号を付け、別候補として前記評価リストに追加する装置と、
     前記評価リスト内のデータに関して、全体コストでソートし序列化を行い、全体コストが最小のものを第一案として記憶する装置と、
     前記第一案と同類の構成で配置先が異なり、全体コストが次に小さいものを第二案として記憶する装置と、
     前記第一案、及び前記第二案を回答する装置と
    を更に含む
     クエリ最適化システム。
    The query optimization system according to claim 3,
    In the processing of the placement evaluation function, a device for confirming the next join operation destination, taking out a site group used in the next join operation destination, and adding it to the temporary site list;
    A device that takes out sites not included in the temporary site list and adds them to the candidate temporary list;
    A device for preparing and initializing a candidate area for search;
    A device that sequentially reads site information and reads numerical information related to latency and reliability related to the site;
    A device that sets the initial predetermined number of sites as candidates in the candidate area for search, if the initialized ones in the candidate areas remain;
    If a candidate area has already been set and is not registered in the temporary site list, if numerical information on latency and reliability is more highly evaluated, a device that replaces the site described in the candidate area for search;
    When a new candidate site is identified, a device for adding the site to the temporary site list,
    A device that estimates the cost of creating a table from the number of join trees created in the intermediate results table;
    Superimposing the numerical information related to normalized latency and the inverse of the numerical information related to reliability on the estimate of the table creation cost, the total cost is obtained, and the plan number is assigned to the total cost and evaluated as one of the candidates. A device to add to the list;
    If the index creation possibility is determined and the index can be created, the index creation cost is estimated from the total cost, the total cost including the index creation cost is calculated again, and the plan number is assigned to the newly calculated total cost. A device to be added to the evaluation list as another candidate,
    Regarding the data in the evaluation list, the apparatus sorts and ranks according to the total cost, and stores the one with the minimum total cost as a first proposal,
    An apparatus for storing a second plan that has the same configuration as the first plan but has a different placement destination and the next lowest overall cost,
    A query optimization system further comprising: an apparatus that answers the first proposal and the second proposal.
  5.  データベース管理システムにおいて、クエリの中間成果の配置方式を評価するクエリ最適化装置であって、
     データを保持するサイトに関するレイテンシ及び信頼性に関する数値情報を取り出す手段と、
     取り出したレイテンシ及び信頼性に関する数値情報をクエリのコスト値に重畳し拡張コスト値を求める手段と、
     拡張コスト値で序列化した場合に上位で所定の条件に適合する指定値を持つサイトを選定する手段と、
     アルゴリズム中のパラメータを一部変更した処理系自身のコピーを作成して先読みを実施する手段と、
     前記先読みの結果、前記所定の条件に適合する指定値を持つサイトの中から、最適化の結果発生する中間成果の配置先サイトを選定し、前記中間成果を、前記選定された配置先サイトのうち少なくとも1つのサイトに配置する手段と
    を具備する
     クエリ最適化装置。
    In a database management system, a query optimization device for evaluating an arrangement method of intermediate results of a query,
    Means for retrieving numerical information about latency and reliability of the site holding the data;
    Means for superimposing the extracted latency and reliability numerical information on the query cost value to obtain an extended cost value;
    A means for selecting a site having a specified value that meets a predetermined condition at the top when ordered by extended cost value,
    A means of making a copy of the processing system itself in which some of the parameters in the algorithm are changed and performing prefetching;
    As a result of the prefetching, a site to which an intermediate result generated as a result of optimization is selected from sites having specified values that meet the predetermined condition, and the intermediate result is selected from the selected destination site. A query optimizing device comprising means for disposing at least one of the sites.
  6.  請求項5に記載のクエリ最適化装置であって、
     クエリの構文を解析し、参照する表集合の各要素表で想定されるアクセスパターンを抽出し、仮結合要素リストに追加する手段と、
     前記各要素表に関して前記仮結合要素リストから最適なアクセス条件のものを取り出して残し、他を前記仮結合要素リストから削除する手段と、
     前記仮結合要素リストの各要素に対して、全レプリカ分のエントリを作成し、前記仮結合要素リストに追加する手段と、
     前記仮結合要素リストからエントリを取り出し、当該エントリにレイテンシ及び信頼性に関する数値情報を付与し、前記仮結合要素リストを更新する手段と、
     前記仮結合要素リストのエントリのレイテンシ及び信頼性に関する数値情報を評価する手段と、
     評価の結果、上位の所定数のレプリカを残し、他を前記仮結合要素リストから削除する手段と、
     前記仮結合要素リストのエントリに基づいて、結合要素リストの初期版を作成し、最適化プラン要素リストに追加する手段と、
     前記仮結合要素リストを削除する手段と
    を更に具備する
     クエリ最適化装置。
    The query optimization device according to claim 5,
    A means of analyzing the syntax of the query, extracting an access pattern assumed in each element table of the table set to be referenced, and adding it to the temporary join element list;
    Means for taking out each of the element tables from the temporary connection element list and leaving the optimal access condition, and deleting others from the temporary connection element list;
    Means for creating entries for all replicas for each element of the temporary combining element list and adding to the temporary combining element list;
    Means for taking out an entry from the temporary binding element list, giving numerical information about latency and reliability to the entry, and updating the temporary binding element list;
    Means for evaluating numerical information relating to latency and reliability of the entries of the temporary binding element list;
    As a result of the evaluation, the means for leaving the upper predetermined number of replicas and deleting others from the temporary coupling element list;
    Means for creating an initial version of the combined element list based on the entry of the temporary combined element list and adding it to the optimized plan element list;
    Query optimizing apparatus, further comprising means for deleting the temporary joining element list.
  7.  請求項6に記載のクエリ最適化装置であって、
     同じ処理系をコピーし、コピーした処理系を起動する手段と、
     前記最適化プラン要素リスト内の要素から、貪欲法(欲張り法)のアルゴリズムを用いてコストを計算する評価関数を最小にする結合木の選択を行う手段と、
     自身がコピー処理系か非コピー処理系かを判断する手段と、
     自身がコピー処理系である場合、非コピー処理系からの問い合わせ待ちを行い、コピーを実施した非コピー処理系からの問い合わせの際に指定される中間成果表の結合木の種類を判定し、外部側/駆動表か、内部側/参照表かを評価し、どのサイトのどの表と結合演算を行うか、を非コピー処理系に応答し、コピー処理系を終了する手段と、
     自身が非コピー処理系である場合、結合木を新たな中間成果表に仮置きし、コピー処理系に問い合わせる手段と、
     配置評価関数を呼び出し、前記中間成果表を配置するべきサイトと当該サイトの索引の作成方針で最適な案を取り出す手段と、
     前記配置評価関数の結果に基づき、所定の箇所に前記中間生成結果を配置する手段と
    を更に具備する
     クエリ最適化装置。
    The query optimization device according to claim 6,
    A means for copying the same processing system and starting the copied processing system;
    Means for selecting, from the elements in the optimization plan element list, a join tree that minimizes an evaluation function for calculating a cost using an algorithm of greedy method (greedy method);
    Means for determining whether it is a copy processing system or a non-copy processing system;
    If it is a copy processing system, it waits for an inquiry from the non-copy processing system, determines the type of the intermediate tree for the intermediate result table specified when making an inquiry from the non-copy processing system that performed the copy, Means for evaluating the side / driving table or the internal side / reference table, responding to the non-copy processing system to which table at which site to perform the join operation, and terminating the copy processing system;
    If it is a non-copy processing system, a means for temporarily placing the connection tree in a new intermediate result table and inquiring the copy processing system,
    Means for invoking a placement evaluation function and taking out the optimum plan based on the site where the intermediate result table is to be placed and the index creation policy of the site;
    A query optimization device further comprising: means for arranging the intermediate generation result at a predetermined location based on the result of the arrangement evaluation function.
  8.  請求項7に記載のクエリ最適化装置であって、
     前記配置評価関数の処理において、次の結合演算先を確認し、次の結合演算先で利用されるサイト群を取り出して、サイト仮リストに追加する手段と、
     前記サイト仮リストに含まれないサイトを取り出し、候補仮リストに追加する手段と、
     探索用の候補エリアを準備し、初期化する手段と、
     順次、サイト情報を読み出し、当該サイトに関するレイテンシ及び信頼性に関する数値情報を読み出す手段と、
     候補エリアで初期化されたものが残っている場合、最初の所定数のサイトを候補として、前記探索用の候補エリアにセットする手段と、
     候補エリアが既にセットされ、前記サイト仮リストに登録されていない場合、レイテンシ及び信頼性に関する数値情報がより高評価であれば、前記探索用の候補エリアに記載されているサイトを置き換える手段と、
     新たな候補となるサイトが特定された場合、当該サイトを前記サイト仮リストに追加する手段と、
     中間成果表の結合木の作成件数から表作成コストを見積もる手段と、
     前記表作成コストの見積りに対し、正規化されたレイテンシに関する数値情報並びに信頼性に関する数値情報の逆数を重畳し、全体コストを求め、全体コストに対して計画番号を付け、候補の一つとして評価リストに追加する手段と、
     索引の作成可能性を判定し、索引が作成可能である場合、全体コストから索引作成コストを見積り、索引作成コストも組み込んだ全体コストを改めて計算し、改めて計算した全体コストに対して計画番号を付け、別候補として前記評価リストに追加する手段と、
     前記評価リスト内のデータに関して、全体コストでソートし序列化を行い、全体コストが最小のものを第一案として記憶する手段と、
     前記第一案と同類の構成で配置先が異なり、全体コストが次に小さいものを第二案として記憶する手段と、
     前記第一案、及び前記第二案を回答する手段と
    を更に具備する
     クエリ最適化装置。
    The query optimization device according to claim 7,
    In the processing of the location evaluation function, a means for confirming the next join operation destination, taking out a site group used in the next join operation destination, and adding it to the temporary site list;
    Means for taking out a site not included in the temporary site list and adding it to the candidate temporary list;
    Means for preparing and initializing candidate areas for search;
    Means for sequentially reading site information and reading numerical information about latency and reliability related to the site;
    If the candidate area remains initialized, means for setting the first predetermined number of sites as candidates in the candidate area for search;
    If the candidate area is already set and not registered in the temporary site list, if the numerical information regarding latency and reliability is higher evaluation, means for replacing the site described in the candidate area for search;
    If a new candidate site is identified, means for adding the site to the temporary site list;
    A means of estimating the cost of creating a table from the number of join trees created in the intermediate outcome table;
    Superimposing the numerical information related to normalized latency and the inverse of the numerical information related to reliability on the estimate of the table creation cost, the total cost is obtained, and the plan number is assigned to the total cost and evaluated as one of the candidates. Means to add to the list,
    If the index creation possibility is determined and the index can be created, the index creation cost is estimated from the total cost, the total cost including the index creation cost is calculated again, and the plan number is assigned to the newly calculated total cost. Means for adding to the evaluation list as another candidate,
    With respect to the data in the evaluation list, means for sorting and ranking by the total cost, and storing the one with the minimum total cost as a first plan,
    Means for storing the second plan with the next lowest overall cost in the same configuration as the first plan,
    A query optimization apparatus, further comprising: means for answering the first proposal and the second proposal.
  9.  データベース管理システムにおいて、クエリの中間成果の配置方式を評価するコンピュータにより実施されるクエリ最適化方法であって、
     データを保持するサイトに関するレイテンシ及び信頼性に関する数値情報を取り出すことと、
     取り出したレイテンシ及び信頼性に関する数値情報をクエリのコスト値に重畳し拡張コスト値を求めることと、
     拡張コスト値で序列化した場合に上位で所定の条件に適合する指定値を持つサイトを選定することと、
     アルゴリズム中のパラメータを一部変更した処理系自身のコピーを作成して先読みを実施することと、
     前記先読みの結果、前記所定の条件に適合する指定値を持つサイトの中から、最適化の結果発生する中間成果の配置先サイトを選定し、前記中間成果を、前記選定された配置先サイトのうち少なくとも1つのサイトに配置することと
    を含む
     クエリ最適化方法。
    In a database management system, a query optimization method implemented by a computer that evaluates an arrangement method of intermediate results of a query,
    Retrieving numerical information about latency and reliability of the site holding the data;
    Superimposing the retrieved latency and reliability numerical information on the cost value of the query to obtain an extended cost value;
    Select a site with a specified value that meets the specified conditions at the top when ordered by extended cost value,
    Create a copy of the processing system itself with some changes to the parameters in the algorithm and perform prefetching;
    As a result of the prefetching, a site to which an intermediate result generated as a result of optimization is selected from sites having specified values that meet the predetermined condition, and the intermediate result is selected from the selected destination site. A query optimization method comprising deploying to at least one of the sites.
  10.  請求項9に記載のクエリ最適化方法であって、
     クエリの構文を解析し、参照する表集合の各要素表で想定されるアクセスパターンを抽出し、仮結合要素リストに追加することと、
     前記各要素表に関して前記仮結合要素リストから最適なアクセス条件のものを取り出して残し、他を前記仮結合要素リストから削除することと、
     前記仮結合要素リストの各要素に対して、全レプリカ分のエントリを作成し、前記仮結合要素リストに追加することと、
     前記仮結合要素リストからエントリを取り出し、当該エントリにレイテンシ及び信頼性に関する数値情報を付与し、前記仮結合要素リストを更新することと、
     前記仮結合要素リストのエントリのレイテンシ及び信頼性に関する数値情報を評価することと、
     評価の結果、上位の所定数のレプリカを残し、他を前記仮結合要素リストから削除することと、
     前記仮結合要素リストのエントリに基づいて、結合要素リストの初期版を作成し、最適化プラン要素リストに追加することと、
     前記仮結合要素リストを削除することと
    を更に含む
     クエリ最適化方法。
    The query optimization method according to claim 9, comprising:
    Analyzing the query syntax, extracting the expected access pattern in each element table of the referenced table set, adding it to the temporary join element list,
    Taking out the one with the optimal access condition from the temporary binding element list for each of the element tables, and deleting the other from the temporary binding element list;
    Creating an entry for all replicas for each element of the temporary binding element list and adding to the temporary binding element list;
    Taking out an entry from the temporary binding element list, giving numerical information about latency and reliability to the entry, and updating the temporary binding element list;
    Evaluating numerical information regarding latency and reliability of the entries in the temporary binding element list;
    As a result of the evaluation, leaving the upper predetermined number of replicas, deleting the other from the temporary coupling element list,
    Creating an initial version of the combined element list based on the entry of the temporary combined element list and adding it to the optimized plan element list;
    The query optimization method further comprising: deleting the temporary joining element list.
  11.  請求項10に記載のクエリ最適化方法であって、
     同じ処理系をコピーし、コピーした処理系を起動することと、
     前記最適化プラン要素リスト内の要素から、貪欲法(欲張り法)のアルゴリズムを用いてコストを計算する評価関数を最小にする結合木の選択を行うことと、
     自身がコピー処理系か非コピー処理系かを判断することと、
     自身がコピー処理系である場合、非コピー処理系からの問い合わせ待ちを行い、コピーを実施した非コピー処理系からの問い合わせの際に指定される中間成果表の結合木の種類を判定し、外部側/駆動表か、内部側/参照表かを評価し、どのサイトのどの表と結合演算を行うか、を非コピー処理系に応答し、コピー処理系を終了することと、
     自身が非コピー処理系である場合、結合木を新たな中間成果表に仮置きし、コピー処理系に問い合わせることと、
     配置評価関数を呼び出し、前記中間成果表を配置するべきサイトと当該サイトの索引の作成方針で最適な案を取り出すことと、
     前記配置評価関数の結果に基づき、所定の箇所に前記中間生成結果を配置することと
    を更に含む
     クエリ最適化方法。
    The query optimization method according to claim 10, comprising:
    Copy the same processing system, start the copied processing system,
    Selecting from the elements in the optimization plan element list a join tree that minimizes an evaluation function for calculating a cost using an algorithm of greedy method (greedy method);
    Determining whether it is a copy processing system or a non-copy processing system,
    If it is a copy processing system, it waits for an inquiry from the non-copy processing system, determines the type of the intermediate tree for the intermediate result table specified when making an inquiry from the non-copy processing system that performed the copy, Evaluating the side / drive table or the internal / reference table, responding to which table at which site to perform the join operation to the non-copy processing system, and terminating the copy processing system;
    If you are a non-copy processing system, temporarily place the join tree in a new intermediate outcome table, query the copy processing system,
    Calling the placement evaluation function, taking out the optimal plan in terms of the site where the intermediate result table should be placed and the index creation policy of the site;
    Arranging the intermediate generation result at a predetermined location based on the result of the arrangement evaluation function. The query optimization method.
  12.  請求項11に記載のクエリ最適化方法であって、
     前記配置評価関数の処理において、次の結合演算先を確認し、次の結合演算先で利用されるサイト群を取り出して、サイト仮リストに追加することと、
     前記サイト仮リストに含まれないサイトを取り出し、候補仮リストに追加することと、
     探索用の候補エリアを準備し、初期化することと、
     順次、サイト情報を読み出し、当該サイトに関するレイテンシ及び信頼性に関する数値情報を読み出すことと、
     候補エリアで初期化されたものが残っている場合、最初の所定数のサイトを候補として、前記探索用の候補エリアにセットすることと、
     候補エリアが既にセットされ、前記サイト仮リストに登録されていない場合、レイテンシ及び信頼性に関する数値情報がより高評価であれば、前記探索用の候補エリアに記載されているサイトを置き換えることと、
     新たな候補となるサイトが特定された場合、当該サイトを前記サイト仮リストに追加することと、
     中間成果表の結合木の作成件数から表作成コストを見積もることと、
     前記表作成コストの見積りに対し、正規化されたレイテンシに関する数値情報並びに信頼性に関する数値情報の逆数を重畳し、全体コストを求め、全体コストに対して計画番号を付け、候補の一つとして評価リストに追加することと、
     索引の作成可能性を判定し、索引が作成可能である場合、全体コストから索引作成コストを見積り、索引作成コストも組み込んだ全体コストを改めて計算し、改めて計算した全体コストに対して計画番号を付け、別候補として前記評価リストに追加することと、
     前記評価リスト内のデータに関して、全体コストでソートし序列化を行い、全体コストが最小のものを第一案として記憶することと、
     前記第一案と同類の構成で配置先が異なり、全体コストが次に小さいものを第二案として記憶することと、
     前記第一案、及び前記第二案を回答することと
    を更に含む
     クエリ最適化方法。
    The query optimization method according to claim 11, comprising:
    In the processing of the location evaluation function, confirm the next join operation destination, take out the site group used in the next join operation destination, and add to the temporary site list;
    Taking out sites not included in the temporary site list and adding them to the candidate temporary list;
    Preparing and initializing candidate areas for search;
    Sequentially reading the site information, reading the numerical information about the latency and reliability of the site,
    If there remains a initialized area in the candidate area, setting the first predetermined number of sites as candidates in the candidate area for search;
    If the candidate area is already set and not registered in the temporary site list, if the numerical information regarding latency and reliability is higher, replacing the site described in the candidate area for search,
    If a new candidate site is identified, adding the site to the temporary site list,
    Estimating table creation costs from the number of join trees created in the intermediate outcome table,
    Superimposing the numerical information on normalized latency and the reciprocal of the numerical information on reliability is superimposed on the estimate of the table creation cost, the overall cost is obtained, the plan number is assigned to the overall cost, and it is evaluated as one of the candidates Adding to the list,
    If the index creation possibility is determined and the index can be created, the index creation cost is estimated from the total cost, the total cost including the index creation cost is calculated again, and the plan number is assigned to the newly calculated total cost. Adding to the evaluation list as another candidate,
    Regarding the data in the evaluation list, sorting and ranking by the overall cost, storing the one with the lowest overall cost as a first plan,
    The arrangement destination is different in the same configuration as the first plan, and the next lowest overall cost is stored as the second plan,
    A query optimization method, further comprising: responding to the first proposal and the second proposal.
  13.  請求項9乃至12のいずれか一項に記載のクエリ最適化方法を、計算機に実行させるためのプログラム。 A program for causing a computer to execute the query optimization method according to any one of claims 9 to 12.
PCT/JP2011/055887 2010-03-24 2011-03-14 Query optimization system, query optimization device, and query optimization method WO2011118425A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2012506941A JPWO2011118425A1 (en) 2010-03-24 2011-03-14 Query optimization system, query optimization device, and query optimization method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010068822 2010-03-24
JP2010-068822 2010-03-24

Publications (1)

Publication Number Publication Date
WO2011118425A1 true WO2011118425A1 (en) 2011-09-29

Family

ID=44672983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/055887 WO2011118425A1 (en) 2010-03-24 2011-03-14 Query optimization system, query optimization device, and query optimization method

Country Status (2)

Country Link
JP (1) JPWO2011118425A1 (en)
WO (1) WO2011118425A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014000578A1 (en) * 2012-06-29 2014-01-03 International Business Machines Corporation Method and apparatus for processing database data in distributed database system
CN110866003A (en) * 2018-08-27 2020-03-06 阿里巴巴集团控股有限公司 Method and device for estimating number of index values and electronic equipment
US11061907B2 (en) 2018-08-03 2021-07-13 Hitachi, Ltd. Database management system and method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002222108A (en) * 2001-01-26 2002-08-09 Hitachi Ltd Device and method for generating partial replica

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002222108A (en) * 2001-01-26 2002-08-09 Hitachi Ltd Device and method for generating partial replica

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DONALD KOSSMANN: "Iterative Dynamic Programming: A new Class of Query Optimization Algorithms", ACM TRANSACTIONS ON DATABASE SYSTEMS, vol. 25, no. 1, March 2000 (2000-03-01), pages 43 - 82, Retrieved from the Internet <URL:http://portal.acm.org/citation.cfm?id=352982> [retrieved on 20110414] *
EUGENE J. SHEKITA: "Multi-Join Optimization for Symmetric Multi Processors", PROCEEDINGS OF THE 19TH VLDB CONFERENCE, VLDB, 24 August 1993 (1993-08-24), pages 479 - 492, Retrieved from the Internet <URL:http://www.vldb.org/conf/1993/P479.PDF> [retrieved on 20110414] *
PAWEL JURCZYK: "Dynamic Query Processing for P2P Data Services in the Cloud", 20TH INTERNATIONAL CONFERENCE ON DATABASE AND EXPERT SYSTEMS APPLICATIONS(DEXA2009), 31 August 2009 (2009-08-31), pages 396 - 411, XP019125957, Retrieved from the Internet <URL:http://www.mathcs.emory.edu/~1xiong/research/pub/dobjects09dexa.pdf> [retrieved on 20110414] *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014000578A1 (en) * 2012-06-29 2014-01-03 International Business Machines Corporation Method and apparatus for processing database data in distributed database system
CN103514229A (en) * 2012-06-29 2014-01-15 国际商业机器公司 Method and device used for processing database data in distributed database system
GB2517885A (en) * 2012-06-29 2015-03-04 Ibm Method and apparatus for processing database data in distributed database system
US9411867B2 (en) 2012-06-29 2016-08-09 International Business Machines Corporation Method and apparatus for processing database data in distributed database system
US10140351B2 (en) 2012-06-29 2018-11-27 International Business Machines Corporation Method and apparatus for processing database data in distributed database system
US10628449B2 (en) 2012-06-29 2020-04-21 International Business Machines Corporation Method and apparatus for processing database data in distributed database system
US11061907B2 (en) 2018-08-03 2021-07-13 Hitachi, Ltd. Database management system and method
CN110866003A (en) * 2018-08-27 2020-03-06 阿里巴巴集团控股有限公司 Method and device for estimating number of index values and electronic equipment
CN110866003B (en) * 2018-08-27 2023-09-26 阿里云计算有限公司 Index value number estimation method and device and electronic equipment

Also Published As

Publication number Publication date
JPWO2011118425A1 (en) 2013-07-04

Similar Documents

Publication Publication Date Title
US6446089B1 (en) Method of using a cache to determine the visibility to a remote database client of a plurality of database transactions
US6178425B1 (en) Method of determining the visibility to a remote database client of a plurality of database transactions using simplified visibility rules
CN103678556B (en) The method and processing equipment of columnar database processing
JP3984659B2 (en) Overview catalog
JP4340226B2 (en) Providing usable versions of data items
US8756196B2 (en) Propagating tables while preserving cyclic foreign key relationships
CN108431810A (en) proxy database
US20130110873A1 (en) Method and system for data storage and management
CN104484472B (en) A kind of data-base cluster and implementation method of a variety of heterogeneous data sources of mixing
CN100518192C (en) Computer and method for establishing light weight catalog access data protocol exchange format fragment
CN108140040A (en) The selective data compression of database in memory
WO2008140937A2 (en) Query handling in databases with replicated data
AU2005239366A1 (en) Partial query caching
CN101546311A (en) Data processing method and data processing device for recycle bin
JPH0822409A (en) Distribution information management system for network
WO2010077549A2 (en) Data integrity in a database environment through background synchronization
EP2610768B1 (en) Data archiving and de-archiving in a business environment
CN103946794A (en) Cross-reference and priority claim to related applications
US11645281B1 (en) Caching query plans in database systems
US20070174360A1 (en) Storage system embedding database
CN106569896A (en) Data distribution and parallel processing method and system
CN110134706A (en) SQL statement automatic optimization method, device, computer equipment and storage medium
CN1829974B (en) Parallel recovery by non-failed nodes
CN100485640C (en) Cache for an enterprise software system
WO2011118425A1 (en) Query optimization system, query optimization device, and query optimization method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11759231

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2012506941

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11759231

Country of ref document: EP

Kind code of ref document: A1