WO2011078162A1

WO2011078162A1 - Scheduling device, scheduling method and program

Info

Publication number: WO2011078162A1
Application number: PCT/JP2010/072994
Authority: WO
Inventors: 小林　大
Original assignee: 日本電気株式会社
Priority date: 2009-12-24
Filing date: 2010-12-21
Publication date: 2011-06-30
Also published as: JP5810918B2; JPWO2011078162A1

Abstract

Disclosed is a scheduling device for improving system performance and usage efficiency of resources when time-sharing a parallel data processing system by a plurality of dataflow programs. A scheduling device comprises a program analysis unit that refers to first and second dataflow programs and generates first and second DAGs representing both dataflow programs, and first and second processing execution order information units representing execution order of processing corresponding to both DAG nodes; a processing assignment unit for assigning processing corresponding to nodes of the first and second DAGs to a plurality of data processing devices on the basis of the first and second processing execution order information units; and a data assignment unit for referring to the first and second DAGs and the first and second processing execution order information units, and exchanging data used in the processing corresponding to the nodes of the first and second DAGs between a storage unit of a storage device and storage units provided in each of the plurality of data processing devices.

Description

Scheduling apparatus, scheduling method and program

[Description of related applications]
The present invention claims priority from Japanese Patent Application No. 2009-293082 (filed on Dec. 24, 2009), the entire contents of which are incorporated herein by reference.
The present invention relates to a scheduling device, a scheduling method, and a program, and more particularly, to a scheduling device, a scheduling method, and a program in a parallel data processing system.

In a parallel data processing system, data to be processed is stored in a plurality of computers connected in parallel to the network, and data processing is performed in parallel by the plurality of computers. As a typical method of a parallel data processing system, there are a parallel database method (Non-Patent Document 1) that arranges data according to processing, and a MapReduce method (Non-Patent Document 2) that arranges processing according to data arrangement. In the MapReduce method, data is divided without depending on the contents and stored in a computer constituting a parallel data processing system.

A program that processes data in a parallel data processing system to obtain a result can be expressed as a directed acyclic graph (DAG) structure, with the processing contents as nodes and the data flows as branches. Such a program is called a data flow program.

In the parallel database described in Non-Patent Document 1, a series of programs written in a query language such as SQL is converted into a query tree having a DAG structure by a part of the system's internal processing called a query optimizer. In Pig Latin described in Non-Patent Document 3 and DryadLINQ described in Non-Patent Document 4, the user directly writes a program having a DAG structure.

The data flow program expressed in the DAG structure is converted into processing operations in the system to be used and executed in order by the processing scheduler function. In Pig Latin described in Non-Patent Document 3 and Hadoop described in Non-Patent Document 5, processing contents corresponding to each node of the DAG structure are converted into MapReduce processing by a processing scheduler function (JobTracker) and executed.

Many technologies for using a system composed of a plurality of computers in a time-sharing manner are known as computation scheduling on parallel computers. Non-Patent Document 6 describes a gang scheduling technique in which a process that spans multiple computers is replaced with a different process. In the technique described in Patent Document 1, processes are further divided into units called gang groups, which are used in a time-sharing manner. In the scheduling method described in Patent Document 2, when a job is composed of a plurality of processes, job information is used to switch processes on each computer.

In order to replace a plurality of data processing programs in a time-sharing manner, a storage device (hereinafter referred to as "storage device") for storing unused data is used in addition to the computers used for processing. Techniques for exchanging data across a plurality of computers or storage devices are known. According to the method described in Patent Document 3, data can be rearranged to reflect the data usage tendency. According to the method described in Patent Document 4, data sets can be replaced without stopping the processing. The function that controls the replacement of data processing programs in a time-sharing manner is hereinafter referred to as a data scheduler function or a data allocation function.

Note that Non-Patent Document 7 describes a technique for speeding up the system by holding intermediate data generated during processing without discarding it.

Patent Document 1: Japanese Patent No. 3885748
Patent Document 2: JP 2001-249821 A
Patent Document 3: Japanese Patent Application No. 2009-083426
Patent Document 4: Japanese Patent Application No. 2009-202543

The disclosures of the above-mentioned patent documents and non-patent documents are incorporated herein by reference. The following analysis was made by the present inventors.
Consider a case where a plurality of data flow programs are allocated to a parallel data processing system by a processing scheduler function (or a processing allocation function) and the parallel data processing system is used in a time-sharing manner.

The scheduling techniques for parallel computers described in Patent Documents 1 and 2 and Non-Patent Document 6 address synchronization between processes, but do not consider the flow of the data being used. In addition, the parallel data processing systems described in Non-Patent Documents 1 to 7 do not address time-sharing use. If these systems are simply combined with the technologies described in Patent Documents 3 and 4, the processing scheduler and the data scheduler operate independently of each other, and the system becomes inefficient.

One cause of inefficiency is data replacement that occurs while a process is executing. That is, while one process of a data flow program assigned by the processing scheduler is running, the data scheduler switches it with another process in a time-sharing manner. When a computer has a cache mechanism that handles frequently used data in a fast, low-capacity storage element, such a replacement by the data scheduler lowers the cache hit rate and lengthens the processing time. In addition, when the above-mentioned function that retains intermediate data is used, the amount of data to be replaced in the middle of a process is much larger than before or after the process, and the time required for data replacement increases.

Another cause of inefficiency is saturation of storage device performance. When the storage device includes a plurality of lower layer data storage units, the data transfer capability of each lower layer data storage unit is limited by the performance of the device and the network. Therefore, when a large number of data processing devices exchange data with the same lower layer data storage unit, the amount of data transferred is capped by that capability, and the data transfer rate per data processing device may decrease.

Another cause of inefficiency is the lack of data reuse information. For example, when process A and process B operate on the same data, the data replacement after process A can be omitted by performing process B immediately after process A. However, this cannot be realized when the data scheduler does not know the processing order and the data flow.

Therefore, when a parallel data processing system is used in a time-sharing manner by a plurality of data flow programs, the problem is how to improve system performance and resource utilization efficiency. An object of the present invention is to provide a scheduling apparatus, a scheduling method, and a program that solve this problem.

The scheduling apparatus according to the first aspect of the present invention comprises:
a program analysis unit that refers to a first data flow program and generates a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing the execution order of the processes corresponding to the nodes of the first DAG, and that refers to a second data flow program and generates a second DAG representing the second data flow program and second process execution order information representing the execution order of the processes corresponding to the nodes of the second DAG;
a process allocation unit that allocates the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a data allocation unit that refers to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanges data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.

The scheduling method according to the second aspect of the present invention comprises:
a program analysis step in which a computer refers to a first data flow program and generates a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing the execution order of the processes corresponding to the nodes of the first DAG, and refers to a second data flow program and generates a second DAG representing the second data flow program and second process execution order information representing the execution order of the processes corresponding to the nodes of the second DAG;
a process allocation step of allocating the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a data allocation step of referring to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanging data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.

The program according to the third aspect of the present invention causes a computer to execute:
a process of referring to a first data flow program and generating a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing the execution order of the processes corresponding to the nodes of the first DAG, and of referring to a second data flow program and generating a second DAG representing the second data flow program and second process execution order information representing the execution order of the processes corresponding to the nodes of the second DAG;
a process of allocating the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a process of referring to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanging data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.

The scheduling apparatus, scheduling method, and program according to the present invention can improve system performance and resource utilization efficiency when a parallel data processing system is used in a time-sharing manner by a plurality of data flow programs.

FIG. 1 is a block diagram showing the configuration of the parallel data processing system according to the first embodiment.
FIG. 2 is a block diagram showing the physical configuration of the parallel data processing system according to the first embodiment.
FIG. 3 is a block diagram showing the functions related to parallel data processing in the parallel data processing system according to the first embodiment.
FIG. 4 is a block diagram showing the functions related to time-sharing use of parallel data sets in the parallel data processing system according to the first embodiment.
FIG. 5 is a diagram showing an example of a data flow program represented by a DAG structure.
FIG. 6 is a block diagram showing the configuration of the parallel data processing system according to the second embodiment.
FIG. 7 is a block diagram showing another configuration of the parallel data processing system according to the second embodiment.
FIG. 8 is a block diagram showing the configuration of the parallel data processing system according to the third embodiment.
FIG. 9 is a block diagram showing another configuration of the parallel data processing system according to the third embodiment.
FIG. 10 is a diagram showing an example of the data processing device allocation list in the scheduling device of the parallel data processing system according to the third embodiment.
FIG. 11 is a block diagram showing the configuration of the parallel data processing system according to the fourth embodiment.
FIG. 12 is a block diagram showing the configuration of the parallel data processing system according to the fifth embodiment.

According to the first development form, the scheduling device according to the first aspect is provided.

According to the second development form,
a scheduling device is provided in which the data allocation unit alternately repeats, each time a process is completed, the allocation to the plurality of data processing devices of data used for a process corresponding to a node of the first DAG and the allocation of data used for a process corresponding to a node of the second DAG.

According to the third development form,
a scheduling device is provided in which the data allocation unit alternately repeats, each time a plurality of processes are completed, the allocation to the plurality of data processing devices of data used for a plurality of processes corresponding to a plurality of nodes of the first DAG and the allocation of data used for a plurality of processes corresponding to a plurality of nodes of the second DAG.

According to the fourth development form,
a scheduling device is provided in which, when a process corresponding to a node of one of the first DAG and the second DAG, performed on data allocated to the plurality of data processing devices, is not completed before a predetermined period elapses, the data allocation unit allocates data used for a process corresponding to a node of the other DAG to the plurality of data processing devices.

According to the fifth development form,
a scheduling device is provided in which the data allocation unit generates a process identifier identifying the process that targets the data allocated to the plurality of data processing devices, and
the process allocation unit allocates processes to the plurality of data processing devices with reference to the process identifier.

According to the sixth development form,
a scheduling device is provided in which, when at least one of the first process execution order information and the second process execution order information includes an execution order in which a second process is executed after a first process, and the data used for the second process is recorded in the storage unit of the storage device, the data allocation unit, upon receiving from any one of the plurality of data processing devices a signal indicating that the first process has been completed, causes the data used for the first process to be transferred from the storage unit of that data processing device to the storage unit of the storage device, and causes the data used for the second process to be transferred from the storage unit of the storage device to the storage unit of that data processing device.

According to the seventh development form,
a scheduling device is provided in which the program analysis unit outputs a data processing device allocation list indicating to which of the plurality of data processing devices the processes corresponding to the nodes of the first DAG and the second DAG are allocated, and
the data allocation unit exchanges the data used for the processes corresponding to the nodes of the first DAG and the second DAG only between the storage unit of the storage device and the storage units of the data processing devices included in the data processing device allocation list.

According to the eighth development form,
a scheduling device is provided in which the data allocation unit further refers to information indicating the performance of the storage device, determines whether the execution order included in the first process execution order information and/or the second process execution order information should be changed, and, if it determines that the order should be changed, outputs a change request including the content of the change to the program analysis unit, and
the program analysis unit changes the first process execution order information and/or the second process execution order information in response to the change request and outputs the changed information to the process allocation unit and the data allocation unit.

According to the ninth development form, the scheduling method according to the second aspect is provided.

According to the tenth development form, the program according to the third aspect is provided.

According to the eleventh development form, a computer-readable recording medium recording the program according to the tenth development form is provided.

(Embodiment 1)
The parallel data processing system according to the first embodiment will be described with reference to the drawings. FIG. 2 is a block diagram showing the configuration of the parallel data processing system according to this embodiment. Referring to FIG. 2, the parallel data processing system includes one or more data processing devices 50a-1 to 50a-n, a storage device 40a, and a scheduling device 10a connected via a network 80. A user device 70 that is a computer used by a user who uses the parallel data processing system is also connected to the network 80. In FIG. 2, as an example, the number of data processing apparatuses is three (n = 3), but the number of data processing apparatuses is not limited to this.

Each data processing device 50a-i (i = 1, 2, ...) has a CPU 71, a data storage unit 72, and a data transfer unit 73. The CPU 71 implements a distributed data discharge unit, a data reception unit, and a data processing unit, which will be described later. The data storage unit 72 implements an upper layer data storage unit, which will be described later. The CPU 71 and the data transfer unit 73 together implement the data reception unit described later.

The storage device 40a includes a CPU 81, a data storage unit 82, and a data transfer unit 83. The CPU 81 and the data transfer unit 83 realize a data transfer unit described later. The CPU 81 and the data storage unit 82 realize a lower layer data storage unit to be described later.

The scheduling device 10a is a computer having a CPU 91, a data storage unit 92, and a data transfer unit 93. The CPU 91, the data storage unit 92, and the data transfer unit 93 are used to implement a program analysis unit, a process allocation unit, and a data allocation unit, which will be described later.

The scheduling device 10a may be a single computer as shown in FIG. Alternatively, the scheduling device 10a may be realized by a plurality of computers, with each computer individually executing one of the functions of the scheduling device 10a. Further, one of the data processing devices 50a-i or the storage device 40a may take over the functions of the scheduling device 10a.

The data storage units 72, 82, and 92 are each, for example, a hard disk drive, flash memory, DRAM, MRAM (Magnetoresistive Random Access Memory), FeRAM (Ferroelectric Random Access Memory), PRAM (Phase-change RAM), a physical medium capable of recording data such as a magnetic tape, or a control device that records data on a medium installed outside the storage node.

The network 80 and the data transfer units 73, 83, and 93 can be realized using, for example, Ethernet (registered trademark), Fibre Channel, FCoE (Fibre Channel over Ethernet (registered trademark)), InfiniBand, QsNet, Myrinet, or an upper-layer protocol that runs over them such as TCP/IP or RDMA. However, the implementation method of the network 80 is not limited to these.

First, the functions related to parallel data processing will be described with reference to FIG.

The data processing devices 50a-i store the data to be processed in the upper layer data storage unit 61. When there are a plurality of data processing devices 50a-1 to 50a-n, the data set required for one processing content is divided and stored across the individual data processing devices 50a-i. For example, hash partitioning can be used, which determines which data processing device 50a-i stores each item by dividing the range of the identifiers of the individual data items in the data set, or the range of the values obtained by applying a hash function to those identifiers. An arbitrary arrangement, in which any data item may be stored in any upper layer data storage unit 61, can also be used. The data arrangement method is not limited to these.
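The hash partitioning mentioned above can be sketched as follows. This is a minimal illustration rather than the arrangement the specification prescribes: the function names `assign_device` and `partition`, the choice of SHA-256, and the 64-bit hash range are all assumptions made for the example.

```python
import hashlib

def assign_device(identifier: str, num_devices: int) -> int:
    """Return the index of the data processing device that stores this item."""
    # Assumption: a stable hash is used, since Python's built-in hash()
    # is salted per process and would not give a repeatable placement.
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # 64-bit hash value
    # Divide the 64-bit hash range into num_devices contiguous ranges.
    range_size = (1 << 64) // num_devices + 1
    return value // range_size

def partition(identifiers, num_devices):
    """Group identifiers by the device index chosen by assign_device."""
    placement = {i: [] for i in range(num_devices)}
    for ident in identifiers:
        placement[assign_device(ident, num_devices)].append(ident)
    return placement
```

Because the placement depends only on the identifier and the number of devices, any component that knows both can compute where an item resides without consulting a directory.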

Referring to FIG. 3, the scheduling apparatus 10a includes a program analysis unit 21 and a process allocation unit 22.

The program analysis unit 21 receives the program from the user device 70, specifies the data processing device 50a-i that executes the program, and performs processing control.

The program from the user is a data flow program in which a plurality of processing contents for data, usage data information indicating data used for processing, and processing order constraints are described. The data flow program can be represented by a directed acyclic graph (DAG) structure with processing contents as nodes and usage data information indicating data used for processing as branches.

FIG. 5 is a diagram illustrating an example of a data flow program represented by a DAG structure. As an example, the data flow program is a program that directly describes the data flow structure in the languages described in Non-Patent Documents 3 and 4. The data flow program may be information representing a query tree obtained by converting a program based on a query language such as SQL by a query optimizer of the database management system. The data flow program is not limited to these.
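As a rough illustration of such a DAG-structured data flow program, the sketch below stores each processing node together with the nodes whose output it consumes, and checks whether a given execution order respects those dependencies. The node names are those used for FIG. 5 elsewhere in this description, but the edges are hypothetical, because the figure itself is not reproduced in the text; `is_valid_order` is likewise an illustrative helper.

```python
# Hypothetical edges: each node maps to the nodes whose results it uses.
DAG = {
    "Load1": [],
    "Load2": [],
    "Load3": [],
    "Filter1": ["Load2"],
    "Filter2": ["Load3"],
    "Mining": ["Load1", "Filter1"],
    "Statistical": ["Filter1", "Filter2"],
    "JOIN": ["Mining", "Statistical"],
}

ORDER = ["Load2", "Filter1", "Load1", "Mining",
         "Load3", "Filter2", "Statistical", "JOIN"]

def is_valid_order(dag, order):
    """True if every node appears exactly once, after all of its inputs."""
    if sorted(order) != sorted(dag):
        return False
    position = {node: i for i, node in enumerate(order)}
    return all(position[src] < position[node]
               for node, inputs in dag.items()
               for src in inputs)
```

Under these assumed edges, `is_valid_order(DAG, ORDER)` holds, while the reversed order is rejected because JOIN would run before its inputs exist.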

The program analysis unit 21 interprets the input data flow program and determines the data processing execution order based on the performance and availability information of the available data processing devices 50a-i and the arrangement information of the data to be used.

For example, when the data arrangement is arbitrary, the processing content at every node of the DAG can be decomposed into Map processing and Reduce processing and executed. Alternatively, each data load may be assigned as a separate process, the amount of computation corresponding to the data amount may be calculated, and the nodes of the DAG may be assigned to the data processing devices in the order that minimizes the amount of computation. The method by which the program analysis unit 21 determines the process execution order is not limited to these.

The process allocation unit 22 controls the data processing unit 65 of each data processing device 50a-i based on the process execution order information, which represents the process execution order determined by the program analysis unit 21, and causes the data to be processed.

Next, functions related to time-sharing use of parallel data sets will be described with reference to FIG.

Referring to FIG. 4, the storage device 40a includes a lower layer data storage unit 45 and a data transfer unit 46.

The lower layer data storage unit 45 stores data to be processed.

The data transfer unit 46 copies or moves a part of the stored data to the data processing device 50a-i according to the data transfer command.

Referring to FIG. 4, the data processing device 50a-i includes an upper layer data storage unit 61, a distributed data discharge unit 62, a data reception unit 63, and a data processing unit 65.

The upper layer data storage unit 61 stores at least one processing target data.

The data receiving unit 63 stores the data transmitted from the storage device 40a in the upper layer data storage unit 61.

In response to a data processing discharge command, the distributed data discharge unit 62 copies or moves a part of the data in the upper layer data storage unit 61, together with processing information from which the current processing state of the data processing unit 65 can be restored, to the lower layer data storage unit 45 of the storage device 40a indicated by the command.

When the distributed data discharge unit 62 is realized by, for example, the method described in Patent Document 4, it can discharge data without stopping the processing of the data processing device. Further, when the data processing unit 65 operates on a VM (Virtual Machine), the distributed data discharge unit 62 may use the VM migration function or the checkpoint function to convert the process and its data into recordable data and transmit them to the storage device 40a.

Referring to FIG. 4, the scheduling apparatus 10a further includes a data allocation unit 23.

The data allocation unit 23 issues data transfer instructions to transfer data from the storage device 40a to the data processing device 50a-i. In addition, the data allocation unit 23 issues data processing discharge commands to discharge data from the data processing devices 50a-i to the storage device 40a. The data allocation unit 23 can exchange data between the upper layer data storage unit 61 and the lower layer data storage unit 45 by simultaneously issuing a data transfer command and a data processing discharge command.
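The exchange described above can be pictured as a pair of commands issued together. The sketch below is only an in-memory model under stated assumptions: the class names `DataTransferCommand` and `DischargeCommand`, the helper functions, and the use of "move" rather than "copy" semantics are illustrative choices, not part of the specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataTransferCommand:   # storage device -> data processing device
    device_id: str
    dataset: str

@dataclass(frozen=True)
class DischargeCommand:      # data processing device -> storage device
    device_id: str
    dataset: str

def exchange(device_id, outgoing, incoming):
    """Return the command pair that swaps `outgoing` for `incoming`."""
    return [DischargeCommand(device_id, outgoing),
            DataTransferCommand(device_id, incoming)]

def apply_commands(commands, upper, lower):
    """Apply commands to stand-ins for the upper and lower storage layers.

    `upper` maps a device id to the set of datasets it holds;
    `lower` is the set of datasets held by the storage device.
    """
    for cmd in commands:
        if isinstance(cmd, DischargeCommand):
            upper[cmd.device_id].remove(cmd.dataset)
            lower.add(cmd.dataset)
        else:
            lower.remove(cmd.dataset)
            upper[cmd.device_id].add(cmd.dataset)
```

For example, starting from `upper = {"50a-1": {"A"}}` and `lower = {"B"}`, applying `exchange("50a-1", "A", "B")` leaves dataset B on the data processing device and dataset A in the storage device.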

At this time, according to the method described in Patent Document 3, when data is further transferred from the lower layer data storage unit 45 to the upper layer data storage unit 61, the data arrangement suitable for the next processing can be achieved.

FIG. 1 is a block diagram showing a configuration of a parallel data processing system according to the present embodiment. The parallel data processing system of FIG. 1 is a parallel data processing system that performs time division use of parallel data sets. In FIG. 1, the following information exchange is also described.

Referring to FIG. 1, the scheduling apparatus 10 includes a program analysis unit 21, a process allocation unit 22, and a data allocation unit 23.

The program analysis unit 21 receives the data flow program, outputs the process execution order information to the process allocation unit 22, and outputs to the data allocation unit 23 the process execution order information and DAG structure information representing the DAG structure of the data flow program.

Based on the process execution order information and the DAG structure information, the data allocation unit 23 outputs data transfer commands and data processing discharge commands at a timing at which data can be exchanged efficiently between the upper layer data storage unit 61 and the lower layer data storage unit 45.

For example, assume that the process execution order information specifies that the processes in FIG. 5 are executed in the order Load2, Filter1, Load1, Mining, Load3, Filter2, Statistical, JOIN. Because the data allocation unit 23 knows that Load1 follows Filter1, it can leave the data of Filter1 in the upper layer data storage unit 61 without exchanging data at that point. In addition, because the result of Mining is not used by Load3 or Filter2, after Mining has been executed the data allocation unit 23 can discharge the Mining result data to the lower layer data storage unit 45 and transfer data3 to the upper layer data storage unit 61.
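One way to picture this kind of decision is a next-use analysis over the execution order. The sketch below is an assumption-laden illustration, not the method claimed here: the DAG edges are hypothetical (FIG. 5 is not reproduced in the text), and the `lookahead` threshold and the function `plan_discharges` are invented for the example. A result is discharged after a step if no later process uses it, or if its next use lies more than `lookahead` steps ahead.

```python
# Hypothetical edges: each node maps to the nodes whose results it uses.
DAG = {
    "Load1": [], "Load2": [], "Load3": [],
    "Filter1": ["Load2"],
    "Filter2": ["Load3"],
    "Mining": ["Load1", "Filter1"],
    "Statistical": ["Filter1", "Filter2"],
    "JOIN": ["Mining", "Statistical"],
}
ORDER = ["Load2", "Filter1", "Load1", "Mining",
         "Load3", "Filter2", "Statistical", "JOIN"]

def plan_discharges(dag, order, lookahead=2):
    """Map each step to the results discharged from the upper layer after it."""
    position = {node: i for i, node in enumerate(order)}
    consumers = {node: [] for node in dag}
    for node, inputs in dag.items():
        for src in inputs:
            consumers[src].append(node)
    resident = set()  # results currently held in the upper layer
    plan = {}
    for step, node in enumerate(order):
        resident.update(dag[node])  # inputs are (re)loaded if needed
        resident.add(node)          # the node's own result
        discharged = []
        for result in sorted(resident):
            future = [position[c] for c in consumers[result]
                      if position[c] > step]
            # Discharge when unused from now on or not needed soon.
            if not future or min(future) - step > lookahead:
                discharged.append(result)
        resident.difference_update(discharged)
        plan[node] = discharged
    return plan
```

With these assumed edges and `lookahead=2`, the plan keeps the Filter1 result resident after Filter1 and Load1, and discharges the Mining result right after Mining, which mirrors the two observations in the paragraph above.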

When the upper layer data storage unit 61 is composed of DRAM or SSDs, which support fast random access, and the lower layer data storage unit 45 is composed of HDDs, which support fast sequential access, this control improves system performance because the data to be used can be transferred to the upper layer data storage unit 61 in advance by sequential access. Likewise, when the upper layer data storage unit 61 adopts a data grid configuration whose power consumption depends on the amount of data stored, and the lower layer data storage unit 45 adopts high-capacity HDDs whose power consumption is substantially constant regardless of the amount of data stored, this control reduces the power consumption of the system by reducing the amount of data held in the upper layer data storage unit 61.

In addition, when the data allocation unit 23 uses a method that changes the data arrangement according to the processing tendency, it can place data in the upper layer data storage unit 61 so that the data arrangement suits the next process, based on the DAG structure information. Examples of such methods include the hash partitioning used for hash joins and the method described in Patent Document 3.

The parallel data processing system according to the present embodiment can improve system performance and resource utilization efficiency when the system is used in a time-sharing manner. This is because the amount of data held in the upper layer data storage unit 61 can be controlled by referring to the processing order in the data flow program.

(Embodiment 2)
A parallel data processing system according to the second embodiment will be described with reference to the drawings. In the present embodiment, by detecting the end of processing in the data processing apparatus, more efficient time division use of the parallel data processing system is realized. FIG. 6 is a block diagram showing the configuration of the parallel data processing system according to this embodiment.

Referring to FIG. 6, the data processing device 51 in the parallel data processing system according to the present embodiment has, in addition to the functions of the first embodiment, a process end detection unit 66 realized using the CPU 71 (see FIG. 2).

The process end detection unit 66 detects that one process instructed by the process allocation unit 22 has ended, and outputs a process end signal. Here, one process refers to a process assigned to the data processing device 51 among processes corresponding to one node of the DAG structure. The process end detection unit 66 can be realized, for example, by providing a special interrupt instruction at the end of the program.

The data allocation unit 25 in the scheduling device 11 of the parallel data processing system according to the present embodiment further receives a process end signal. As in the first embodiment, the program analysis unit 21 receives the data flow program, outputs the process execution order information to the process allocation unit 22, and outputs the DAG structure information representing the DAG structure of the data flow program to the data allocation unit 25.

Based on the input process execution order information, DAG structure information, and process end signal, the data allocation unit 25 outputs the data transfer command and the data processing discharge command at a timing at which data can be efficiently exchanged between the upper layer data storage unit 61 and the lower layer data storage unit 45. It also outputs, to the data transfer unit 47 of the storage device 41, data processing device identification information for identifying the data processing device 51 and processing identification information for identifying the processing content.

By introducing the processing end detection unit 66, the data allocation unit 25 can know the processing end timing in each data processing device 51, and can perform the data replacement processing more efficiently.

For example, assume that the process execution order information specifies that the processes in FIG. 5 are executed in the order Load2, Filter1, Load1, Mining, Load3, Filter2, Statistical, JOIN. Assume also that the data processing method is the MapReduce method. That is, all the data processing devices 51 hold the processing target data in an arbitrary arrangement, and the processing at each node is distributed to all the data processing devices 51.

In this case, the process Load1 can be performed sequentially, starting from the data processing devices 51 that have completed the process Filter1. Likewise, the result data of the process Filter1 can be loaded sequentially, starting from the data processing devices 51 that have completed the process Filter2, in preparation for the next process Statistical.

Also, since the end of processing can be detected, data can be prevented from being exchanged in the middle of processing. During processing, the data processing device 51 may hold intermediate data generated by the processing in addition to the processing target data. If data is exchanged in the middle of processing, such intermediate data must be exchanged as well, which may increase the time required for the data exchange. In other words, because the end of processing can be detected, the exchange of intermediate data can be avoided, resource utilization efficiency in the system can be improved, and processing time can be shortened.
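The exchange triggered by a process end signal can be sketched as follows. This is a minimal illustration in Python, not the implementation of the data allocation unit 25: the class name `DataAllocator`, the method `on_process_end`, the device names, and the command tuples are all hypothetical. For simplicity the sketch always discharges the data of the finished process and loads the data for the next one, and it ignores the DAG structure information that would let some data stay in place.

```python
# Execution order taken from the example in the text.
ORDER = ["Load2", "Filter1", "Load1", "Mining",
         "Load3", "Filter2", "Statistical", "JOIN"]


class DataAllocator:
    """Hypothetical stand-in for the data allocation unit 25."""

    def __init__(self, order, devices):
        self.order = list(order)
        # Index into `order` of the process each device is currently running.
        self.current = {device: 0 for device in devices}

    def on_process_end(self, device):
        """Handle a process end signal from one device.

        Commands are produced only here, so no data is exchanged
        while a process is still running on that device.
        """
        index = self.current[device]
        if index >= len(self.order):
            raise RuntimeError(f"{device} has no running process")
        finished = self.order[index]
        self.current[device] = index + 1
        commands = [("discharge", device, finished)]
        if index + 1 < len(self.order):
            # Load the data for the next process in the execution order.
            commands.append(("transfer", device, self.order[index + 1]))
        return commands


allocator = DataAllocator(ORDER, ["dp1", "dp2"])
# dp1 finishes Load2 first; dp2 is left untouched until it signals too.
print(allocator.on_process_end("dp1"))
```

Each device advances independently, so a device that finishes early receives its next data without waiting for the others.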

When, in addition to the processing end detection unit 66, a mechanism that makes the discharged data accessible is introduced into the distributed data discharge unit 62, the program analysis unit 21 can operate on the assumption that all the data exists in the upper layer data storage unit 61, while a large amount of data is actually stored in the lower layer data storage unit 45. In this case, many conventional implementations that do not support time-division use can be used as the program analysis unit 21.

For example, a method described in Patent Document 4 can be cited as a mechanism for making it possible to access discharged data.

FIG. 7 is a block diagram showing another configuration of the parallel data processing system according to the present embodiment. Referring to FIG. 7, in this parallel data processing system, the storage apparatus 42 has a plurality of lower layer data storage units 45. At this time, each data processing device 52 can individually access each lower layer data storage unit 45.

The data transfer capability of the lower layer data storage unit 45 is limited by the performance of the device and the network. Therefore, when a large number of upper layer data storage units 61 exchange data with the same lower layer data storage unit 45, the data transfer amount is limited by that capability, and the data transfer rate per data processing device 52 may decrease.

Therefore, when the data allocation unit 26 receives the process end signal from the processing end detection unit 66 and transmits the data processing discharge command, it calculates an allocation such that the resources of the storage device 42 are used more evenly, and transmits information identifying the lower layer data storage unit 45 chosen by that allocation as lower layer data storage unit identification information.

As the allocation method, for example, a round robin method can be used. That is, a lower layer data storage unit list in which the lower layer data storage units 45 are arranged in an arbitrary order may be held, and each time a process end signal is received, a lower layer data storage unit 45 may be assigned in order from the top of the list. As another allocation method, the transfer time on the network may be taken into consideration, and the lower layer data storage units 45 may be assigned in order of proximity to the data processing device 52. The allocation method is not limited to these.
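The round robin allocation described above can be sketched as follows. This is a minimal Python illustration under assumed names: `RoundRobinAllocator`, `on_process_end`, and the identifiers `store0` to `store2` are hypothetical and do not appear in the specification.

```python
from itertools import cycle


class RoundRobinAllocator:
    """Hypothetical helper that picks a lower layer data storage unit
    for each process end signal, in list order, wrapping around."""

    def __init__(self, storage_units):
        units = list(storage_units)
        if not units:
            raise ValueError("at least one storage unit is required")
        # The list order is arbitrary, as stated in the text.
        self._next_unit = cycle(units)

    def on_process_end(self, device_id):
        """Return (device, storage unit) for the discharge command."""
        return device_id, next(self._next_unit)


allocator = RoundRobinAllocator(["store0", "store1", "store2"])
for device in ["dp1", "dp2", "dp3", "dp4"]:
    print(allocator.on_process_end(device))
```

With three storage units, the fourth signal wraps around to the first unit again, which spreads discharges evenly when signals arrive at a similar rate from all devices.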

At this time, the distributed data discharge unit 67 of the data processing device 52 discharges the data to the designated lower layer data storage unit 45.

According to such a configuration, data can be replaced by using more data transfer capability of the storage apparatus 42, and the data processing capability of the system can be improved.

(Embodiment 3)
A parallel data processing system according to a third embodiment will be described with reference to the drawings. In this embodiment, more efficient time-division use is realized when some, but not all, of the data processing devices 50 are used. FIG. 8 is a block diagram showing the configuration of the parallel data processing system according to this embodiment.

In the parallel data processing system according to the present embodiment, when the program analysis unit 27 assigns the processing of each node of the DAG to the data processing devices 50, it need not assign the processing to all the data processing devices 50 and may assign it to only some of them. In addition, the program analysis unit 27 further outputs a data processing device allocation list indicating to which data processing devices 50 the processing corresponding to each node of the DAG is assigned.

FIG. 10 is a diagram showing, as an example, a data processing device allocation list in the scheduling device 13 of the parallel data processing system according to the present embodiment. The list in FIG. 10 is merely an example, and the format of the list is not limited to this.

The data allocation unit 28 uses the data processing device allocation list as an input in addition to the processing execution order information and the DAG structure information, and uses it to generate a data transfer command and a data processing discharge command. The data allocation unit 28 can generate an instruction for more efficiently handling the resources in the system by using the data processing device allocation list.

As an example, assume that the processing of the data flow program shown in FIG. 5 is performed by the data processing devices shown in the data processing device allocation list of FIG. 10, and that the process Statistical is executed after the process Filter2. Referring to FIG. 10, the process Filter2 is executed by the data processing devices 6 and 7, and the process Statistical is executed by the data processing devices 1 to 7. Further, assume that the process Mining is executed by the data processing devices 2 to 5 before the process Filter2.

In this case, by using the information in the data processing device allocation list, the result of the process Mining can be transferred to the storage device 40 while the output data of the process Filter1, which was used as the input of the process Mining, is left as it is in the data processing devices 2 to 5. The process Statistical can then be executed using the output of the process Filter2 and the output of the process Filter1. As a result, the amount of data transfer can be reduced, and system resources can be used more efficiently.

FIG. 9 is a block diagram showing another configuration of the parallel data processing system according to the present embodiment. Referring to FIG. 9, the data processing device 53 further includes a power saving control unit 68.

The power saving control unit 68 changes the processing capability and power consumption of the data processing device 53 according to a power control command. Generally, power consumption increases as processing capability is increased.

As power control by the power saving control unit 68, for example, the power of the data processing device 53 may be turned on and off in accordance with the power control command. As another example, ACPI (Advanced Configuration and Power Interface) can be used to control the rotation speed of the HDD constituting the data processing device or the operating frequency of the CPU. Note that the power saving control method is not limited to these.

Referring to FIG. 9, the data allocation unit 29 outputs a power control command to the power saving control unit 68 of the data processing device 53. In this way, the data allocation unit 29 can efficiently control the power used by the system while maintaining the performance of the system.

When the processing of the data flow program shown in FIG. 5 is performed by the data processing devices shown in the data processing device allocation list of FIG. 10, the process JOIN is performed in the data processing devices 3 and 5 after the process Statistical is performed in the data processing devices 1 to 7. Therefore, the data processing devices 1, 2, 4, 6, and 7 are not used once the process JOIN starts. Accordingly, the data held in the data processing devices 1, 2, 4, 6, and 7 is saved to the storage device 40, and these data processing devices are controlled so as to reduce their power. The power consumption of the entire system can thus be reduced without affecting the performance of the processing performed in the data processing devices 3 and 5.
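The selection of devices to power down can be sketched as a set difference over the data processing device allocation list. The following Python fragment is only an illustration: the dictionary `ALLOCATION` contains just the assignments stated in the text (the full content of FIG. 10 is not reproduced here), the assumption that the system has exactly seven data processing devices is inferred from the phrase "data processing devices 1 to 7", and the function name is hypothetical.

```python
# Assignments stated in the text; other rows of FIG. 10 are omitted.
ALLOCATION = {
    "Mining": {2, 3, 4, 5},
    "Filter2": {6, 7},
    "Statistical": {1, 2, 3, 4, 5, 6, 7},
    "JOIN": {3, 5},
}

# Assumption: the system consists of data processing devices 1 to 7.
ALL_DEVICES = set(range(1, 8))


def devices_to_power_down(allocation, all_devices, next_process):
    """Return the devices that the next process does not use.

    Their data would be saved to the storage device before a power
    control command asks them to reduce power.
    """
    return sorted(all_devices - allocation[next_process])


print(devices_to_power_down(ALLOCATION, ALL_DEVICES, "JOIN"))
# → [1, 2, 4, 6, 7]
```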

(Embodiment 4)
A parallel data processing system according to a fourth embodiment will be described with reference to the drawings. In the fourth embodiment, the scheduling apparatus has a plurality of program analysis units, and realizes more efficient time division use. FIG. 11 is a block diagram showing the configuration of the parallel data processing system according to this embodiment.

Referring to FIG. 11, the scheduling device 16 includes a program analysis unit A31 and a program analysis unit B32. The program analysis unit A31 receives the data flow program A and outputs process execution order information and DAG structure information for the data flow program A. Similarly, the program analysis unit B32 receives the data flow program B and outputs process execution order information and DAG structure information for the data flow program B. At this time, each of the program analysis units A31 and B32 performs processing assignment on the assumption that the data processing device 50 is exclusively used.

For example, a single parallel data processing system may be shared among multiple applications. As another example, a single parallel data processing system may be shared between applications of different companies.

The data allocation unit 35 receives a plurality of process execution order information and DAG structure information output from the plurality of program analysis units A31 and B32. The data allocation unit 35 determines a process issuance order across a plurality of DAG structures based on a plurality of process execution order information and DAG structure information.

The data allocation unit 35 transmits a data processing discharge command and a data transfer command as in the first to third embodiments, and also transmits to the process allocation unit 33 a process identifier identifying the processing that targets the data to be loaded next into the upper layer data storage unit 61.

The process allocation unit 33 selects a process based on the process execution order information output from each of the plurality of program analysis units A31 and B32 and the process identifier output from the data allocation unit 35, and issues the selected process to each data processing unit 65.

As an example, assume that the data flow program K1 and the data flow program K2 each have a DAG structure containing the processes Load2, Filter1, Load1, Mining, and so on.

As a first data allocation method, allocation may be performed for each node of the data flow programs K1 and K2. That is, by allocating in the order K1.Load2, K2.Load2, K1.Filter1, K2.Filter1, ..., the data flow programs K1 and K2 can use the parallel data processing system equally.

As a second data allocation method, switching may be performed in units that reduce the amount of data exchange. That is, allocation is performed in the order K1.(Load2, Filter1), K2.(Load2, Filter1), K1.(Load1, Mining), K2.(Load1, Mining), .... In this case, the number of data exchanges can be reduced.

As a third data allocation method, the data flow programs K1 and K2 may be switched at predetermined time intervals. For example, after 5 seconds the processing of K1 is discharged and the processing of K2 is loaded, even if the processing of K1 is only partway through; after another 5 seconds, the processing of K2 is discharged and the continuation of the processing of K1 is loaded. This switching may be repeated. In this case, even if the data flow program K1 requires a long time for processing, the processing of the data flow program K2 can proceed in parallel without being affected by it.

Note that these data allocation methods are merely examples, and the present invention is not limited to these.
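The three allocation methods can be compared with a small Python sketch. The function names `per_node`, `per_group`, and `time_sliced` are hypothetical. Only the first four processes of the example are used, the second method is approximated by fixed-size groups of consecutive nodes (the specification does not say how the units are chosen), and the total processing times passed to `time_sliced` are made-up values for illustration.

```python
from collections import deque
from itertools import zip_longest

# First four processes of the example; the remainder is omitted.
ORDER = ["Load2", "Filter1", "Load1", "Mining"]
PROGRAMS = {"K1": ORDER, "K2": ORDER}


def per_node(programs):
    """First method: switch programs after every node."""
    names = list(programs)
    rounds = zip_longest(*(programs[n] for n in names))
    return [f"{n}.{p}" for rnd in rounds
            for n, p in zip(names, rnd) if p is not None]


def per_group(programs, size):
    """Second method: switch after every `size` consecutive nodes."""
    names = list(programs)
    chunks = {n: [tuple(programs[n][i:i + size])
                  for i in range(0, len(programs[n]), size)]
              for n in names}
    rounds = zip_longest(*(chunks[n] for n in names))
    return [(n, c) for rnd in rounds
            for n, c in zip(names, rnd) if c is not None]


def time_sliced(total_seconds, quantum):
    """Third method: switch every `quantum` seconds, even mid-process."""
    queue = deque(total_seconds.items())
    slices = []
    while queue:
        name, left = queue.popleft()
        run = min(quantum, left)
        slices.append((name, run))
        if left > run:
            # Unfinished work goes to the back of the queue.
            queue.append((name, left - run))
    return slices


print(per_node(PROGRAMS))
print(per_group(PROGRAMS, 2))
print(time_sliced({"K1": 12, "K2": 7}, 5))
```

In this sketch the first method performs eight switches for the four processes of each program, the second performs four, and the third bounds how long either program waits regardless of how long the other's processes take.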

According to the parallel data processing system of the present embodiment, a plurality of data flow programs can be executed using a single parallel data processing system in a time-sharing manner.

(Embodiment 5)
A parallel data processing system according to a fifth embodiment will be described with reference to the drawings. FIG. 12 is a block diagram showing the configuration of the parallel data processing system according to this embodiment. In this embodiment, the data allocation unit 37 can change the processing execution order determined by the program analysis unit 36.

The program analysis unit 36 outputs the first process execution order information and the DAG structure information to the data allocation unit 37.

If the data allocation unit 37 determines, from information such as the data storage method of the storage device 40, that the process execution order should be changed, it outputs to the program analysis unit 36 a process execution order change request indicating the changed process execution order information.

When the program analysis unit 36 receives a processing execution order change request, the program analysis unit 36 examines the request and generates new second processing execution order information. The time division use of the parallel data processing system is performed based on the second processing execution order information.

For example, when the data sets used by a process A and a process B are the same, or when reading the data used by the process A and the process B at the same time is efficient in view of the device characteristics of the lower layer data storage unit 45, it is preferable to change the processing order if the process A and the process B are far apart in the execution order. In this case, the data allocation unit 37 changes the processing order so that the process B is executed immediately after the process A, and controls the data processing device 50 and the storage device 40 so that the data sets used in the process A and the process B are placed in the upper layer data storage unit 61 at the same time. The performance of the parallel data processing system can thereby be improved. Note that the cases in which the processing order is changed are not limited to this.
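The reordering request and its examination can be sketched as follows. This Python fragment is an assumption-laden illustration: the functions `move_after`, `respects_dag`, and `examine_change_request` are hypothetical, the DAG edges are invented for the example, and the only check performed is that every process still runs after the processes it depends on. The specification does not state what the program analysis unit 36 actually checks.

```python
def move_after(order, a, b):
    """Return a new order in which process b runs right after process a."""
    rest = [p for p in order if p != b]
    i = rest.index(a)
    return rest[:i + 1] + [b] + rest[i + 1:]


def respects_dag(order, edges):
    """True if every edge (u, v) has u scheduled before v."""
    position = {p: i for i, p in enumerate(order)}
    return all(position[u] < position[v] for u, v in edges)


def examine_change_request(order, edges, a, b):
    """Accept the proposed order only if it keeps the DAG dependencies;
    otherwise keep the first process execution order."""
    proposed = move_after(order, a, b)
    return proposed if respects_dag(proposed, edges) else list(order)


first_order = ["A", "C", "D", "B", "E"]

# Hypothetical DAG in which B does not depend on C or D.
edges = [("A", "C"), ("C", "D"), ("B", "E")]
print(examine_change_request(first_order, edges, "A", "B"))
# → ['A', 'B', 'C', 'D', 'E']

# If B consumed the output of D, the request would be rejected.
print(examine_change_request(first_order, edges + [("D", "B")], "A", "B"))
# → ['A', 'C', 'D', 'B', 'E']
```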

The parallel data processing system according to the present invention can be applied to parallel database systems, parallel data processing systems, distributed storage, parallel file systems, distributed databases, data grids, and cluster computers. In particular, according to the parallel data processing system according to the present invention, the parallel data processing system can be efficiently used in a time division manner by a plurality of processes over a plurality of applications or a plurality of processes in a single application.

In the present invention, at least the forms listed in the following supplementary notes are included.
(Supplementary Note 1) A scheduling apparatus comprising: a program analysis unit that refers to a first data flow program to generate a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing the execution order of processes corresponding to nodes of the first DAG, and that refers to a second data flow program to generate a second DAG representing the second data flow program and second process execution order information representing the execution order of processes corresponding to nodes of the second DAG;
a process allocation unit that allocates the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a data allocation unit that refers to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanges data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.

(Supplementary Note 2) The scheduling apparatus according to Supplementary Note 1, wherein the data allocation unit alternately repeats, for the plurality of data processing devices, allocation of data used for a process corresponding to a node of the first DAG and allocation of data used for a process corresponding to a node of the second DAG, each time each process ends.

(Supplementary Note 3) The scheduling apparatus according to Supplementary Note 1, wherein the data allocation unit alternately repeats, for the plurality of data processing devices, allocation of data used for a plurality of processes corresponding to a plurality of nodes of the first DAG and allocation of data used for a plurality of processes corresponding to a plurality of nodes of the second DAG, each time the plurality of processes end.

(Supplementary Note 4) The scheduling apparatus according to Supplementary Note 2 or 3, wherein, when a process that corresponds to a node of one of the first DAG and the second DAG and that targets data allocated to the plurality of data processing devices does not end before a predetermined period elapses, the data allocation unit allocates data used for a process corresponding to a node of the other DAG to the plurality of data processing devices.

(Supplementary Note 5) The scheduling apparatus according to any one of Supplementary Notes 1 to 4, wherein the data allocation unit generates a process identifier for identifying a process that targets data allocated to the plurality of data processing devices, and
the process allocation unit allocates the process to the plurality of data processing devices with reference to the process identifier.

(Supplementary Note 6) The scheduling apparatus according to any one of Supplementary Notes 1 to 5, wherein, in a case where at least one of the first process execution order information and the second process execution order information includes an execution order in which a second process is executed after a first process, and data used for the second process is recorded in the storage unit of the storage device, the data allocation unit, upon receiving from any one of the plurality of data processing devices a signal indicating that the first process has ended, causes the data used for the first process to be transferred from the storage unit of that data processing device to the storage unit of the storage device, and causes the data used for the second process to be transmitted from the storage unit of the storage device to the storage unit of that data processing device.

(Supplementary Note 7) The scheduling apparatus according to Supplementary Note 6, wherein, when the storage device includes a plurality of storage units, the data allocation unit notifies the plurality of data processing devices of which of the plurality of storage units the data used for the process corresponding to a node of the first DAG or the second DAG is to be output to.

(Supplementary Note 8) The scheduling apparatus according to Supplementary Note 7, wherein the data allocation unit notifies the plurality of data processing devices so that the storage unit to which the data used for the process corresponding to a node of the first DAG or the second DAG is to be output is selected from among the plurality of storage units based on the data.

(Supplementary Note 9) The scheduling apparatus according to any one of Supplementary Notes 1 to 8, wherein the program analysis unit outputs a data processing device allocation list indicating to which of the plurality of data processing devices the processes corresponding to the nodes of the first DAG and the second DAG are allocated, and
the data allocation unit exchanges the data used for the processes corresponding to the nodes of the first DAG and the second DAG between the storage unit of the storage device and the storage units of the data processing devices included in the data processing device allocation list.

(Supplementary Note 10) The scheduling apparatus according to Supplementary Note 9, wherein the data allocation unit notifies a data processing device that is not included in the data processing device allocation list, among the plurality of data processing devices, to reduce its power consumption.

(Supplementary Note 11) The scheduling apparatus according to any one of Supplementary Notes 1 to 10, wherein the data allocation unit further refers to information indicating the performance of the storage device, determines whether the execution order included in the first process execution order information and/or the second process execution order information should be changed, and, when determining that it should be changed, outputs a change request including the content of the change to the program analysis unit, and
the program analysis unit changes the first process execution order information and/or the second process execution order information in response to the change request, and outputs the changed information to the process allocation unit and the data allocation unit.

(Supplementary Note 12) A parallel data processing system comprising: the scheduling apparatus according to any one of Supplementary Notes 1 to 11; and
the plurality of data processing devices and/or the storage device.

(Supplementary Note 13) A scheduling method comprising: a step in which a computer refers to a first data flow program to generate a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing the execution order of processes corresponding to nodes of the first DAG, and refers to a second data flow program to generate a second DAG representing the second data flow program and second process execution order information representing the execution order of processes corresponding to nodes of the second DAG;
a process allocation step of allocating the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a data allocation step of referring to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanging data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.

(Supplementary Note 14) The scheduling method according to Supplementary Note 13, wherein, in the data allocation step, allocation of data used for a process corresponding to a node of the first DAG and allocation of data used for a process corresponding to a node of the second DAG are alternately repeated for the plurality of data processing devices each time each process ends.

(Supplementary Note 15) The scheduling method according to Supplementary Note 13, wherein, in the data allocation step, allocation of data used for a plurality of processes corresponding to a plurality of nodes of the first DAG and allocation of data used for a plurality of processes corresponding to a plurality of nodes of the second DAG are alternately repeated for the plurality of data processing devices each time the plurality of processes end.

(Supplementary Note 16) The scheduling method according to Supplementary Note 14 or 15, wherein, in the data allocation step, when a process that corresponds to a node of one of the first DAG and the second DAG and that targets data allocated to the plurality of data processing devices does not end before a predetermined period elapses, data used for a process corresponding to a node of the other DAG is allocated to the plurality of data processing devices.

(Supplementary Note 17) The scheduling method according to any one of Supplementary Notes 13 to 16, wherein, in the data allocation step, a process identifier for identifying a process that targets data allocated to the plurality of data processing devices is generated, and
in the process allocation step, the process is allocated to the plurality of data processing devices with reference to the process identifier.

(Supplementary Note 18) A program causing a computer to execute: a process of referring to a first data flow program to generate a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing the execution order of processes corresponding to nodes of the first DAG, and referring to a second data flow program to generate a second DAG representing the second data flow program and second process execution order information representing the execution order of processes corresponding to nodes of the second DAG;
a process of allocating the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a process of referring to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanging data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.

Within the scope of the entire disclosure (including the claims) of the present invention, the embodiments can be changed and adjusted based on its basic technical concept. Various combinations and selections of the disclosed elements are possible within the scope of the claims of the present invention. That is, the present invention naturally includes various variations and modifications that could be made by those skilled in the art according to the entire disclosure, including the claims, and the technical idea.

10, 10a, 11, 12, 13, 15, 16, 17 Scheduling device
21, 27, 36 Program analysis unit
22, 33 Process allocation unit
23, 25, 26, 28, 29, 35, 37 Data allocation unit
31 Program analysis unit A
32 Program analysis unit B
40, 40a, 41, 42 Storage device
45 Lower layer data storage unit
46, 47 Data transfer unit
50, 50a-1 to 50a-n, 51, 52, 53 Data processing device
61 Upper layer data storage unit
62, 67 Distributed data discharge unit
63 Data receiving unit
65 Data processing unit
66 Processing end detection unit
68 Power saving control unit
70 User device
71, 81, 91 CPU
72, 82, 92 Data storage unit
73, 83, 93 Data transfer unit
80 Network

Claims

A scheduling apparatus comprising: a program analysis unit that refers to a first data flow program to generate a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing the execution order of processes corresponding to nodes of the first DAG, and that refers to a second data flow program to generate a second DAG representing the second data flow program and second process execution order information representing the execution order of processes corresponding to nodes of the second DAG;
a process allocation unit that allocates the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a data allocation unit that refers to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanges data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.
The scheduling apparatus according to claim 1, wherein the data allocation unit alternately repeats, for the plurality of data processing devices, allocation of data used for a process corresponding to a node of the first DAG and allocation of data used for a process corresponding to a node of the second DAG, each time each process ends.
The scheduling apparatus according to claim 1, wherein the data allocation unit alternately repeats, for the plurality of data processing devices, allocation of data used for a plurality of processes corresponding to a plurality of nodes of the first DAG and allocation of data used for a plurality of processes corresponding to a plurality of nodes of the second DAG, each time the plurality of processes end.
The scheduling apparatus according to claim 2, wherein, when a process that corresponds to a node of one of the first DAG and the second DAG and that targets data allocated to the plurality of data processing devices does not end before a predetermined period elapses, the data allocation unit allocates data used for a process corresponding to a node of the other DAG to the plurality of data processing devices.
The scheduling apparatus according to claim 1, wherein the data allocation unit generates a process identifier that identifies the process operating on the data allocated to the plurality of data processing devices; and
the process allocation unit allocates processes to the plurality of data processing devices with reference to the process identifier.
The scheduling apparatus according to claim 1, wherein, in a case where at least one of the first process execution order information and the second process execution order information includes an execution order in which a second process is executed after a first process, and data used for the second process is recorded in the storage unit of the storage device, the data allocation unit, upon receiving from any one of the plurality of data processing devices a signal indicating that the first process has been completed, causes the data used for the first process to be transferred from the storage unit of that data processing device to the storage unit of the storage device, and causes the data used for the second process to be transferred from the storage unit of the storage device to the storage unit of that data processing device.
The scheduling apparatus according to any one of claims 1 to 6, wherein the program analysis unit outputs a data processing device allocation list indicating to which of the plurality of data processing devices the processes corresponding to the nodes of the first DAG and the second DAG are allocated; and
the data allocation unit exchanges the data used for the processes corresponding to the nodes of the first DAG and the second DAG between the storage unit of the storage device and the storage unit of a data processing device included in the data processing device allocation list.
The scheduling apparatus according to claim 1, wherein the data allocation unit further refers to information indicating the performance of the storage device, determines whether the execution order included in the first process execution order information and/or the second process execution order information should be changed, and, if it determines that the execution order should be changed, outputs a change request including the content of the change to the program analysis unit; and
the program analysis unit changes the first process execution order information and/or the second process execution order information in response to the change request, and outputs the changed information to the process allocation unit and the data allocation unit.
A scheduling method in which a computer performs: a step of referring to a first data flow program and generating a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing an execution order of processes corresponding to nodes of the first DAG, and referring to a second data flow program and generating a second DAG representing the second data flow program and second process execution order information representing an execution order of processes corresponding to nodes of the second DAG;
a process allocation step of allocating the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a data allocation step of referring to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanging data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.
A program for causing a computer to execute: a process of referring to a first data flow program and generating a first directed acyclic graph (DAG) representing the first data flow program and first process execution order information representing an execution order of processes corresponding to nodes of the first DAG, and referring to a second data flow program and generating a second DAG representing the second data flow program and second process execution order information representing an execution order of processes corresponding to nodes of the second DAG;
a process of allocating the processes corresponding to the nodes of the first DAG and the second DAG to a plurality of data processing devices based on the first process execution order information and the second process execution order information; and
a process of referring to the first DAG, the second DAG, the first process execution order information, and the second process execution order information, and exchanging data used for the processes corresponding to the nodes of the first DAG and the second DAG between a storage unit of a storage device and a storage unit provided in each of the plurality of data processing devices.
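The flow recited in the claims above (deriving an execution order from each DAG, alternating between the two programs each time a process completes, and exchanging data between the storage device and a data processing device) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the function names `execution_order`, `interleave`, and `run` are hypothetical, a topological sort from the standard `graphlib` module stands in for the unspecified ordering procedure, a single data processing device is modeled, and each process is assumed to use exactly one data item.

```python
from graphlib import TopologicalSorter
from itertools import zip_longest


def execution_order(dag):
    """Return one valid execution order for a DAG.

    `dag` maps each node to the set of nodes that must run before it
    (an assumed representation; the source does not fix one).
    """
    return list(TopologicalSorter(dag).static_order())


def interleave(order_a, order_b):
    """Alternate between program A and program B, one process at a time."""
    schedule = []
    for a, b in zip_longest(order_a, order_b):
        if a is not None:
            schedule.append(("A", a))
        if b is not None:
            schedule.append(("B", b))
    return schedule


def run(schedule, storage_device, device_storage):
    """Swap each process's data in before it runs and out when it completes.

    Both storages are plain dicts keyed by (program, node); this models
    a single data processing device only.
    """
    executed = []
    for key in schedule:
        # Swap in: storage device -> storage unit of the processing device.
        device_storage[key] = storage_device.pop(key)
        executed.append(key)  # the process itself would execute here
        # Completion: swap the data back out to the storage device.
        storage_device[key] = device_storage.pop(key)
    return executed


# Hypothetical example: two small chain-shaped data flow programs.
dag_a = {"a1": set(), "a2": {"a1"}}
dag_b = {"b1": set(), "b2": {"b1"}, "b3": {"b2"}}
schedule = interleave(execution_order(dag_a), execution_order(dag_b))
storage = {key: f"data for {key}" for key in schedule}
executed = run(schedule, storage, {})
```

In this sketch the local storage of the data processing device holds data for only one process at a time, which is the situation in which alternating two programs requires the exchange of data with the storage device.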