WO2005116832A1 - Computer system, method, and program for controlling execution of jobs in a distributed processing environment - Google Patents
- Publication number
- WO2005116832A1 (application PCT/JP2005/009350)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- resource
- job
- computer system
- network
- grid
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
Definitions
- The present invention relates to grid computing, and more particularly to a method for controlling a plurality of grid computing systems in an integrated manner and to a system configuration therefor.
- A distributed processing system is a system that manages a large number of diverse computer resources connected to a network as one group and performs load distribution and scheduling.
- A distributed processing system group means a group of distributed processing systems that exist on a wide area network (distributed network).
- In the following, a distributed processing system is described as an individual grid computing system (hereinafter abbreviated as a grid system) in a grid computing environment in which resources are virtualized.
- See Non-Patent Document 1 for the centralized scheme and the hierarchical scheme.
- See Non-Patent Document 2 for the distributed scheme.
- FIG. 13 is a diagram schematically showing a system configuration based on a centralized scheme.
- The meta-scheduler of the center server, which manages the entire grid system group, collects information on all grid systems and makes all scheduling decisions.
- In addition to making scheduling decisions, the meta-scheduler of the center server also manages the execution of submitted jobs, job completion, and the resource status (such as idle processors).
- A procedure on the center server is required to reflect changes in the scheduling performed by the meta-scheduler.
- FIG. 14 is a diagram schematically showing a system configuration according to a hierarchical scheme.
- The scheduling process is shared between the center server and each local site. After a job is submitted from the center server's meta-scheduler to a local site's scheduler, the meta-scheduler does not need to be directly involved in the job. Each job is executed at the local site to which it was sent, even if capacity becomes free at other local sites after the job is submitted.
- FIG. 15 is a diagram schematically showing a system configuration based on a distributed scheme.
- Every site has a meta-scheduler. Jobs are submitted to and scheduled by the meta-scheduler at each local site. Since all sites have a meta-scheduler, a job that has been scheduled at one local site can be rescheduled for execution at another local site if capacity is free there.
- The meta-schedulers at the local sites must hold the same information, so information such as load status is exchanged among them as needed or periodically.
- Non-Patent Document 1: Chris Smith, "Open Source Metascheduling for Virtual Organizations with the Community Scheduler Framework (CSF)", Technical Whitepaper, Platform Computing Inc., August 2003.
- Non-Patent Document 2: Vijay Subramani, "Distributed Job Scheduling on Computational Grids using Multiple Simultaneous Requests", IEEE International Symposium on High Performance Distributed Computing (HPDC 2002), 2002.
- Information such as the load status of each local site must be exchanged as needed or periodically among the meta-schedulers of all the local sites, so the network load increases.
- An object of the present invention is to realize a scalable system configuration that can easily respond to changes in system configuration and to expansion or reduction of scale in a wide-area distributed system that integrates and uses the computer resources existing in a grid system group.
- Another object of the present invention is to reduce network load while exchanging information between local sites in order to execute a job efficiently.
- Still another object of the present invention is to realize a wide-area distributed system in which grid systems are integrated without increasing the development cost of the system.
- The present invention is implemented as a system that performs distributed processing by a group of computer systems (a group of grid systems) connected via a network. In this distributed processing environment, each grid system includes a grid server that assigns jobs in information processing to computer resources on the network and issues requests for their execution, and computer resources (local resources), such as process servers, that actually execute jobs within the own system. Each grid server includes a scheduler that allocates a job to resource means, which comprise the local resources and the other grid systems on the network, and requests execution of the job, and an agent (resource agent) that relays communication between the scheduler and the resource means.
- The agent is a software module that manages information on the resource means, receives a job execution request from the scheduler on behalf of the resource means assigned to the job, and sends the job execution request to the resource means according to the status of the resource means.
- An agent is provided separately for each local resource and for each other grid system (network resource) adjacent to (directly connected to) the own system on the network. Each agent issues job execution requests in the individual communication format set up with its corresponding resource means.
- The agent corresponding to a local resource acquires information on capability and operation status from the local resource and manages it. The agent corresponding to another grid system (network resource) acquires, from the grid server of that grid system, information on the resource capability that the grid system can provide for job execution requests, and manages it. The scheduler then allocates jobs to the resource means based on the information managed by the resource agents.
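The agent arrangement described above can be read as an adapter layer: the scheduler sees one interface, and each agent hides the communication format of the resource it represents. The following is a minimal Python sketch under that reading. The class names, the `free_cpus` metric, the message strings, and the "pick the largest capability" policy are all hypothetical illustrations and are not specified by the source.

```python
from abc import ABC, abstractmethod


class ResourceAgent(ABC):
    """Stands in for one resource means on behalf of the scheduler."""

    @abstractmethod
    def capability(self) -> int:
        """Capability currently on offer (free CPUs here, an assumed metric)."""

    @abstractmethod
    def submit(self, job: str) -> str:
        """Forward a job execution request in the resource's own format."""


class LocalResourceAgent(ResourceAgent):
    """Agent for a local resource such as a process server."""

    def __init__(self, name: str, free_cpus: int) -> None:
        self.name, self.free_cpus = name, free_cpus

    def capability(self) -> int:
        return self.free_cpus

    def submit(self, job: str) -> str:
        return f"PS-FORMAT run {job} on {self.name}"


class GridResourceAgent(ResourceAgent):
    """Agent for an adjacent grid system (network resource)."""

    def __init__(self, name: str, offered_cpus: int) -> None:
        self.name, self.offered_cpus = name, offered_cpus

    def capability(self) -> int:
        return self.offered_cpus

    def submit(self, job: str) -> str:
        return f"GS-FORMAT forward {job} to grid {self.name}"


def schedule(job: str, agents: list[ResourceAgent]) -> str:
    """Assumed policy: choose the agent reporting the most capability."""
    best = max(agents, key=lambda a: a.capability())
    return best.submit(job)
```

Because `schedule` only calls `capability()` and `submit()`, it does not need to know whether the chosen resource means is a process server or another grid system.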
- The grid server can further include a resource capability information acquisition unit that acquires information on the resource capability its own system can provide in response to a job execution request from the outside, and a resource capability information notifying unit that, in response to an inquiry from an agent of the grid server in another grid system on the network, notifies that agent of the information on the available resource capability acquired by the resource capability information acquisition unit.
- The scheduler calculates the resource capability that can be provided based on the information on resource capability obtained from the agents in the own system, and passes the result to the resource capability information acquisition unit.
- The interface means of the grid server includes a job reception unit that receives a job execution request from an agent of the grid server in another grid system on the network, and a job execution requesting unit that transfers the job according to the execution request to the scheduler and requests assignment and execution of the job.
- The present invention is also realized as a job execution control method for a computer system, comprising the steps of: acquiring, by an interface module provided for each of the local resources included in the own system and each of the other grid systems (network resources) on the network, information on capability and operation status from the local resource, and acquiring and managing information on the resource capability that can be provided by the network resource from the grid server in the network resource; allocating a job to resource means including the local resources and the network resources based on these pieces of information; and issuing a job execution request to the resource means to which the job is assigned. The method further includes a step in which the interface module temporarily holds the issued job execution request and transmits it to the resource means according to the operation state of the resource means to which the job is assigned.
- The present invention is also realized as a program that controls a computer to realize the functions of the grid server described above, or a program that causes a computer to execute processing corresponding to each step of the above-described job execution control method.
- This program is provided by storing and distributing it on a magnetic disk, optical disk, semiconductor memory, or other recording medium, or distributing it via a network.
- The grid server of each grid system constituting the wide area distributed system is connected via the agent, and the agent holds information on the grid system it is in charge of. It is therefore possible to realize a scalable system configuration that can easily cope with changes in system configuration and with expansion or reduction of scale.
- Since each grid system has information on the other grid systems adjacent to it on the network, information on the entire grid system group is, as a result, shared among the grid systems. Frequent information exchange between grid systems is not required, so the network load can be reduced.
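To make the network-load argument concrete, the sketch below counts how many pairs of sites exchange status information when every meta-scheduler talks to every other one (the distributed scheme) and when only adjacent grid systems talk to each other. The five-grid topology is a hypothetical example and is not taken from the figures.

```python
def full_mesh_pairs(n: int) -> int:
    """Pairs exchanging status when every site talks to every other site."""
    return n * (n - 1) // 2


def neighbor_pairs(adjacency: dict[str, set[str]]) -> int:
    """Pairs exchanging status when only adjacent grid systems talk."""
    pairs = {frozenset((a, b)) for a, nbrs in adjacency.items() for b in nbrs}
    return len(pairs)


# Hypothetical adjacency between five grid systems.
topology = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

print(full_mesh_pairs(len(topology)))  # 10
print(neighbor_pairs(topology))        # 4
```

The full-mesh count grows quadratically with the number of sites, while the neighbor-only count grows with the number of direct connections.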
- FIG. 1 is a diagram showing an overall configuration of a wide area distributed system according to the present embodiment.
- The wide area distributed system of the present embodiment integrates a group of grid systems connected to a wide area network such as the Internet so that the computer resources in each grid system can be used mutually.
- Each grid system is a distributed processing system that uses grid computing technology to manage a large number of diverse computer resources connected to the network as a group and to perform load distribution and scheduling.
- The grid systems have no master-subordinate relationship and operate in parallel as equals.
- Other grid systems adjacent on the network can be treated in the same manner as local resources (computer resources) in the own system and can be requested to execute jobs.
- Here, "adjacent on the network" means a relationship between grid systems that can directly exchange data via the network.
- A local resource is a computer resource, such as a process server, that actually executes a job assigned in grid computing.
- This form of connection between grid systems is hereinafter referred to as a network scheme (Network Scheme).
- FIG. 2 is a diagram showing a configuration of each grid system constituting the wide area distributed system of FIG. 1.
- The grid system includes a grid server (GS) 100 that allocates (schedules) jobs and a process server (PS) 200, as a local resource, that actually executes jobs according to the allocation by the grid server 100.
- The grid server 100 is also connected to the grid servers 100 of other grid systems.
- The grid server 100, the process server 200, and the grid servers 100 of the other grid systems are connected to each other via the Internet or another computer network.
- This computer network may be of any type with respect to communication protocol and wired or wireless communication, and may have a firewall or other access restrictions.
- The grid server 100 of the present embodiment includes an interface module called an agent, and connects to the process server 200 or to the grid server 100 of another grid system through the agent.
- The framework of such device connection is hereinafter referred to as an agent framework (Agent Framework).
- FIG. 3 is a diagram schematically showing an example of a hardware configuration of a computer device suitable for realizing the grid server 100 and the process server 200 in the present embodiment.
- The computer device shown in FIG. 3 includes a CPU (Central Processing Unit) 11 serving as arithmetic means; a main memory 13 connected to the CPU 11 via an M/B (motherboard) chipset 12 and a CPU bus; a video card 14 likewise connected to the CPU 11 via the M/B chipset 12 and an AGP (Accelerated Graphics Port); a magnetic disk device (HDD) 15 and a network interface 16 connected to the M/B chipset 12 via a PCI (Peripheral Component Interconnect) bus; and a flexible disk drive 18 and a keyboard/mouse 19 connected to the M/B chipset 12 from the PCI bus via a bridge circuit 17 and a low-speed bus such as an ISA (Industry Standard Architecture) bus.
- FIG. 3 merely shows an example of a hardware configuration of a computer device that realizes the present embodiment, and various other configurations can be adopted as long as the present embodiment is applicable.
- For example, instead of providing the video card 14, only video memory may be mounted and the CPU 11 may process the image data, or a drive such as a CD-R (Compact Disc Recordable) or DVD-RAM (Digital Versatile Disc Random Access Memory) drive may be provided as an external storage device via an interface such as ATA (AT Attachment).
- FIG. 4 is a diagram showing a functional configuration of the grid server 100 in the present embodiment.
- The grid server 100 includes a scheduler 110 that manages the process servers 200, which are local resources in its own system, and allocates (schedules) jobs to them; a resource agent 120 that relays transmission and reception between the scheduler 110 and the resources; and a grid server resource agent interface (hereinafter referred to as "GS agent interface") 130 that allows the own system to operate as if it were a resource of another grid system.
- The resource agent 120 is provided for each process server 200 and for each other grid system (network resource) adjacent on the network. The scheduler 110 accesses each process server 200 and the grid server 100 of each other grid system via the corresponding resource agent 120.
- The scheduler 110 is realized by, for example, the program-controlled CPU 11 and storage means such as the main memory 13 and the magnetic disk device 15 shown in FIG. 3. As shown in FIG. 4, its specific functions comprise a resource capability inquiry response unit 111, a resource capability acquisition unit 112, a job reception unit 113, an optimal resource selection unit 114, and a job requesting unit 115.
- The resource capability inquiry response unit 111 calculates and returns the resource capability that the own system can provide in response to an external capability inquiry (resource capability acquisition request) received directly or via the GS agent interface 130.
- The resource capability that can be provided is calculated based on the information acquired by the resource capability acquisition unit 112. The resource capability that can be provided may also be varied depending on the party to which it is offered.
- The resource capability acquisition unit 112 queries the resource agents 120 corresponding to each process server 200 and to each other grid system adjacent on the network, and obtains information on the resource capability available to the own system (hereinafter, the process servers 200, which are recognized as local resources of the system, and the other grid systems, which are recognized as network resources, are collectively referred to as resource means).
- The acquired information includes static information, such as the inherent processing capacity and the storage capacity of the storage devices, and dynamic information based on the real-time load situation.
- The job receiving unit 113 receives a job execution request from an external computer system (for example, a client) or from the GS agent interface 130.
- The optimal resource selecting unit 114 selects the resource means best suited to executing the job, based on the resource capability information acquired by the resource capability acquiring unit 112, and allocates the job to it.
- The logic used for optimization in job assignment is arbitrary.
- The job requesting unit 115 issues a request to execute the job to the resource agent 120 corresponding to the resource means selected by the optimal resource selecting unit 114.
- The resource agent 120 relays communication between the scheduler 110 and the available resource means, and receives job execution requests from the scheduler 110 on behalf of these resource means. The resource capability acquisition unit 112 therefore directs its inquiries to the resource agent 120, and the job requesting unit 115 issues its requests to the resource agent 120.
- Other functions of the scheduler 110 do not differ from those of an existing scheduler. Differences in communication format between the grid server 100 and an individual process server 200 or the grid server 100 of another grid system are absorbed by the settings in the resource agent 120, so the scheduler 110 itself does not need to consider these differences when issuing requests. A scheduler used in an existing grid system can therefore be used as the scheduler 110.
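The division of labor among the resource capability acquisition unit 112, the optimal resource selection unit 114, and the job requesting unit 115 can be sketched as three small methods. This is only an illustration: `StubAgent`, `ResourceInfo`, and the scoring rule `cpu_count * (1 - load)` are assumptions made here, since the source leaves the optimization logic open.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceInfo:
    cpu_count: int   # static information: nominal processing capacity
    load: float      # dynamic information: current load in [0.0, 1.0]


@dataclass
class StubAgent:
    """Hypothetical stand-in for a resource agent 120."""
    name: str
    info: ResourceInfo
    received: list[str] = field(default_factory=list)

    def query(self) -> ResourceInfo:
        return self.info

    def request(self, job: str) -> None:
        self.received.append(job)


class Scheduler:
    def __init__(self, agents: list[StubAgent]) -> None:
        self.agents = agents

    def acquire(self) -> dict[str, ResourceInfo]:
        # Role of the resource capability acquisition unit 112.
        return {a.name: a.query() for a in self.agents}

    def select(self, infos: dict[str, ResourceInfo]) -> str:
        # Role of the optimal resource selection unit 114.
        # Assumed policy: largest spare capacity, cpu_count * (1 - load).
        return max(infos, key=lambda n: infos[n].cpu_count * (1.0 - infos[n].load))

    def submit(self, job: str) -> str:
        # Role of the job requesting unit 115: requests always go to an agent.
        chosen = self.select(self.acquire())
        next(a for a in self.agents if a.name == chosen).request(job)
        return chosen
```

The scheduler never addresses a process server or a remote grid server directly, which is why an existing scheduler could in principle be reused as described above.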
- The resource agent 120 is realized by, for example, the program-controlled CPU 11 shown in FIG. 3, storage means such as the main memory 13 and the magnetic disk device 15, and the network interface 16. As shown in FIG. 4, its specific functions include a resource status management unit 121, a resource capacity management unit 122, a job receiving unit 123, and a job requesting unit 124.
- The resource status management unit 121 accesses the corresponding resource means and keeps track of the current operation status of the corresponding process server 200 (local resource) or grid system (network resource).
- The resource capacity management unit 122 manages statistical information and the like regarding the job execution capacity of the corresponding resource means, and returns the managed information in response to an inquiry from the resource capacity acquisition unit 112 of the scheduler 110.
- The statistical information and the like relating to job execution capacity is not limited to static information such as the processing capacity of the CPU itself and the storage capacity of the storage device; it also includes information obtained by statistically processing dynamic information such as the time variation of the CPU load and operation tendencies.
- The resource information managed by the resource status management unit 121 and the resource capacity management unit 122 is obtained from the resource means served by the resource agent 120 and is stored in storage means such as the main memory 13 and the magnetic disk device 15 shown in FIG. 3.
- The job receiving unit 123 receives a job execution request issued from the job requesting unit 115 of the scheduler 110.
- The job requesting unit 124 transmits the job execution request received by the job receiving unit 123 to the corresponding resource means.
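Because the agent accepts requests on behalf of its resource means and forwards them according to the resource's operation status, it behaves like a small buffer. The sketch below illustrates that behavior with a hypothetical `HoldingResourceAgent`. The boolean `available` flag and the injected `send` callable are simplifying assumptions; a real resource would report a richer status.

```python
from collections import deque
from typing import Callable


class HoldingResourceAgent:
    """Hypothetical agent that buffers requests until its resource can take them."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send                    # transport to the resource (assumed)
        self._pending: deque[str] = deque()  # requests held by the agent
        self._available = False              # last known operation status

    def receive(self, job: str) -> None:
        # Job receiving unit 123: accept regardless of the resource's status.
        self._pending.append(job)
        self._flush()

    def update_status(self, available: bool) -> None:
        # Resource status management unit 121: record the latest status.
        self._available = available
        self._flush()

    def _flush(self) -> None:
        # Job requesting unit 124: forward held requests in arrival order.
        while self._available and self._pending:
            self._send(self._pending.popleft())

    @property
    def held(self) -> int:
        return len(self._pending)
```

A short usage example:

```python
sent: list[str] = []
agent = HoldingResourceAgent(sent.append)
agent.receive("job-1")      # resource not yet available, so the request is held
agent.update_status(True)   # the held request is now forwarded
```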
- The GS agent interface 130 is realized by, for example, the program-controlled CPU 11, storage means such as the main memory 13 and the magnetic disk device 15, and the network interface 16 shown in FIG. 3. As shown in FIG. 4, it includes a resource capability information acquiring unit 131, a resource capability information notifying unit 132, a job receiving unit 133, and a job execution requesting unit 134.
- The GS agent interface 130 is a function that makes the grid system available to other grid systems on the network in the same way as the local resources of those grid systems.
- With it, the grid server 100 can receive a request from the grid server 100 of another grid system, execute the job using the resource means available to its own system, and return the result.
- The resource capability information acquiring unit 131 queries the scheduler 110 and acquires information on the resource capability of the own system (resource information) that can be provided in response to a job execution request from the outside.
- In response to a received resource capability acquisition request, the resource capability information notification unit 132 sends the resource information acquired by the resource capability information acquisition unit 131 to the source of the request, that is, to the resource agent 120 of the grid server 100 in the other grid system.
- In that resource agent 120, the resource status management unit 121 and the resource capacity management unit 122 receive this notification and store and manage it in a storage device such as the main memory 13 or the magnetic disk device 15.
- The notification from the resource capability information notification unit 132 to the grid server 100 may be performed periodically or when the operation status of the own system changes. The resource agent 120 of the grid server 100 may also query the corresponding grid server 100 of another grid system at an arbitrary time.
- The job receiving unit 133 receives the job execution request transmitted from the job requesting unit 124 of the resource agent 120 of the grid server 100 in another grid system.
- The job execution requesting unit 134 requests the scheduler 110 to schedule and execute the job received by the job receiving unit 133.
- FIG. 5 is a diagram showing the relationship between the functional configuration of the process server 200 and the resource agent 120 of the grid server 100.
- The process server 200 has a process server resource agent interface (hereinafter referred to as "PS agent interface") 210 that causes the computer device shown in FIG. 3 to function as the process server 200 in the grid system.
- The PS agent interface 210 is realized by, for example, the program-controlled CPU 11 shown in FIG. 3, storage means such as the main memory 13 and the magnetic disk device 15, and the network interface 16. As shown in FIG. 5, its functions include a PS status monitoring unit 211, a resource capability information notification unit 212, a job reception unit 213, and a job execution unit 214.
- The PS status monitoring unit 211 monitors the current usage status and resource status of its own device (process server 200) and collects information.
- The resource capability information notifying unit 212 notifies the resource agent 120 of the grid server 100 of the information on the usage status of the PS and the status of its resources collected by the PS status monitoring unit 211.
- In the resource agent 120, the resource status management unit 121 and the resource capacity management unit 122 receive this notification and store and manage it in a storage device such as the main memory 13 or the magnetic disk device 15.
- The notification from the resource capability information notification unit 212 to the grid server 100 may be performed periodically or when the operation status of the process server 200 changes. The corresponding resource agent 120 of the grid server 100 may also query the process server 200 at an arbitrary time.
- The job receiving unit 213 receives the job execution request transmitted from the job requesting unit 124 of the resource agent 120 of the grid server 100.
- The job execution unit 214 executes the job received by the job reception unit 213 using the resources of the process server 200.
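The three notification triggers mentioned above (periodic, on status change, or on inquiry from the agent) can be illustrated with one small class. This is a sketch only: `PSStatusNotifier`, the `probe` and `notify` callables, and the dictionary-shaped status are hypothetical and stand in for the PS status monitoring unit 211 and the resource capability information notification unit 212.

```python
from typing import Callable, Optional


class PSStatusNotifier:
    """Hypothetical sketch of units 211 and 212 on a process server."""

    def __init__(
        self,
        probe: Callable[[], dict],
        notify: Callable[[dict], None],
    ) -> None:
        self._probe = probe      # collects the current status (assumed callable)
        self._notify = notify    # delivers the status to the resource agent
        self._last: Optional[dict] = None

    def poll(self, force: bool = False) -> bool:
        """Collect status and notify on change, or always when force is set.

        force=True models a periodic report or an explicit inquiry from
        the resource agent. Returns True when a notification was sent.
        """
        status = self._probe()
        if force or status != self._last:
            self._last = status
            self._notify(status)
            return True
        return False
```

Calling `poll()` from a timer gives change-driven notification, while `poll(force=True)` covers the periodic and on-demand cases.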
- Comparing the GS agent interface 130 with the PS agent interface 210: the PS status monitoring unit 211 monitors the status of its own device and collects information, whereas the resource capability information acquisition unit 131 queries the scheduler 110 for the resource capability of the own system; and the job execution unit 214 executes a job using the resources of its own device, whereas the job execution requesting unit 134 requests the scheduler 110 to execute the job.
- These differences arise because the process server 200 with the PS agent interface 210 is a local resource that executes jobs in the grid system, whereas the grid server 100 in which the GS agent interface 130 is embedded is a server that performs overall control of the grid system and schedules job execution.
- However, as seen from the resource agent 120 of the grid server 100, which transmits resource capability acquisition requests and job execution requests to the corresponding GS agent interface 130 or PS agent interface 210, there is no difference between the GS agent interface 130 and the PS agent interface 210. The resource agent 120 therefore has the same functional configuration regardless of whether its counterpart is a process server 200, which is a local resource, or another grid server 100, which is a network resource.
- Jobs are executed with the load distributed across the group of grid systems constituting the wide area distributed system.
- The client is an information device, such as a computer or a PDA (Personal Digital Assistant), that can access any of the grid systems constituting the wide area distributed system of the present embodiment.
- A process server 200 having the function described later can also issue a job execution request as a client.
- FIG. 6 is a diagram showing the relationship between the functional configuration of a client that issues a job execution request to the wide area distributed system of the present embodiment and the scheduler 110 of the grid server 100.
- The client 300 includes a resource capability inquiry unit 310 for inquiring about the resource capability of the grid system to which it sends a job execution request, and a job requesting unit 320 for issuing a job execution request and transmitting it to the grid system.
- If the client 300 only needs to obtain the execution result of the desired job, it may leave the choice of the computer resources needed to execute the job to the grid system.
- In that case, the resource capability inquiry unit 310 is not an essential component.
- These functions are realized by the program-controlled CPU 11 and storage means such as the main memory 13 and the magnetic disk device 15, for example, when the client 300 is configured by the computer device shown in FIG. 3.
- The job requesting unit 320 of the client 300 issues a job execution request and transmits it to the grid server 100 in the grid system to be accessed.
- Beforehand, a resource capability acquisition request can be transmitted from the resource capability inquiry unit 310 to the grid server 100 to determine whether the grid system has sufficient resource capability to execute the job.
- In the grid server 100, the job receiving unit 113 receives the job execution request transmitted from the client 300, and the optimal resource selecting unit 114 assigns the job to the resource means that can be used by the own system.
- The resource means to which the job is assigned are the process servers 200, which are local resources, and the other grid systems, which are network resources.
- FIG. 7 is a flowchart for explaining the job scheduling operation by the scheduler 110.
- The optimal resource selection unit 114 acquires statistical information such as the capability and operation tendency of each resource means from the resource agent 120 via the resource capability inquiry response unit 111 and the resource capability acquisition unit 112 (step 701), and performs optimal scheduling based on this information and on the type and characteristics of the job (step 702). The job requesting unit 115 then issues a job execution request based on the processing result of the optimal resource selecting unit 114, irrespective of the operation status of the resource means to which the job is assigned, and sends it to the resource agent 120 corresponding to that resource means (step 703).
- The scheduling logic of the optimal resource selection unit 114 may be arbitrary. However, when a job is requested of another grid system, the job execution is scheduled again in the grid system that receives the request, so allocating jobs to local resources is generally considered more efficient. A method can therefore be used in which a job is first assigned to the process servers 200, which are local resources of the own system, and another grid system is requested to execute the job only when the capacity of the process servers 200 alone is insufficient.
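The local-first policy described above can be sketched as a simple greedy allocation. The function below is an illustration under stated assumptions: the notion of divisible "work units", the function name `allocate_local_first`, and the capacity dictionaries are hypothetical, and the source does not prescribe this particular algorithm.

```python
def allocate_local_first(
    demand: int,
    local_free: dict[str, int],
    network_free: dict[str, int],
) -> dict[str, int]:
    """Split `demand` work units: local resources first, then adjacent grids.

    Raises ValueError when the combined capacity is insufficient.
    """
    plan: dict[str, int] = {}
    remaining = demand
    for pool in (local_free, network_free):   # the local pool is tried first
        for name, free in pool.items():
            if remaining == 0:
                break
            take = min(free, remaining)
            if take > 0:
                plan[name] = take
                remaining -= take
    if remaining > 0:
        raise ValueError(f"{remaining} work units could not be placed")
    return plan


# Local process servers absorb 4 units; only the overflow goes to grid B.
print(allocate_local_first(5, {"ps1": 2, "ps2": 2}, {"gridB": 4}))
# {'ps1': 2, 'ps2': 2, 'gridB': 1}
```

Network resources are touched only when the local pool is exhausted, which avoids the second round of scheduling that a remote grid system would otherwise perform.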
- The resource agent 120 transmits the job execution request received from the job requesting unit 115 of the scheduler 110 to the corresponding resource means, receives the execution result from the resource means, and returns it to the scheduler 110.
- The operation of the resource agent 120 does not differ depending on whether the corresponding resource means is the process server 200 or another grid system.
- The scheduler 110 integrates the job execution results from the individual resource means, received via the resource agents 120, and returns the result to the client 300.
- The resource means that executes a job is either a process server 200, which is a local resource of the grid system requested to execute the job, or another grid system, which is a network resource.
- When the resource means is a process server 200, the job receiving unit 213 of the PS agent interface 210 receives the job execution request from the resource agent 120 of the grid server 100, and the process server 200 executes the job in accordance with the request. The execution result is then returned to the resource agent 120 of the grid server 100.
- When the resource means is a grid system, the job execution request from the resource agent 120 is received by the job receiving unit 133 of the GS agent interface 130 in the grid server 100 of that grid system, and is sent to the scheduler 110 of that grid server 100 by the job execution requesting unit 134.
- FIG. 8 is a diagram showing the relationship between the resource agent 120, the GS agent interface 130, and the scheduler 110.
- In the same way as for a job execution request received directly from the client 300 as described above, the scheduler 110 can schedule a job execution request received from the grid server 100 of another grid system via the GS agent interface 130 and can request execution of the job by the resource means available to the own system.
- The resource agent 120 acquires information (resource information) on the current operation status and job execution capability of the corresponding resource means, and manages it in the resource status management unit 121 and the resource capacity management unit 122.
- When the resource means is the process server 200, the resource information is collected by the PS status monitoring unit 211 of the PS agent interface 210 and sent to the resource agent 120 by the resource capability information notification unit 212.
- When the resource means is a grid system, the resource information is acquired by the resource capability information acquisition unit 131 of the GS agent interface 130 in the grid server 100 of that grid system and sent to the resource agent 120 by the resource capability information notification unit 132.
- The resource capability information acquisition unit 131 queries the resource capability inquiry response unit 111 of the scheduler 110 and receives the information. The resource capability inquiry response unit 111 may therefore receive an inquiry about resource capability either from the client 300 or from the GS agent interface 130.
- Another grid system adjacent to a given grid system on the network can be used as a resource means by that grid system, and can also request that grid system to execute a job.
- When a grid system has requested job execution, the grid system that receives the request cannot use the requesting grid system as a resource means of its own system.
- That is, the scheduler 110 cannot use as a resource means the grid system containing the grid server 100 that transmitted the resource capability acquisition request to the GS agent interface 130. In this case, the scheduler 110 therefore calculates the resource capability that can be provided while excluding the grid system containing that grid server 100, and returns it to the GS agent interface 130.
- FIG. 9 is a diagram showing an overall configuration of a grid system group constituting a wide area distributed system according to the present embodiment.
- In each grid system, the connection between the grid server 100 and the process server 200, which is a local resource, and the connection between the grid server 100 and another grid system are both made through the resource agent 120 provided in the grid server 100.
- A network scheme as shown in FIG. 9 is thereby realized: each grid system can receive a job execution request from the client 300, and can either pass the job to the process server 200, which is a local resource of its own system, or submit it to another grid system adjacent on the network for execution.
- The grid systems have no master-subordinate relationship; they operate in parallel as equals.
- FIG. 10 is a diagram showing a state of distribution when a job is input to a predetermined grid system of a grid system group connected by the network scheme of the present embodiment.
- A job is submitted to grid A in a wide-area distributed system consisting of five grid systems (grids A, B, C, D, and E), indicated by broken lines.
- This job is first distributed to the process server (PS) 200, which is a local resource of grid A. If the local resources of grid A cannot handle the job and an overflow occurs, the grid server (GS) 100 of grid A passes the job to the neighboring grids B and C on the network.
- A job can be passed from grid A toward grids B and C regardless of the operation status of grids B and C.
- In the grid server 100 of grid A, the resource agents 120 corresponding to grids B and C receive the job execution request and, once grids B and C are ready to accept the job, submit it to grids B and C.
- In this example, the job is input to another grid system only when the local resources overflow.
- Processing jobs with local resources as far as possible is preferable because it reduces the load on the network.
- However, the method of distributing jobs is not limited to this. Taking into account the local resources of the own system, the capacity of the neighboring grid systems, the job types and characteristics, and so on, jobs can be allocated by any logic that makes the distribution optimal (high execution efficiency).
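The local-first policy described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the class `ResourceAgent`, the `free_slots` counter, and the function `distribute` are hypothetical names, and overflow is modeled as an immediate rejection, which simplifies away the waiting that a resource agent performs until its grid system is ready.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceAgent:
    """Stands in for one resource means: a local process server or a neighboring grid."""
    name: str
    is_local: bool
    free_slots: int  # hypothetical measure of remaining capacity
    accepted: list = field(default_factory=list)

    def submit(self, job: str) -> bool:
        """Accept the job if capacity remains; otherwise report an overflow."""
        if self.free_slots <= 0:
            return False
        self.free_slots -= 1
        self.accepted.append(job)
        return True


def distribute(jobs, agents):
    """Offer each job to local agents first, then to neighboring grids."""
    # Stable sort: local agents come first, original order is kept otherwise.
    ordered = sorted(agents, key=lambda a: not a.is_local)
    rejected = []
    for job in jobs:
        # any() stops at the first agent that accepts the job.
        if not any(agent.submit(job) for agent in ordered):
            rejected.append(job)
    return rejected


agents = [
    ResourceAgent("grid-B", is_local=False, free_slots=1),
    ResourceAgent("ps-1", is_local=True, free_slots=2),
    ResourceAgent("grid-C", is_local=False, free_slots=1),
]
left_over = distribute(["j1", "j2", "j3", "j4", "j5"], agents)
```

Replacing the sort key with another ranking (for example one based on job type or reported capacity) gives a different allocation logic without touching the agents, which is the freedom the text leaves open.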
- FIG. 11 is a diagram showing a state of distribution when a job is input to another grid system (grid B) in the grid system group of FIG.
- A job submitted to grid B is first submitted to the process server 200, which is a local resource of grid B; when an overflow occurs, it is distributed to the adjacent grids A, D, and E. If an overflow also occurs in grid A, jobs are distributed to grid C as well.
- FIG. 12 is a diagram illustrating resource capacity when a job is input to a predetermined grid system in the grid system group of FIG.
- the resource capacity of each grid system is defined as follows.
- C Grid system X's own (local resource) resource capacity
- C Resource capacity that grid system x can provide in response to a job execution request from a client
- Grid system b (grid B in Fig. 10) is adjacent to grid system a and to grid systems d and e (grids D and E in Fig. 10).
- The resource capacity that grid system b can provide is therefore the sum of the resource capacity of its own system and the resource capacity provided by grid systems d and e.
- Grid system c (grid C in Fig. 10) is adjacent only to grid system a, so it can provide only the resource capacity of its own system.
- the processing capacity C provided to execute this job is calculated as follows.
- the resource capacity provided from grid system b to grid system d is as follows.
- the resource capacity c is as described above.
- Each grid system directly obtains only the resource capability of its own system and the resource capability provided by the other grid systems adjacent to it.
- The processing capacity of the entire wide area distributed system is increased. Moreover, unlike the conventional technology shown in Figs. 13 to 15, there is no need to provide a meta-scheduler or to exchange information for grasping the state of every grid system and its local resources across the entire wide area distributed system, so the network load can be greatly reduced.
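The capacity rule described above (own capacity plus whatever each neighbor other than the requester can provide) can be sketched in Python. The adjacency follows the description of grids a to e; the assumption that d and e are adjacent only to b, the numeric capacities, and the names `NEIGHBORS`, `LOCAL`, and `providable` are illustrative and do not come from the source. The recursion terminates here because this topology has no cycles; a cyclic network would need extra bookkeeping that this sketch omits.

```python
# Adjacency of the five grid systems (d and e adjacent only to b is an assumption).
NEIGHBORS = {
    "a": ["b", "c"],
    "b": ["a", "d", "e"],
    "c": ["a"],
    "d": ["b"],
    "e": ["b"],
}

# Hypothetical local capacities in arbitrary units; not taken from the source.
LOCAL = {"a": 10, "b": 8, "c": 6, "d": 4, "e": 2}


def providable(grid, requester=None):
    """Capacity `grid` can offer to `requester` (or to a client if None).

    It is the grid's own capacity plus the capacity each neighbor can offer
    to it, skipping the grid system that issued the request.
    """
    total = LOCAL[grid]
    for neighbor in NEIGHBORS[grid]:
        if neighbor != requester:
            total += providable(neighbor, requester=grid)
    return total


b_to_a = providable("b", requester="a")  # 8 + 4 + 2 = 14
c_to_a = providable("c", requester="a")  # 6
job_at_a = providable("a")               # 10 + 14 + 6 = 30
b_to_d = providable("b", requester="d")  # 8 + (10 + 6) + 2 = 26
```

Each call only ever asks direct neighbors, so no grid system needs a global view of the network, which matches the point that no meta-scheduler is required.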
- The grid server 100 of a grid system and the process server 200, which is a local resource, are connected via the resource agent 120 provided in the grid server 100 as an interface module.
- The grid server 100 of another grid system adjacent on the network is connected via the same kind of resource agent 120.
- The grid servers 100 of grid systems adjacent to each other on the network can therefore treat each other's grid system in the same way as the local resources of their own system.
- A wide area distributed system connected in this way can thus be realized. Since the scheduler 110 of each grid server 100 does not need to distinguish between other grid systems and its own resources, there is no need to introduce a special mechanism for a wide area distributed system. There is also no need to provide a meta-scheduler that manages the grid system group making up the wide area distributed system. The labor and cost required for system development can therefore be significantly reduced.
- Since the resource agent 120 provided for each resource means manages the information on that resource means, whether it is a local resource or another adjacent grid system, the scheduler 110 may assign a job to a resource agent 120 without considering the operating state of each resource means. Therefore, when a new grid system is added to the grid system group making up the wide area distributed system, or when a given grid system is removed from the group, the grid systems adjacent to it need only add or delete the corresponding resource agent 120. The extensibility and flexibility of the system are therefore very high.
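As a rough illustration of why adding or removing a grid system only touches the neighboring schedulers' agent lists, the following Python sketch registers resource agents as plain callables. The class `Scheduler` and its methods `add_agent`, `remove_agent`, and `schedule` are hypothetical names chosen for this example; the source does not specify such an interface.

```python
class Scheduler:
    """Assigns jobs to resource agents without knowing what lies behind them."""

    def __init__(self):
        self._agents = {}  # agent name -> callable(job) returning True if accepted

    def add_agent(self, name, submit):
        """Register a resource agent for a process server or a neighboring grid."""
        self._agents[name] = submit

    def remove_agent(self, name):
        """Drop the agent for a resource means that left the network."""
        self._agents.pop(name, None)

    def schedule(self, job):
        """Return the name of the first agent that accepts the job, or None."""
        for name, submit in self._agents.items():
            if submit(job):
                return name
        return None


scheduler = Scheduler()
scheduler.add_agent("ps-1", lambda job: False)   # local process server, currently full
scheduler.add_agent("grid-B", lambda job: True)  # neighboring grid system
first = scheduler.schedule("job-1")              # handled by "grid-B"

scheduler.remove_agent("grid-B")                 # grid B leaves the network
second = scheduler.schedule("job-2")             # no agent accepts: None
```

The scheduling loop is identical before and after the change in membership; only the dictionary of agents differs.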
- FIG. 1 is a diagram showing an entire configuration of a wide area distributed system according to the present embodiment.
- FIG. 2 is a diagram showing a configuration of each grid system constituting the wide area distributed system of FIG. 1.
- FIG. 3 is a diagram schematically showing an example of a hardware configuration of a computer device suitable for realizing a grid server and a process server in the present embodiment.
- FIG. 4 is a diagram showing a functional configuration of a grid server in the present embodiment.
- FIG. 5 is a diagram showing a relationship between a functional configuration of a process server and a resource agent of a grid server according to the embodiment.
- FIG. 6 is a diagram showing a relationship between a functional configuration of a client that issues a job execution request to the wide area distributed system of the present embodiment and a grid server scheduler.
- FIG. 7 is a flowchart illustrating an operation of scheduling a job by the scheduler of the embodiment.
- FIG. 8 is a diagram showing a relationship between a resource agent, a GS agent interface, and a scheduler in the present embodiment.
- FIG. 9 is a diagram showing an entire configuration of a grid system group constituting a wide area distributed system according to the present embodiment.
- FIG. 10 is a diagram illustrating a distribution state when a job is input to a predetermined grid system of a grid system group connected by the network scheme of the present embodiment.
- FIG. 11 is a diagram showing a distribution state when a job is input to another grid system in the grid system group of FIG.
- FIG. 12 is a diagram illustrating resource capacity when a job is input to a predetermined grid system in the grid system group of FIG.
- FIG. 13 is a diagram schematically showing a system configuration of a wide area distributed system by a centralized scheme.
- FIG. 14 is a diagram schematically showing a system configuration of a wide area distributed system by a hierarchical scheme.
- FIG. 15 is a diagram schematically showing a system configuration of a wide area distributed system by a distributed scheme.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
- Computer And Data Communications (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006513869A JPWO2005116832A1 (ja) | 2004-05-31 | 2005-05-23 | 分散処理環境におけるジョブの実行を制御するためのコンピュータシステム、方法及びプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-161819 | 2004-05-31 | ||
JP2004161819 | 2004-05-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005116832A1 (ja) | 2005-12-08 |
Family
ID=35451046
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/009350 WO2005116832A1 (ja) | 2004-05-31 | 2005-05-23 | 分散処理環境におけるジョブの実行を制御するためのコンピュータシステム、方法及びプログラム |
Country Status (3)
Country | Link |
---|---|
JP (1) | JPWO2005116832A1 (ja) |
CN (1) | CN1954295A (ja) |
WO (1) | WO2005116832A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006038669A (ja) * | 2004-07-28 | 2006-02-09 | Toyota Infotechnology Center Co Ltd | グリッド・コンピューティング・システム、プログラム、記録媒体およびグリッド・コンピューティング方法 |
JP2008198025A (ja) * | 2007-02-14 | 2008-08-28 | Fujitsu Ltd | 並列処理制御プログラム、並列処理制御システムおよび並列処理制御方法 |
JP2009187415A (ja) * | 2008-02-08 | 2009-08-20 | Nec Corp | グリッドコンピューティングシステム及びデータ処理方法 |
JP2011096247A (ja) * | 2009-10-28 | 2011-05-12 | Internatl Business Mach Corp <Ibm> | 並列計算の親和性駆動分散スケジューリングのための装置、方法、およびコンピュータ・プログラム(並列計算の親和性駆動分散スケジューリングのためのシステムおよび方法) |
JP2013239124A (ja) * | 2012-05-17 | 2013-11-28 | Nec Corp | 端末制御システム、端末管理装置、端末制御装置、端末制御方法、端末管理プログラム及び端末制御プログラム |
US9921883B2 (en) | 2015-01-22 | 2018-03-20 | Fujitsu Limited | Job management device and method for determining processing elements for job assignment |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8442015B2 (en) * | 2007-07-20 | 2013-05-14 | Broadcom Corporation | Method and system for an atomizing function of a mobile device |
CN106899656B (zh) * | 2017-01-03 | 2018-12-11 | 珠海格力电器股份有限公司 | 设备控制方法和装置 |
CN110032364B (zh) * | 2019-04-11 | 2023-08-15 | 上海商汤智能科技有限公司 | 数据处理方法、装置、电子设备和计算机存储介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07141302A (ja) * | 1993-11-17 | 1995-06-02 | Agency Of Ind Science & Technol | 並列計算機における負荷分散方法 |
JPH09179832A (ja) * | 1995-12-27 | 1997-07-11 | Sony Corp | 計算装置および方法 |
JPH09231184A (ja) * | 1996-02-23 | 1997-09-05 | Mitsubishi Electric Corp | 自律協調情報処理装置並びに自律協調分散処理方法 |
JP2912225B2 (ja) * | 1996-04-18 | 1999-06-28 | 四国日本電気ソフトウェア株式会社 | 通信処理システム |
-
2005
- 2005-05-23 JP JP2006513869A patent/JPWO2005116832A1/ja active Pending
- 2005-05-23 CN CNA2005800154954A patent/CN1954295A/zh active Pending
- 2005-05-23 WO PCT/JP2005/009350 patent/WO2005116832A1/ja active Application Filing
Non-Patent Citations (3)
Title |
---|
ANDRADE N. ET AL: "Our Grid: An approach to easily assemble grids with equitable resource sharing", PROCEEDINGS OF THE 9TH INTERNATIONAL JOB SCHEDULING FOR PARALLEL PROCESSING, June 2003 (2003-06-01), pages 61 - 86, XP002990370 * |
PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), March 2003 (2003-03-01), XP010678728 * |
TALIA D. ET AL: "Toward a Synergy Between P2P and Grids", IEEE INTERNET COMPUTING, 2003, pages 94 - 96, XP002990371 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006038669A (ja) * | 2004-07-28 | 2006-02-09 | Toyota Infotechnology Center Co Ltd | グリッド・コンピューティング・システム、プログラム、記録媒体およびグリッド・コンピューティング方法 |
JP4522780B2 (ja) * | 2004-07-28 | 2010-08-11 | 株式会社トヨタIt開発センター | グリッド・コンピューティング・システム、プログラム、記録媒体およびグリッド・コンピューティング方法 |
JP2008198025A (ja) * | 2007-02-14 | 2008-08-28 | Fujitsu Ltd | 並列処理制御プログラム、並列処理制御システムおよび並列処理制御方法 |
JP2009187415A (ja) * | 2008-02-08 | 2009-08-20 | Nec Corp | グリッドコンピューティングシステム及びデータ処理方法 |
JP2011096247A (ja) * | 2009-10-28 | 2011-05-12 | Internatl Business Mach Corp <Ibm> | 並列計算の親和性駆動分散スケジューリングのための装置、方法、およびコンピュータ・プログラム(並列計算の親和性駆動分散スケジューリングのためのシステムおよび方法) |
US8959525B2 (en) | 2009-10-28 | 2015-02-17 | International Business Machines Corporation | Systems and methods for affinity driven distributed scheduling of parallel computations |
JP2013239124A (ja) * | 2012-05-17 | 2013-11-28 | Nec Corp | 端末制御システム、端末管理装置、端末制御装置、端末制御方法、端末管理プログラム及び端末制御プログラム |
US9921883B2 (en) | 2015-01-22 | 2018-03-20 | Fujitsu Limited | Job management device and method for determining processing elements for job assignment |
Also Published As
Publication number | Publication date |
---|---|
CN1954295A (zh) | 2007-04-25 |
JPWO2005116832A1 (ja) | 2008-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tran-Dang et al. | FRATO: Fog resource based adaptive task offloading for delay-minimizing IoT service provisioning | |
JP5022030B2 (ja) | コンピュータシステム、これを構成するサーバ、そのジョブ実行制御方法及びプログラム | |
US20210250249A1 (en) | System and Method for Providing Dynamic Provisioning Within a Compute Environment | |
US10652319B2 (en) | Method and system for forming compute clusters using block chains | |
US9075659B2 (en) | Task allocation in a computer network | |
WO2005116832A1 (ja) | 分散処理環境におけるジョブの実行を制御するためのコンピュータシステム、方法及びプログラム | |
US7707288B2 (en) | Automatically building a locally managed virtual node grouping to handle a grid job requiring a degree of resource parallelism within a grid environment | |
US20050188087A1 (en) | Parallel processing system | |
US7774457B1 (en) | Resource evaluation for a batch job and an interactive session concurrently executed in a grid computing environment | |
JP4954089B2 (ja) | グリッド・アクティビティのモニタリングおよび振り分けによる総合的グリッド環境管理を促進する方法、システム、およびコンピュータ・プログラム | |
US9424096B2 (en) | Task allocation in a computer network | |
JP5088366B2 (ja) | 仮想計算機制御プログラム、仮想計算機制御システムおよび仮想計算機移動方法 | |
KR20130088512A (ko) | 클러스터 컴퓨팅 환경에서의 자원 관리 장치 및 방법 | |
JP2008226181A (ja) | 並列実行プログラム、該プログラムを記録した記録媒体、並列実行装置および並列実行方法 | |
JP2007041720A (ja) | ジョブステップ実行プログラムおよびジョブステップ実行方法 | |
US10270847B2 (en) | Method for distributing heavy task loads across a multiple-computer network by sending a task-available message over the computer network to all other server computers connected to the network | |
JP5151509B2 (ja) | 仮想マシンシステム及びそれに用いる仮想マシン分散方法 | |
US20120324095A1 (en) | Image processing in a computer network | |
JP2007102332A (ja) | 負荷分散システム及び負荷分散方法 | |
JP4963854B2 (ja) | マルチプロセッサコンピュータおよびネットワークコンピューティングシステム | |
Peng et al. | BQueue: A coarse-grained bucket QoS scheduler | |
JP4963855B2 (ja) | ネットワークコンピューティングシステムおよびマルチプロセッサコンピュータ | |
TW202205091A (zh) | 運算系統及其主機資源分配方法 | |
JPH10207847A (ja) | 分散システムにおける自動負荷分散方式 | |
JP4647513B2 (ja) | 並列処理システム、処理端末装置、並列処理方法、描画システム、画像表示システム、音響システム、プログラム、及び、記録媒体 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
WWE | Wipo information: entry into national phase | Ref document number: 2006513869 Country of ref document: JP |
WWE | Wipo information: entry into national phase | Ref document number: 200580015495.4 Country of ref document: CN |
NENP | Non-entry into the national phase | Ref country code: DE |
WWW | Wipo information: withdrawn in national office | Country of ref document: DE |
122 | Ep: pct application non-entry in european phase |