WO2014016950A1 - Parallel computer system, and method for arranging processing load in parallel computer system - Google Patents

Parallel computer system, and method for arranging processing load in parallel computer system Download PDF

Info

Publication number
WO2014016950A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing
load
computer system
parallel computer
processing load
Prior art date
Application number
PCT/JP2012/069077
Other languages
French (fr)
Japanese (ja)
Inventor
泰幸 工藤
加藤 猛
雅士 高田
幸二 福田
Original Assignee
株式会社日立製作所 (Hitachi, Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. (株式会社日立製作所)
Priority to JP2014526681A priority Critical patent/JPWO2014016950A1/en
Priority to PCT/JP2012/069077 priority patent/WO2014016950A1/en
Publication of WO2014016950A1 publication Critical patent/WO2014016950A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5019Workload prediction

Definitions

  • The present invention relates to a parallel computer system and, more particularly, to processing load placement in a parallel computer system.
  • There is a technique described in Patent Document 1 as background art in this technical field.
  • This publication describes a load distribution method for appropriately distributing the load of a plurality of computers. Specifically, it describes a method of obtaining, for a certain process (called a program in Patent Document 1) executed on its own calculation processor (called an executor in Patent Document 1), the pre-movement communication time with each calculation processor and the post-movement communication time with each processor if the process were moved to another calculation processor, and moving the process to another calculation processor so that the post-movement communication time becomes shorter than the pre-movement communication time. There is also a technique described in Patent Document 2 as background art in this technical field.
  • This publication describes a method, in a parallel computer system, of moving a running process to another processor so that the communication efficiency of the entire system is optimized. Specifically, when the communication usage for transmission from a running process to another calculation processor exceeds a communication-amount threshold, the calculation processor with the smallest communication usage among the calculation processors near the receiving side is selected, and the running process is transferred to that processor.
  • JP-A-8-30558 (Patent Document 1); JP-A-11-154143 (Patent Document 2)
  • ABS: Agent-based Simulation
  • In the computer system of Patent Document 1 described above, the determination as to whether to move a process to another calculation processor is based on the sum of the process execution time and the communication time associated with the process movement; the communication between calculation nodes that occurs while the process is executing is not considered. It is therefore difficult to appropriately distribute the processing load, including communication, in a process such as ABS in which a large amount of communication between calculation processors occurs.
  • The present invention has been made in view of the above problems, and its object is to provide a parallel computer system, and a processing load placement method for a parallel computer system, capable of appropriately distributing the processing load amount including the effect of increases and decreases in the communication load between calculation processors accompanying process movement.
  • In the present invention, the above problem is solved by a parallel computer system that distributes a plurality of processing loads across a plurality of processing nodes and executes them: information on the amount of communication between the processing loads is acquired; based on the acquired information, a pair of processing nodes whose arrangement is to be changed is selected from among the plurality of processing nodes; the change in the communication load between that pair of processing nodes when a processing load is moved between the selected pair is predicted; and the plurality of processing loads are rearranged based on the prediction result.
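The four claimed steps (acquire communication amounts, select the node pair to change, predict the load change of a candidate move, rearrange) can be sketched as follows. This is an illustrative Python sketch, not the patent's literal implementation; the function and variable names are invented here, and connection counts stand in for communication amounts as in the embodiment described later.

```python
def rebalance_once(placement, comm_pairs):
    """One rearrangement step (illustrative sketch).

    placement: dict mapping agent -> node id (mutated in place).
    comm_pairs: iterable of (agent, agent) pairs that must communicate.
    """
    comm_pairs = list(comm_pairs)
    # 1. Acquire communication-amount information: count the connections
    #    crossing each pair of nodes (agents' traffic assumed equal).
    link = {}
    for a, b in comm_pairs:
        na, nb = placement[a], placement[b]
        if na != nb:
            k = frozenset((na, nb))
            link[k] = link.get(k, 0) + 1
    if not link:
        return placement
    # 2. Select the pair of nodes with the highest communication load.
    n1, n2 = tuple(max(link, key=link.get))

    # 3. Predict, for each agent on either node, the change in the pair's
    #    connection count if that agent were moved across.
    def delta(agent):
        here = placement[agent]
        there = n2 if here == n1 else n1
        d = 0
        for a, b in comm_pairs:
            other = b if a == agent else (a if b == agent else None)
            if other is None:
                continue
            if placement[other] == here:
                d += 1   # connection would start crossing the nodes
            elif placement[other] == there:
                d -= 1   # connection would become intra-node
        return d

    candidates = [a for a in placement if placement[a] in (n1, n2)]
    best = min(candidates, key=delta)
    # 4. Rearrange only when the prediction says the load decreases.
    if delta(best) < 0:
        placement[best] = n2 if placement[best] == n1 else n1
    return placement

# Example mirroring the embodiment below: agent G on node 1 talks to
# E, K, Q on node 2 and J on node 1.
placement = {'G': 1, 'J': 1, 'E': 2, 'K': 2, 'Q': 2}
rebalance_once(placement, [('G', 'E'), ('G', 'K'), ('G', 'Q'), ('G', 'J')])
print(placement['G'])  # 2: moving G removes three crossings and adds one
```

Note the sketch has no capacity guard; the second embodiment below adds a CPU/memory threshold before accepting a move.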
  • FIG. 1 is a functional block diagram of the parallel computer system 100 of this embodiment.
  • the parallel computer system 100 includes a plurality of information processing apparatuses.
  • the plurality of information processing apparatuses of the parallel computer system 100 include one management node 101 and three processing nodes 102 to 104.
  • Each of the management node 101 and the processing nodes 102 to 104 is a server device, for example.
  • When referring to a specific processing node, the processing node 102 is referred to as processing node 1, the processing node 103 as processing node 2, and the processing node 104 as processing node 3.
  • In the management node 101, program information 105, processing load placement information 106, monitoring result information 107, processing load placement means 108, communication control means 109, and calculation result aggregation means 110 are arranged. In each processing node, processing information 111, calculation-necessary information 112, calculation processing means 113, communication control means 114, and load monitoring means 115 are arranged.
  • the parallel computer system 100 includes a storage device 117. The management node 101, the storage device 117, and the processing nodes 102 to 104 are connected via the communication path control means 116.
  • FIG. 2 shows the functional block of FIG. 1 as a hardware image.
  • Each of the management node 101 and the processing nodes 102 to 104 has a central processing unit (CPU) 201, a memory 202, and communication means 203.
  • CPU: central processing unit
  • Various network devices can be used for the communication path control means 116; in this embodiment, a network switch is used.
  • FIG. 3 is a flowchart showing the operation of the parallel computer system 100.
  • First, a program is loaded from the storage device 117 into the memory 202 of the management node 101 and stored as the program information 105.
  • the stored program information 105 includes calculation information performed by each agent of the ABS, information related to communication performed by the agents, and the like.
  • In this embodiment, an ABS composed of 18 agents A to R will be described. The processing is repeated 20 times while changing the calculation coefficients of the agents.
  • the processing load allocation 301 and subsequent steps include processing execution 302, load analysis 303, processing load rearrangement 304, setting change 305, processing execution 306, load analysis 307, and result output 308.
  • the processing load placement 301 is executed by the processing load placement means 108 and the communication control means 109 of the management node 101.
  • the processing load arrangement unit 108 reads the stored program information 105 and assigns, for example, agents A to R cyclically to the processing nodes 1, 2, and 3.
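The cyclic assignment mentioned above might look like this minimal sketch (variable names are illustrative, not from the patent):

```python
# Deal the 18 agents A..R round-robin ("cyclically") onto processing nodes 1..3.
agents = [chr(c) for c in range(ord('A'), ord('R') + 1)]  # 'A' .. 'R'
nodes = [1, 2, 3]
placement = {agent: nodes[i % len(nodes)] for i, agent in enumerate(agents)}
# A, D, G, J, M, P -> node 1; B, E, H, K, N, Q -> node 2; C, F, I, L, O, R -> node 3
```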
  • Computation information executed by each agent in each processing node, information on the communication performed by the agents, and the like are transmitted using the communication control means 109 and stored as processing information 111 in the memory 202 of each processing node.
  • FIG. 4 is a list showing the arrangement destination of each agent on the processing node and the relationship between the communicating agents, and this information is stored as the processing load arrangement information 106.
  • FIG. 5 is an image of the list shown in FIG. 4.
  • Each agent is indicated by a circle, agents that need to communicate with each other are connected by solid lines, and the region assigned to each processing node is represented by a dotted line.
  • the process execution 302 is executed by the arithmetic processing unit 113 of each processing node.
  • the arithmetic processing means 113 reads the processing information 111 and the calculation necessary information 112, executes the calculation of each agent in each processing node from these information, and outputs the result.
  • the calculation necessary information 112 is information necessary for an agent in each processing node to perform a calculation, for example, a result of a calculation executed by another agent.
  • the calculation order and parallelism in the calculation processing unit 113, synchronization with other processing nodes, and the like are determined based on the calculation necessary information 112, and these are included in the processing information 111.
  • The calculation results of the calculation processing means 113 are transmitted using the communication control means 114, stored in the calculation result aggregation means 110 of the management node 101, and also stored in the calculation-necessary information 112 of any processing node that needs them.
  • the load analysis 303 is executed by the load monitoring unit 115 of each processing node.
  • the load monitoring unit 115 monitors the communication load of the communication control unit 114, and the monitoring result is transmitted to the management node 101 using the communication control unit 114 and stored as monitoring result information 107.
  • The communication load to be monitored is a value obtained by measuring the communication time between processing nodes during the first process execution 302; it is measured by, for example, the communication means 203 of each processing device or the communication path control means 116.
  • This index is selected because communication between agents across processing nodes is much slower than communication within a processing node and is therefore considered likely to become a processing bottleneck.
  • FIG. 6 is an example of the monitoring result information 107 transmitted from each processing node.
  • In FIG. 6, the communication time is proportional to the number of connections between the processing nodes in the image diagram of FIG. 5 because, for convenience of explanation, the communication amounts of the agents are all assumed to be equal. That is, in this embodiment, the communication amount between the processing nodes is evaluated by the number of connections.
  • The processing load rearrangement 304 is executed by the processing load placement means 108 of the management node 101.
  • the processing load placement unit 108 reads the monitoring result information 107 and determines the bottleneck route with the highest communication load. For example, in FIG. 6, since the average usage rate between the processing node 1 and the processing node 2 is the highest at 100%, this route is determined as a bottleneck route.
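As a sketch, the bottleneck-route determination reduces to taking the maximum over the monitored routes. Only the 100% figure for the node 1 - node 2 route comes from the text; the other values and the data layout are illustrative:

```python
# Average usage rate (%) per inter-node route, keyed by the unordered node pair.
average_usage = {
    frozenset({1, 2}): 100,  # from FIG. 6 as described in the text
    frozenset({2, 3}): 60,   # illustrative
    frozenset({3, 1}): 40,   # illustrative
}
bottleneck = max(average_usage, key=average_usage.get)
print(sorted(bottleneck))  # [1, 2]: the node 1 - node 2 route is the bottleneck
```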
  • the processing load placement unit 108 selects an agent to be moved to alleviate the bottleneck. The concept for this selection will be described with reference to the image diagram of FIG.
  • The processing load placement means 108 of the management node 101 selects the pair of the processing node 1 and the processing node 2, which has the highest average usage rate, and considers moving an agent between the pair.
  • For example, when the agent D is moved from the processing node 1 to the processing node 2, the connection with the agent B moves inside the processing node 2, so the number of connections between the processing node 1 and the processing node 2 can be reduced by one. However, three connections, with the agents A, M, and J, are newly generated between the processing node 1 and the processing node 2, so the number of connections between the two nodes increases by two compared with before the movement. On the other hand, when the agent G is moved from the processing node 1 to the processing node 2, its connections with the agents E, K, and Q move inside the processing node 2, so the number of connections between the processing node 1 and the processing node 2 can be reduced by three, while only one connection, with the agent J, is newly generated between them. The number of connections between the processing node 1 and the processing node 2 is therefore reduced by two compared with before the movement. In general, if the number of connections an agent has with agents in the destination processing node is subtracted from the number of connections it has with agents remaining in its own processing node, and the result is negative, moving that agent reduces the number of connections between the processing nodes.
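The subtraction rule above can be stated as a small function. The helper name and data layout are illustrative; the D and G examples reproduce the increase of two and decrease of two worked out in the text:

```python
# For an agent considered for a move from node src to node dst, the change in
# the src-dst connection count is (links to agents staying in src) minus
# (links to agents already in dst); a negative result means the move helps.
def pair_delta(agent, src, dst, placement, comm_pairs):
    created = internalized = 0
    for a, b in comm_pairs:
        other = b if a == agent else (a if b == agent else None)
        if other is None:
            continue
        if placement[other] == src:
            created += 1        # would become an inter-node connection
        elif placement[other] == dst:
            internalized += 1   # would become an intra-node connection
    return created - internalized

placement = {'D': 1, 'G': 1, 'A': 1, 'M': 1, 'J': 1,
             'B': 2, 'E': 2, 'K': 2, 'Q': 2}
pairs = [('D', 'B'), ('D', 'A'), ('D', 'M'), ('D', 'J'),
         ('G', 'E'), ('G', 'K'), ('G', 'Q'), ('G', 'J')]
print(pair_delta('D', 1, 2, placement, pairs))  # prints 2 (an increase of two)
print(pair_delta('G', 1, 2, placement, pairs))  # prints -2 (a decrease of two)
```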
  • Based on the processing load placement information 106 shown in FIG. 4, the processing load placement means 108 of the management node 101 calculates the change in the number of connections between the processing nodes when a processing load is moved between the selected pair.
  • FIG. 7A is an example of the calculated changes in the number of connections between processing nodes for the processing node 1; it includes, for each agent in the processing node 1, the number of agents with which it communicates and the change (increase/decrease) in the number of connections between processing nodes if that agent were moved to the processing node 2.
  • The change in the number of connections between the processing node 1 and the processing node 2 is obtained by subtracting, for each agent, the number of communicating agents arranged in the processing node 2 from the number of communicating agents arranged in the processing node 1. The number of connections between the processing nodes 2 and 3 increases by the number of communicating agents arranged in the processing node 3, and the number of connections between the processing nodes 3 and 1 decreases by that same number; the magnitude of the increase between the processing nodes 2 and 3 and the magnitude of the decrease between the processing nodes 3 and 1 therefore coincide.
  • From this calculation, the processing load placement means 108 can determine that moving the agent G to the processing node 2 reduces the number of connections between the processing nodes 1 and 2, while the number of connections between the other processing nodes neither increases nor decreases.
  • FIG. 7B is an example of the corresponding list for the processing node 2, and includes the increase/decrease in the number of connections between processing nodes when each agent in the processing node 2 is moved to the processing node 1.
  • In this way, the processing load placement means 108 determines that the agent G in the processing node 1 should be moved to the processing node 2 in order to alleviate the bottleneck. If there are multiple candidate moves that would relieve the bottleneck, the candidate that reduces the number of connections the most, that is, the candidate that most reduces the overall communication load, is selected.
  • the processing load placement means 108 updates the list of FIG. 4 so that the agent G is placed in the processing node 2 and stores it again as processing load placement information 106.
  • The content of the setting change 305 is included in the program information 105, and the management node 101 updates the content of the processing information 111 of each processing node using the communication control means 109 in accordance with the number of completed process executions.
  • the setting change 305 is a change of the calculation coefficient of each agent.
  • the second processing execution 302 is performed based on the processing load arrangement information 106 and the processing information 111 updated by the first processing procedure.
  • In the second processing procedure, the parallel computer system 100 performs functionally the same operations as the first process execution 302 and load analysis 303. Specifically, the parallel computer system 100 performs the process execution 302 based on the processing load arrangement shown in the image diagram of FIG. 8 and obtains the monitoring result information 107 shown in FIG. 9.
  • From the monitoring result information 107 shown in FIG. 9, the processing load placement means 108 determines that the bottleneck route is between the processing node 2 and the processing node 3, which has the longest communication time.
  • As in the first processing procedure, the increase/decrease in the number of connections between the processing nodes is calculated, and based on the calculation result shown in FIG. 10, the processing load placement means 108 determines that the agent N should be moved from the processing node 3 to the processing node 2 to reduce the number of connections between the processing nodes 2 and 3.
  • the processing load placement unit 108 updates the list of FIG. 4 so that the agent N is placed in the processing node 2 and stores the updated list as the processing load placement information 106 again.
  • FIG. 11 is an image diagram of the updated processing load arrangement information 106. Thereafter, the operation of the parallel computer system 100 shifts to the setting change 305, and the contents of the processing information 111 are updated as in the first processing procedure.
  • the second processing procedure is completed, and the procedure proceeds to the third processing procedure.
  • In the third and subsequent processing procedures, the processing load rearrangement 304 is not executed; the process execution 306, the load analysis 307, and the setting change 305 are executed, and the same operations are repeated until the twentieth processing procedure is completed.
  • the operations of the process execution 306, the load analysis 307, and the setting change 305 are functionally similar to the first and second processing procedures.
  • The result of the load analysis 307 becomes the monitoring result information 107 shown in FIG. 12, from which it can be seen that the communication time is further reduced compared with the monitoring result information 107 shown in FIG. 9.
  • The calculation results from the first to the twentieth time are stored in the calculation result aggregation means 110, and the parallel computer system 100 outputs these results based on the program information 105.
  • The output destination of the calculation results is not particularly limited; for example, an image display device, a printer, or an external storage can be used.
  • As described above, the second processing procedure can alleviate the communication bottleneck relative to the first, and the third processing procedure can alleviate it further relative to the second. That is, it is possible to provide a parallel computer system that can appropriately distribute the processing load, including communication, when executing parallel processing such as ABS in which a large amount of communication between calculation processors occurs.
  • In this embodiment, the processing load rearrangement is performed twice, but the number of times is not limited to this and may be one, or three or more.
  • Since the processing load rearrangement itself also takes processing time, it is not efficient to keep repeating it after the bottleneck has largely been resolved. It is therefore desirable to determine the number of repetitions of the processing load rearrangement in consideration of the balance between the time to execute one processing procedure multiplied by its number of repetitions and the time to execute the processing load rearrangement. Since it is desirable for the bottleneck to be eliminated at an early stage, it is desirable to perform the processing load rearrangement in the early processing procedures, as described above.
  • In this embodiment, the processing load rearrangement is triggered by comparing communication times, but the present invention is not limited to this; for example, a threshold may be set for the communication time, and the processing load rearrangement performed only when the threshold is exceeded.
  • Further, the communication bandwidth may differ between processing nodes, in which case the communication time also differs between processing nodes. In such a case, the communication time per unit amount of information between the processing nodes can be obtained in advance from measurement or from the specifications of the apparatus, and the processing load rearrangement can take this information into account to equalize the communication times.
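One way to take differing link speeds into account, as suggested above, is to weight each route's connection count by its pre-measured communication time per unit of information. All numbers and names here are illustrative, not from the text:

```python
# Pre-measured relative communication time per unit of information per route.
time_per_unit = {frozenset({1, 2}): 1.0,
                 frozenset({2, 3}): 2.5,   # a slower link
                 frozenset({3, 1}): 1.0}
# Connection counts per route (the proxy for communication amount).
connections = {frozenset({1, 2}): 4,
               frozenset({2, 3}): 2,
               frozenset({3, 1}): 2}
# Weighted load: estimated communication time per route.
weighted = {k: connections[k] * time_per_unit[k] for k in connections}
bottleneck = max(weighted, key=weighted.get)
print(sorted(bottleneck))  # [2, 3]: fewer connections, but the slower link wins
```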
  • ABS is taken as an example of parallel computation, but the present invention is not limited to this, and can be applied to analysis called graph processing such as shortest path search.
  • In graph processing, the processing for each graph vertex corresponds to a processing load.
  • one management node and three processing nodes are used, but it goes without saying that the present invention is not limited to this.
  • In the second embodiment, the basic configuration and operation are the same as those of the parallel computer system 100 shown in the first embodiment of the present invention. The differences from the first embodiment are the operations of the load monitoring means 115 and the resulting content of the monitoring result information 107, and the operations of the processing load placement means 108 and the resulting content of the processing load placement information 106. These differences are described in detail below.
  • The load monitoring means 115 monitors the communication load of the communication control means 114 as in the first embodiment of the present invention, and additionally monitors the calculation load of the calculation processing means 113. The calculation load to be monitored is, for example, the average load rate of the CPU 201 and the average usage rate of the memory 202 during the operation of the process execution 302.
  • FIG. 13 is an example of the monitoring result information 107 transmitted from each processing node, assuming the processing load arrangement image shown in FIG. 5. In FIG. 13, the CPU load rate and the memory usage rate of each processing node are proportional to the number of agents in the image diagram of FIG. 5, because the calculation loads of the agents are assumed to be all equal.
  • The processing load placement means 108 performs the same operation as in the first embodiment of the present invention, with the difference that an agent is not moved if the CPU load rate or the memory usage rate exceeds a threshold value. For example, with the threshold of 65% used in the present embodiment, the CPU load rate and the memory usage rate are both 60% in the monitoring result information 107 of FIG. 13, so the agent G is moved as in the first embodiment.
  • In the second processing procedure, however, both the CPU load rate and the memory usage rate are 70%, exceeding the 65% threshold. The candidate agent N is therefore not moved, the second arrangement image shown in FIG. 8 is kept unchanged, and the third and subsequent processing procedures are executed.
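The guard described in this embodiment can be sketched as a simple predicate. The 65% threshold and the 60%/70% values come from the text; the function shape is illustrative:

```python
# A candidate move is applied only if neither the CPU load rate nor the
# memory usage rate of the monitored node exceeds the threshold.
THRESHOLD = 65.0  # percent, from the embodiment's example

def move_allowed(cpu_load, mem_usage, threshold=THRESHOLD):
    return cpu_load <= threshold and mem_usage <= threshold

print(move_allowed(60.0, 60.0))  # True:  agent G is moved (first procedure)
print(move_allowed(70.0, 70.0))  # False: agent N is not moved (second procedure)
```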
  • As described above, in the present embodiment the communication bottleneck can be mitigated in the second processing procedure relative to the first, while moving an agent to reduce the communication bottleneck further would adversely affect the calculation processing; the processing load arrangement at this point is therefore judged to be optimal. That is, it is possible to provide a parallel computer system that can appropriately distribute the processing load, including communication, when executing processing such as ABS in which a large amount of communication between calculation processors occurs.
  • In this embodiment, the threshold for the CPU load rate and the memory usage rate is set to 65%, but the present invention is not limited to this; it is desirable to determine the threshold in consideration of the balance between communication time and calculation time.
  • 100: parallel computer system, 101: management node, 102-104: processing nodes 1-3, 105: program information, 106: processing load placement information, 107: monitoring result information, 108: processing load placement means, 109: communication control means, 110: calculation result aggregation means, 111: processing information, 112: calculation-necessary information, 113: calculation processing means, 114: communication control means, 115: load monitoring means, 116: communication path control means, 117: storage device, 201: CPU, 202: memory, 203: communication means, 301: processing load placement, 302: process execution, 303: load analysis, 304: processing load rearrangement, 305: setting change, 306: process execution, 307: load analysis, 308: result output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention addresses the problem of providing a method for arranging the processing load in a parallel computer system, said method being capable of appropriately distributing a processing load that includes the influence of increases/decreases in the communication load between computer processors in conjunction with the migration of processes. The present invention solves this problem by means of a parallel computer system that distributes multiple processing loads with respect to multiple processing nodes, wherein: information regarding the amount of communication between processing loads is obtained; pairs of processing nodes are selected from the multiple processing nodes on the basis of the obtained information, as nodes for which the arrangement is to be changed; a prediction is made regarding the change in the amount of the communication load between these pairs of processing nodes when the processing load is migrated between the selected pairs of processing nodes; and the multiple processing loads are rearranged on the basis of the prediction results.

Description

Parallel computer system and processing load allocation method for a parallel computer system
The present invention relates to a parallel computer system and, more particularly, to processing load placement in a parallel computer system.
There is a technique described in Patent Document 1 as background art in this technical field. This publication describes a load distribution method for appropriately distributing the load of a plurality of computers. Specifically, it describes a method of obtaining, for a certain process (called a program in Patent Document 1) executed on its own calculation processor (called an executor in Patent Document 1), the pre-movement communication time with each calculation processor and the post-movement communication time with each processor if the process were moved to another calculation processor, and moving the process to another calculation processor so that the post-movement communication time becomes shorter than the pre-movement communication time. There is also a technique described in Patent Document 2 as background art in this technical field. This publication describes a method, in a parallel computer system, of moving a running process to another processor so that the communication efficiency of the entire system is optimized. Specifically, when the communication usage for transmission from a running process to another calculation processor exceeds a communication-amount threshold, the calculation processor with the smallest communication usage among the calculation processors near the receiving side is selected, and the running process is transferred to that processor.
JP-A-8-30558 (Patent Document 1); JP-A-11-154143 (Patent Document 2)
In recent years, demand has been increasing for a technique called agent-based simulation (ABS) for analyzing and predicting complex economic systems and social movements. An ABS is composed of a plurality of agents that make autonomous decisions, and the behavior of the system is generally studied while taking the interactions between the agents into account. When executing an ABS using parallel computation, the agents are grouped and allocated to the calculation processors, and the interactions between agents are propagated through the network.
Here, in the computer system of Patent Document 1 described above, the determination as to whether to move a process to another calculation processor is based on the sum of the process execution time and the communication time associated with the process movement; the communication between calculation nodes that occurs while the process is executing is not considered. It is therefore difficult to appropriately distribute the processing load, including communication, in a process such as ABS in which a large amount of communication between calculation processors occurs.
On the other hand, although the computer system of Patent Document 2 described above uses the communication load between calculation processors during process execution as the criterion for process movement, it does not consider the increase or decrease in the communication load that arises on other communication paths as a result of the movement. Therefore, in a process such as ABS in which a large amount of communication between calculation processors occurs, communication on other paths is likely to increase as a process moves, and it has been difficult to appropriately distribute the processing load including communication.
The present invention has been made in view of the above problems, and an object thereof is to provide a method of arranging processing loads in a parallel computer system that can appropriately distribute the processing load, including the effect of increases and decreases in the communication load between computation processors caused by process movement.
In the present invention, in a parallel computer system that distributes a plurality of processing loads across a plurality of processing nodes and executes them, information on the amount of communication between the processing loads is acquired; based on the acquired information, a pair of processing nodes whose arrangement is to be changed is selected from among the plurality of processing nodes; the change in the communication load between the pair of processing nodes that would result from moving a processing load between them is predicted; and the plurality of processing loads are rearranged based on the prediction result, thereby solving the above problem.
According to the present invention, even for processes such as an ABS in which a large amount of communication occurs between computation processors, the processing load including the communication component can be distributed appropriately, and the processing speed of the parallel computer system can be increased.
It is a block diagram illustrating the functions of a parallel computer system.
It is a block diagram illustrating a hardware image of a parallel computer system.
It is a flowchart illustrating the operation of a parallel computer system.
It is a list for explaining the operation of the processing load placement means.
It is a diagram for explaining an operation image of the processing load placement means.
It is a list for explaining the operation of the load monitoring means.
It is a list for explaining the operation of the processing load placement means.
It is a list for explaining the operation of the processing load placement means.
It is a diagram for explaining an operation image of the processing load placement means.
It is a list for explaining the operation of the load monitoring means.
It is a list for explaining the operation of the processing load placement means.
It is a list for explaining the operation of the processing load placement means.
It is a diagram for explaining an operation image of the processing load placement means.
It is a list for explaining the operation of the load monitoring means.
It is a list for explaining the operation of the load monitoring means.
It is a list for explaining the operation of the load monitoring means.
Hereinafter, embodiments will be described with reference to the drawings.
This embodiment describes an example of a parallel computer system that can appropriately distribute the processing load, including the communication component, when executing a computation involving parallel processing such as an ABS.
FIG. 1 is a functional block diagram of the parallel computer system 100 of this embodiment. The parallel computer system 100 comprises a plurality of information processing apparatuses: one management node 101 and three processing nodes 102 to 104. The management node 101 and the processing nodes 102 to 104 are each, for example, a server apparatus. In the following description, when a specific processing node is referred to, the processing node 102 is called processing node 1, the processing node 103 is called processing node 2, and the processing node 104 is called processing node 3. The management node 101 holds program information 105, processing load placement information 106, monitoring result information 107, processing load placement means 108, communication control means 109, and calculation result aggregation means 110. Each processing node holds processing information 111, calculation-required information 112, arithmetic processing means 113, communication control means 114, and load monitoring means 115. The parallel computer system 100 further comprises a storage device 117. The management node 101, the storage device 117, and the processing nodes 102 to 104 are connected via communication path control means 116.
FIG. 2 shows the functional blocks of FIG. 1 as a hardware image. The management node 101 and each of the processing nodes 102 to 104 have a central processing unit (CPU) 201, a memory 202, and communication means 203. Various network devices can be used as the communication path control means 116; in this embodiment, a network switch is used.
Next, the operation of the parallel computer system 100 of this embodiment will be described. FIG. 3 is a flowchart showing the operation of the parallel computer system 100.
First, a program is loaded from the storage device 117 into the memory 202 of the management node 101 and stored as the program information 105. The stored program information 105 includes information on the computations performed by each agent of the ABS, information on the communication performed between agents, and the like. This embodiment is described using an ABS composed of 18 agents, A through R. In the ABS of this embodiment, the processing is repeated 20 times while the calculation coefficients of the agents are changed.
Next, the operation of the parallel computer system 100 from the processing load placement 301 onward will be described. In FIG. 3, the steps following the processing load placement 301 are processing execution 302, load analysis 303, processing load rearrangement 304, setting change 305, processing execution 306, load analysis 307, and result output 308.
First, the processing load placement 301 is executed by the processing load placement means 108 and the communication control means 109 of the management node 101. The processing load placement means 108 reads the stored program information 105 and assigns, for example, the agents A through R cyclically to processing node 1, processing node 2, and processing node 3. Information on the computations performed by each agent in each processing node, information on the communication performed between agents, and the like are then transmitted using the communication control means 109 and stored as the processing information 111 in the memory 202 of each processing node. FIG. 4 is a list showing the node to which each agent is assigned and which agents communicate with each other; this information is stored as the processing load placement information 106. FIG. 5 depicts the list of FIG. 4 as an image, in which each agent is drawn as a circle, agents that need to communicate with each other are joined by solid lines, and the region handled by each processing node is enclosed by a dotted line.
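As an illustrative sketch (not part of the patent; the function and variable names are hypothetical), the cyclic assignment of agents A through R to the three processing nodes described above could be expressed as:

```python
import string

def cyclic_assignment(agents, num_nodes):
    """Assign each agent to a processing node in round-robin (cyclic) order.

    Returns a dict mapping agent name -> node number (1-based).
    """
    return {agent: (i % num_nodes) + 1 for i, agent in enumerate(agents)}

agents = list(string.ascii_uppercase[:18])  # agents A .. R
placement = cyclic_assignment(agents, 3)
# A -> node 1, B -> node 2, C -> node 3, D -> node 1, ...
```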
Next, the operation of the parallel computer system 100 in the first processing execution 302 will be described. The processing execution 302 is carried out by the arithmetic processing means 113 of each processing node. The arithmetic processing means 113 reads the processing information 111 and the calculation-required information 112, executes the computation of each agent in the processing node based on this information, and outputs the result. Here, the calculation-required information 112 is the information that the agents in each processing node need in order to compute, for example the results of computations executed by other agents. The order and degree of parallelism of the computations in the arithmetic processing means 113, synchronization with other processing nodes, and the like are determined based on the calculation-required information 112 and are included in the processing information 111. The computation results of the arithmetic processing means 113 are transmitted using the communication control means 114 and stored in the calculation result aggregation means 110 of the management node 101, and also in the calculation-required information 112 of any processing node that needs them.
Next, the operation of the load analysis 303 will be described. The load analysis 303 is executed by the load monitoring means 115 of each processing node. The load monitoring means 115 monitors the communication load on the communication control means 114; the monitoring result is transmitted to the management node 101 using the communication control means 114 and stored as the monitoring result information 107. The communication load monitored here is the communication time between processing nodes measured during the first processing execution 302, measured for example by the communication means 203 of each apparatus or by the communication path control means 116. This metric was chosen because communication between agents that spans processing nodes is far slower than communication within a processing node and therefore tends to become the bottleneck of the processing. FIG. 6 is an example of the monitoring result information 107 transmitted from each processing node. In FIG. 6, the communication time is proportional to the number of connections between processing nodes in the image of FIG. 5; this is because, for convenience of explanation, the amount of communication between every pair of agents is assumed to be equal. In other words, in this embodiment, the amount of communication between processing nodes is evaluated by the number of connections.
Next, the operation of the processing load rearrangement 304 will be described. The processing load rearrangement 304 is executed by the processing load placement means 108 of the management node 101. The processing load placement means 108 reads the monitoring result information 107 and determines the bottleneck path, that is, the path with the highest communication load. In FIG. 6, for example, the average utilization between processing node 1 and processing node 2 is the highest at 100%, so this path is determined to be the bottleneck path. The processing load placement means 108 then selects an agent to be moved in order to relieve the bottleneck. The reasoning behind this selection is explained using the image of FIG. 5.
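A minimal sketch of this bottleneck determination (the function name and the utilization figures are illustrative, not taken from FIG. 6):

```python
def find_bottleneck(avg_utilization):
    """Return the node pair whose communication path shows the highest
    average utilization, i.e. the bottleneck path."""
    return max(avg_utilization, key=avg_utilization.get)

# Hypothetical utilization figures in the spirit of FIG. 6 (percent):
utilization = {(1, 2): 100, (2, 3): 60, (3, 1): 60}
bottleneck = find_bottleneck(utilization)
# bottleneck == (1, 2): the path between processing nodes 1 and 2
```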
As stated above, the communication time is assumed to be proportional to the number of connections between processing nodes, so reducing the number of connections between processing node 1 and processing node 2 achieves an appropriate distribution of the processing load including the communication component. The processing load placement means 108 of the management node 101 therefore selects the pair of processing node 1 and processing node 2, which has the highest average utilization, and attempts to move an agent to be processed between the pair.
For example, in FIG. 5, if agent D is moved from processing node 1 to processing node 2, the connection with agent B becomes internal to processing node 2, which removes one connection between processing node 1 and processing node 2. However, three new connections, with agents A, M, and J, appear between processing node 1 and processing node 2, so the number of connections between the two nodes increases by two compared with before the move. In contrast, if agent G is moved from processing node 1 to processing node 2, the connections with agents E, K, and Q become internal to processing node 2, removing three connections between processing node 1 and processing node 2, while one new connection, with agent J, appears between them; the number of connections between the two nodes therefore decreases by two. In general, subtracting the number of connections an agent has with agents in the destination processing node from the number of connections it has with agents in its own processing node gives the change; if the result is negative, the move reduces the number of connections between the processing nodes compared with before the move.
The processing load placement means 108 of the management node 101 therefore calculates, based on the processing load placement information 106 shown in FIG. 4, the change in the number of connections between processing nodes when a processing load is moved between the selected pair. FIG. 7(a) is an example of this calculation for processing node 1; it contains, for each agent in processing node 1, the number of agents with which it communicates and the change (increase or decrease) in the number of connections between processing nodes if that agent is moved to processing node 2. The change in the number of connections between processing node 1 and processing node 2 is obtained by subtracting, from the number of agents within processing node 1 with which the agent communicates, the number of its communication partners placed in processing node 2. The number of connections between processing node 2 and processing node 3 increases by the number of the agent's communication partners placed in processing node 3. The change in the number of connections between processing node 3 and processing node 1 is obtained by subtracting that same number from zero. Note that the increase in the number of connections between processing node 2 and processing node 3 and the decrease in the number of connections between processing node 3 and processing node 1 are equal in magnitude.
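The bookkeeping above can be sketched as follows for a three-node system; the helper names and the small connection graph are illustrative only. For an agent moving from a source node to a destination node, the source-destination count changes by (partners in source) minus (partners in destination), the destination-third count grows by the number of partners on the third node, and the third-source count shrinks by the same amount:

```python
def connection_delta(agent, src, dst, third, placement, partners):
    """Change in inter-node connection counts if `agent` moves src -> dst."""
    in_src = sum(1 for p in partners[agent] if placement[p] == src)
    in_dst = sum(1 for p in partners[agent] if placement[p] == dst)
    in_third = sum(1 for p in partners[agent] if placement[p] == third)
    return {
        (src, dst): in_src - in_dst,  # internal links become cross-node and vice versa
        (dst, third): in_third,       # partners on the third node now cross dst-third
        (third, src): -in_third,      # ... and no longer cross third-src
    }

# Agent G of FIG. 5: partners E, K, Q on node 2 and J on node 1.
placement = {"G": 1, "J": 1, "E": 2, "K": 2, "Q": 2}
partners = {"G": ["E", "K", "Q", "J"]}
delta = connection_delta("G", 1, 2, 3, placement, partners)
# delta == {(1, 2): -2, (2, 3): 0, (3, 1): 0}
```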
Looking at the column for the change in the number of connections between processing node 1 and processing node 2, only agent G has a negative value, and for agent G the changes in the number of connections between processing node 2 and processing node 3 and between processing node 3 and processing node 1 are both zero. The processing load placement means 108 can therefore determine that moving agent G to processing node 2 reduces the number of connections between processing node 1 and processing node 2 by two without changing the number of connections between any other pair of processing nodes. Similarly, FIG. 7(b) is an example of the list for processing node 2, containing the change in the number of connections between processing nodes if each agent is moved to processing node 1. Looking at the column for the change between processing node 1 and processing node 2 in FIG. 7(b), no agent has a negative value. 
From the above, the processing load placement means 108 determines that, to relieve the bottleneck, agent G in processing node 1 should be moved to processing node 2. When there are multiple candidate moves for eliminating the bottleneck, selecting the candidate that reduces the number of connections the most, that is, the candidate that reduces the overall communication load the most, lowers the communication load most efficiently. Also, when there are multiple candidates, selecting a candidate that does not increase the number of connections between any pair of processing nodes other than the pair involved in the move prevents the move from creating a new bottleneck. The move of agent G described above is both the candidate that reduces the communication load the most and an ideal move that avoids creating a new bottleneck.
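Continuing the same sketch (again hypothetical code, not the patent's wording), the selection rule described above, namely prefer the move that reduces the bottleneck path the most while rejecting moves that add connections on any other path, might look like:

```python
def pick_move(candidates, bottleneck):
    """candidates: dict mapping agent -> connection-count delta per node pair.

    Keep only moves that reduce the bottleneck path without increasing any
    other path; among those, pick the agent with the largest reduction.
    """
    viable = {
        agent: delta for agent, delta in candidates.items()
        if delta[bottleneck] < 0
        and all(v <= 0 for pair, v in delta.items() if pair != bottleneck)
    }
    if not viable:
        return None  # no beneficial move exists between this pair of nodes
    return min(viable, key=lambda agent: viable[agent][bottleneck])

# With the FIG. 7 figures, only agent G qualifies:
candidates = {
    "D": {(1, 2): 2, (2, 3): 0, (3, 1): 0},
    "G": {(1, 2): -2, (2, 3): 0, (3, 1): 0},
}
chosen = pick_move(candidates, (1, 2))
# chosen == "G"
```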
Based on the above analysis, the processing load placement means 108 updates the list of FIG. 4 so that agent G is placed in processing node 2, and stores it again as the processing load placement information 106.
Next, the operation of the parallel computer system 100 in the setting change 305 will be described. The content of the setting change 305 is included in the program information 105, and the management node 101 updates the contents of the processing information 111 of each processing node, according to the number of completed processing executions, using the communication control means 109. In this embodiment, the setting change 305 is a change of the calculation coefficients of the agents.
With the above operations, the first processing cycle is complete and operation proceeds to the second processing execution 302, which is performed based on the processing load placement information 106 and the processing information 111 updated in the first cycle.
Although the calculation coefficients of the agents have changed and agent G is now placed in processing node 2, the parallel computer system 100 performs the processing execution 302 and the load analysis 303 functionally in the same way as in the first cycle. Specifically, the parallel computer system 100 performs the processing execution 302 based on the processing load placement shown in the image of FIG. 8, and obtains the monitoring result information 107 shown in FIG. 9 from the load analysis 303.
Next, in the processing load rearrangement 304, the processing load placement means 108 determines from the monitoring result information 107 shown in FIG. 9 that the bottleneck path, the one with the longest communication time, is between processing node 2 and processing node 3.
The changes in the number of connections between processing nodes are then calculated, and based on the calculation results shown in FIG. 10, the processing load placement means 108 determines that the number of connections between processing node 2 and processing node 3 can be reduced by moving agent N from processing node 3 to processing node 2. The processing load placement means 108 then updates the list of FIG. 4 so that agent N is placed in processing node 2, and stores the updated list again as the processing load placement information 106.
FIG. 11 is an image of the updated processing load placement information 106. The operation of the parallel computer system 100 then proceeds to the setting change 305, and the contents of the processing information 111 are updated as in the first processing cycle.
With the above operations, the second processing cycle is complete and operation proceeds to the third. From the third cycle onward, the processing load rearrangement 304 is not executed; the processing execution 306, the load analysis 307, and the setting change 305 are carried out, and the same operations are repeated until the twentieth cycle is complete. The operations of the processing execution 306, the load analysis 307, and the setting change 305 are functionally the same as in the first and second cycles. In the third and subsequent cycles, the result of the load analysis 307 is the monitoring result information 107 shown in FIG. 12, which shows that the communication time has decreased further compared with the monitoring result information 107 shown in FIG. 9. By rearranging the processing loads at an early stage of the parallel processing in this way, the benefit of the reduced communication time is enjoyed over many subsequent cycles.
Finally, the operation of the parallel computer system 100 in the result output 308 will be described. The computation results of the first through twentieth cycles are stored in the calculation result aggregation means 110, and the parallel computer system 100 outputs these results based on the program information 105. Although the output destination of the computation results is not specified in this embodiment, an image display device, a printer, external storage, or the like can be used, for example.
According to the first embodiment of the present invention described above, the second processing cycle relieves the communication bottleneck more than the first, and the third relieves it more than the second. In other words, it is possible to provide a parallel computer system that can appropriately distribute the processing load, including the communication component, when executing parallel processing such as an ABS in which a large amount of communication occurs between computation processors.
Although the processing load rearrangement is performed twice in this embodiment of the present invention, the invention is not limited to this; it may be performed once, or three times or more. However, because the processing load rearrangement itself takes processing time, it is not efficient to keep repeating it after the bottleneck has largely been eliminated. It is therefore desirable to decide the number of repetitions of the processing load rearrangement by weighing the time required for one processing cycle and the number of cycles against the time required to execute the rearrangement. Since it is preferable to eliminate the bottleneck at an early stage, it is desirable to perform the processing load rearrangement in the first processing cycle as described above.
In this embodiment, whether to perform the processing load rearrangement is decided by comparing communication times, but the invention is not limited to this; for example, a threshold may be set on the communication time so that the processing load rearrangement is executed only when the threshold is exceeded.
It is also possible to repeat the processing load rearrangement before executing the processing cycles so as to equalize the number of connections between processing nodes in advance. In practice, however, the actual communication time between processing nodes does not necessarily depend on the number of connections, owing to various factors. It is therefore desirable to provide the communication load monitoring function and the rearrangement function even when this procedure is used. If the communication time clearly increases after a processing load rearrangement because of such factors, it is possible to execute the next candidate placement or to revert to the original placement.
Depending on the network topology, the communication bandwidth may also differ between processing nodes, in which case the communication time may differ between node pairs even when the same amount of information is transferred. In this case, the communication time per unit amount of information between each pair of processing nodes can be measured in advance, or calculated from the specifications of the apparatus, and the processing load rearrangement can take this information into account to equalize the communication times.
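One way to fold in such per-pair link differences, under the assumptions stated above, is to weight each pair's connection count by a measured (or spec-derived) time per unit of traffic; all names and figures below are hypothetical:

```python
def weighted_comm_time(edge_counts, time_per_unit):
    """Estimated communication time per node pair: connection count times
    the measured time per unit of traffic on that pair."""
    return {pair: edge_counts[pair] * time_per_unit[pair] for pair in edge_counts}

counts = {(1, 2): 10, (2, 3): 7, (3, 1): 7}
per_unit = {(1, 2): 1.0, (2, 3): 1.5, (3, 1): 1.0}  # the (2, 3) link is slower
times = weighted_comm_time(counts, per_unit)
# times == {(1, 2): 10.0, (2, 3): 10.5, (3, 1): 7.0}
# The (2, 3) pair, not (1, 2), is now the predicted bottleneck.
```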
Furthermore, although an ABS is taken as the example of parallel computation in this embodiment, the invention is not limited to this; it can also be applied, for example, to the class of analyses called graph processing, such as shortest path search. In the case of graph processing, the processing for each graph vertex corresponds to a processing load. In addition, although one management node and three processing nodes are used in this embodiment for convenience of explanation, it goes without saying that the invention is not limited to this.
 This embodiment describes an example of a parallel computing system that monitors not only the communication load but also the computational load, and that can distribute the processing load appropriately by taking both load amounts into account.
 In this embodiment, the basic configuration and operation are the same as those of the parallel computer system 100 shown in the first embodiment. The differences from the first embodiment are the operations performed by the load monitoring means 115 and the resulting contents of the monitoring result information 107, and the operations performed by the processing load arrangement means 108 and the resulting contents of the processing load arrangement information 106. These differences are described in detail below.
 First, the load monitoring means 115 monitors the communication load of the communication control means 114 as in the first embodiment, and additionally monitors the computational load of the arithmetic processing means 113. The computational load to be monitored is, for example, the average load factor of the CPU 201 and the average usage rate of the memory 202 of each processing node during the operation of process execution 302. FIG. 13 shows an example of the monitoring result information 107 transmitted from each processing node, assuming the processing-load arrangement image shown in FIG. 5. In FIG. 13, the CPU load factor and the memory usage rate of each processing node are proportional to the number of agents in the image of FIG. 5; this is because, for convenience of explanation, the amount of computation executed by every agent is assumed to be equal.
 Next, the processing load arrangement means 108 operates in the same way as in the first embodiment, with one difference: when the CPU load factor or the memory usage rate exceeds a certain threshold, the agent migration is not executed. For example, if the threshold in this embodiment is 65%, then in the monitoring result information 107 of FIG. 13 the CPU load factor and the memory usage rate are both 60%, so it is judged that an agent may be moved, and agent G is moved as in the first embodiment.
 The second execution of the processing procedure then begins, and the result of monitoring the load in the same manner is the monitoring result information 107 of FIG. 14. In FIG. 14, the CPU load factor and the memory usage rate are both 70%, exceeding the 65% threshold; therefore the candidate agent N is not moved, and the third and subsequent executions of the processing procedure proceed without changing the second arrangement image shown in FIG. 8.
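The threshold check applied in the two iterations above can be sketched as a simple gate. The 65% threshold and the 60%/70% monitoring values are the example figures from the text (FIGS. 13 and 14); the function itself is an illustrative sketch, not the patented implementation:

```python
THRESHOLD = 65.0  # percent; the example value used in this embodiment

def may_move_agent(cpu_load_pct, mem_usage_pct, threshold=THRESHOLD):
    """Permit agent migration only while both the CPU load factor and the
    memory usage rate are at or below the threshold; if either exceeds it,
    further rearrangement would start to affect the computation itself."""
    return cpu_load_pct <= threshold and mem_usage_pct <= threshold

print(may_move_agent(60.0, 60.0))  # True  (first iteration, FIG. 13: agent G moves)
print(may_move_agent(70.0, 70.0))  # False (second iteration, FIG. 14: agent N stays)
```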
 According to the second embodiment described above, the second execution of the processing procedure mitigates the communication bottleneck more than the first. However, moving further agents to relieve the communication bottleneck would affect the computation, so the processing-load arrangement at this point is judged to be optimal. In other words, it is possible to provide a parallel computer system that can appropriately distribute the processing load, including its communication component, when executing processing such as ABS in which a large amount of communication occurs between computing processors.
 In this embodiment, a CPU load factor or memory usage rate of 65% was used as the threshold for deciding whether to execute the load rearrangement, but the invention is not limited to this value; it is desirable to determine the threshold in consideration of the balance between communication time and computation time.
 100: parallel computing system, 101: management node, 102-104: processing nodes, 105: program information, 106: processing load arrangement information, 107: monitoring result information, 108: processing load arrangement means, 109: communication control means, 110: calculation result aggregation means, 111: processing information, 112: information required for calculation, 113: arithmetic processing means, 114: communication control means, 115: load monitoring means, 116: communication path control means, 117: storage device, 201: CPU, 202: memory, 203: communication means, 301: processing load arrangement, 302: process execution, 303: load analysis, 304: processing load rearrangement, 305: setting change, 306: process execution, 307: load analysis, 308: result output.

Claims (15)

  1.  A processing load arrangement method for a parallel computer system that distributes a plurality of processing loads across a plurality of processing nodes and executes them, the method comprising:
     acquiring information on the amount of communication between the processing loads;
     selecting, based on the acquired information, a pair of processing nodes among the plurality of processing nodes whose arrangement is to be changed;
     predicting the change in communication load between the selected pair of processing nodes that would result from moving a processing load between them; and
     rearranging the plurality of processing loads based on the prediction result.
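The procedure of claim 1 can be sketched with hypothetical data structures: a traffic map giving the communication amount between each pair of processing loads, and a placement map assigning each load to a processing node. The names and the numbers are illustrative assumptions, not part of the claimed method:

```python
from collections import defaultdict

def inter_node_traffic(traffic, placement):
    """Aggregate per-load communication amounts into per-node-pair totals."""
    totals = defaultdict(float)
    for (load_a, load_b), amount in traffic.items():
        na, nb = placement[load_a], placement[load_b]
        if na != nb:  # only traffic that crosses node boundaries counts
            totals[(min(na, nb), max(na, nb))] += amount
    return dict(totals)

def predict_after_move(traffic, placement, load, dest):
    """Predict the inter-node communication loads if `load` moved to `dest`."""
    trial = dict(placement)
    trial[load] = dest
    return inter_node_traffic(traffic, trial)

traffic = {("A", "B"): 8.0, ("B", "C"): 1.0}   # communication amounts between loads
placement = {"A": 1, "B": 2, "C": 2}           # load -> processing node
print(inter_node_traffic(traffic, placement))         # {(1, 2): 8.0}
print(predict_after_move(traffic, placement, "A", 2)) # {} : the hot link disappears
```

Here the pair (1, 2) carries the heaviest inter-node traffic, so it would be selected for a change of arrangement, and the prediction shows that moving load A onto node 2 removes that traffic entirely.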
  2.  The processing load arrangement method according to claim 1, wherein, when making the prediction, the change in communication load between each pair of processing nodes other than the selected pair is also predicted.
  3.  The processing load arrangement method according to claim 1, wherein the parallel computer system repeats processing a plurality of times, and the rearrangement is performed after a first processing among the plurality of repetitions.
  4.  The processing load arrangement method according to claim 3, wherein the first processing is the initial processing among the plurality of repetitions.
  5.  The processing load arrangement method according to claim 1, wherein each processing load corresponds to an agent in an agent-based simulation.
  6.  The processing load arrangement method according to claim 1, wherein each processing load corresponds to a node of a graph.
  7.  A parallel computer system that distributes a plurality of processing loads across a plurality of processing nodes and executes them, comprising:
     means for acquiring information on the amount of communication between the processing loads; and
     means for selecting, based on the information acquired by the acquiring means, a pair of processing nodes among the plurality of processing nodes whose arrangement is to be changed, predicting the change in communication load between the selected pair of processing nodes that would result from moving a processing load between them, and rearranging the plurality of processing loads based on the prediction result.
  8.  The parallel computer system according to claim 7, wherein the rearranging means, when making the prediction, also predicts the change in communication load between each pair of processing nodes other than the selected pair.
  9.  The parallel computer system according to claim 7, wherein the parallel computer system repeats processing a plurality of times, and the rearranging means performs the rearrangement after a first processing among the plurality of repetitions.
  10.  The parallel computer system according to claim 9, wherein the first processing is the initial processing among the plurality of repetitions.
  11.  The parallel computer system according to claim 7, wherein each processing load corresponds to an agent in an agent-based simulation.
  12.  A processing load arrangement method for a parallel computer system that distributes a plurality of processing loads across a plurality of processing nodes and executes them, the method comprising:
     acquiring information on the amount of communication between the processing loads;
     predicting, for a change of the distributed arrangement, the change in communication load between a first processing node and a second processing node of the plurality of processing nodes, the change in communication load between the second processing node and a third processing node of the plurality of processing nodes, and the change in communication load between the third processing node and the first processing node; and
     rearranging the plurality of processing loads based on the acquired information and the prediction results.
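Claim 12 predicts the change on all three node pairs of a triple. Under the same hypothetical traffic-matrix representation as above (illustrative numbers, not the claimed implementation), the three pairwise deltas can be computed as:

```python
def pair_delta(before, after, node_a, node_b):
    """Change in communication load on the node pair (node_a, node_b)."""
    key = (min(node_a, node_b), max(node_a, node_b))
    return after.get(key, 0.0) - before.get(key, 0.0)

before = {(1, 2): 8.0, (2, 3): 2.0}  # inter-node loads before the change
after = {(2, 3): 5.0, (1, 3): 1.0}   # predicted loads after the change

print(pair_delta(before, after, 1, 2))  # -8.0 (first-second pair relieved)
print(pair_delta(before, after, 2, 3))  # 3.0  (second-third pair increases)
print(pair_delta(before, after, 3, 1))  # 1.0  (third-first pair increases)
```

The rearrangement decision then weighs all three deltas together with the acquired traffic information, rather than looking at the moved pair in isolation.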
  13.  The processing load arrangement method according to claim 12, wherein, in performing the rearrangement, a pair of processing nodes among the plurality of processing nodes whose arrangement is to be changed is selected based on the acquired information, and the distributed arrangement is changed by moving a processing load between the selected pair of processing nodes.
  14.  The processing load arrangement method according to claim 12, wherein the parallel computer system repeats processing a plurality of times, and the rearrangement is performed after a first processing among the plurality of repetitions.
  15.  The processing load arrangement method according to claim 14, wherein the first processing is the initial processing among the plurality of repetitions.
PCT/JP2012/069077 2012-07-27 2012-07-27 Parallel computer system, and method for arranging processing load in parallel computer system WO2014016950A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2014526681A JPWO2014016950A1 (en) 2012-07-27 2012-07-27 Parallel computer system and processing load allocation method to parallel computer system
PCT/JP2012/069077 WO2014016950A1 (en) 2012-07-27 2012-07-27 Parallel computer system, and method for arranging processing load in parallel computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/069077 WO2014016950A1 (en) 2012-07-27 2012-07-27 Parallel computer system, and method for arranging processing load in parallel computer system

Publications (1)

Publication Number Publication Date
WO2014016950A1 true WO2014016950A1 (en) 2014-01-30

Family

ID=49996783

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/069077 WO2014016950A1 (en) 2012-07-27 2012-07-27 Parallel computer system, and method for arranging processing load in parallel computer system

Country Status (2)

Country Link
JP (1) JPWO2014016950A1 (en)
WO (1) WO2014016950A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009187115A (en) * 2008-02-04 2009-08-20 Internatl Business Mach Corp <Ibm> Multi-node server system, load distribution method, resource management server, and program
JP2010079504A (en) * 2008-09-25 2010-04-08 Mitsubishi Electric Information Systems Corp Apparatus, system, method, and program for distributed processing
JP2010079622A (en) * 2008-09-26 2010-04-08 Hitachi Ltd Multi-core processor system and task control method thereof
JP2012089015A (en) * 2010-10-21 2012-05-10 Hitachi Ltd Distributed information processing system, distributed information processing method and data transfer unit

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JPH0934847A (en) * 1995-07-14 1997-02-07 Hitachi Ltd Method for distributing load of parallel computer system
JP5471166B2 (en) * 2009-08-26 2014-04-16 日本電気株式会社 Management system, management device, network device, management method and program


Non-Patent Citations (1)

Title
HIROSHI ARIKAWA: "A Large-scale Multi Agent Simulation Considering Environmental Information and Its Implementation on Parallel Computer", JOURNAL OF JAPAN SOCIETY FOR FUZZY THEORY AND INTELLIGENT INFORMATICS, vol. 22, no. 2, 15 April 2010 (2010-04-15), pages 211 - 221 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
US10095555B2 (en) 2015-07-31 2018-10-09 Honda Motor Co., Ltd. Task control system
WO2022084784A1 (en) * 2020-10-23 2022-04-28 International Business Machines Corporation Auto-scaling a query engine for enterprise-level big data workloads
GB2615466A (en) * 2020-10-23 2023-08-09 Ibm Auto-scaling a query engine for enterprise-level big data workloads
US11809424B2 (en) 2020-10-23 2023-11-07 International Business Machines Corporation Auto-scaling a query engine for enterprise-level big data workloads

Also Published As

Publication number Publication date
JPWO2014016950A1 (en) 2016-07-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 12881655; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2014526681; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 12881655; Country of ref document: EP; Kind code of ref document: A1)