WO2014002412A1 - Program conversion device and method, process switching method, execution method determination method and program storage medium, processor system, and parallel execution method - Google Patents
- Publication number
- WO2014002412A1 (PCT/JP2013/003684)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- execution method
- program
- processor
- ratio
- execution
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/52—Binary to binary
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/509—Offload
Definitions
- The present invention relates to a program conversion apparatus and method, a process switching method and program, an execution method determination method and program, a processor system, and a parallel execution method that cause a sub-processor to execute a part of a main processor's program in a processor system including a main processor and a sub-processor.
- Server systems equipped with a main processor and sub-processors such as general-purpose graphics processing units (GPGPUs) are in widespread use.
- Such a server system is used to achieve high performance, for example by shortening the execution time of a program (hereinafter referred to as "latency") for processing a single input data item, or a set of input data items, treated as one processing unit.
- In order to shorten the latency of a program in such a system, a method in which a sub-processor executes one or more partial processes taken from the main processor's program (hereinafter referred to as the "offload method") may be used.
- A program that is scheduled to be executed by the main processor is called the "main program".
- A partial program to be executed by the sub-processor using the offload method is referred to as an "offload unit" or "offload program".
- Designating that a certain part of a program is to be offloaded, that is, designating an offload part, is referred to as "offload designation".
- The offload method is generally realized by three procedures: the data required by the offload part is transferred to the sub-processor, the offload part is executed by the sub-processor, and the results are written back to the main processor.
- For the offload method to be effective, the latency of the offload part must be shorter when it is executed by the sub-processor than when it is executed by the main processor.
- The range of the offload part in the main program is specified by the developer of the main program (hereinafter simply referred to as the "program developer").
- To shorten the latency, the program developer decides on the offload part by weighing the latency reduction obtained by using the sub-processor against the time required for data transfer.
- The offload part in the main program is often designated by inserting a directive that indicates the range of the offload part and the data to be transferred.
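- As an illustration only, such a directive might look like the following sketch; the pragma names, clauses, and functions are assumptions in the style of OpenACC/OpenMP offload annotations, not a notation defined by this document.

```cpp
struct Image { /* pixel data, size, ... */ };

void preprocess(Image &);          // scalar processing executed by the main processor
void transform(Image &);           // processing to be executed by the sub-processor
void write_result(const Image &);  // output processing executed by the main processor

void handle_request(Image &img) {
    preprocess(img);
#pragma offload_start copyin(img)   // hypothetical directive: start of offload part, data sent to the sub-processor
    transform(img);                 // offload part
#pragma offload_end copyout(img)    // hypothetical directive: end of offload part, data written back
    write_result(img);
}
```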
- To specify an offload part, it is necessary to analyze which data the sub-processor needs in order to process the offload part and which data must be written back to the main processor after the offload part has been processed. Since such data analysis is generally difficult, an arbitrary range of the main program cannot be freely specified as an offload part.
- On the other hand, there are also processes for which the transferred data is easy to analyze, such as an input process that receives the data for program execution and an output process that writes out the execution result of the program.
- FIG. 19 shows an example of parallel operation in which processor resources are left over and high throughput cannot be obtained.
- The system has a host processor and an accelerator, that is, a sub-processor that assists the processing of the host processor. Assume that the accelerator has a greater amount of processor resources than the host processor.
- Both host processor resources and accelerator resources are required to execute the partial program specified as the offload part; that is, when the offload part is executed for one input data item, a certain amount of the resources of both the host processor and the accelerator is used. If the resource amounts of the host processor and the accelerator differ and a program that uses the same amount of resources on each is executed simultaneously for multiple input data items, the host processor's resources are used up first while processor resources of the accelerator are left over. As a result, no additional input data can be processed beyond the number already being executed, even though accelerator resources remain unused.
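- As a concrete, purely hypothetical illustration: if the host processor has 8 processor cores and the accelerator has 40, and the offload part occupies one core of each per input data item, then at most 8 input data items can be processed at the same time; at that point the host's cores are exhausted while 32 of the accelerator's 40 cores sit idle.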
- There is a method for determining which processor should execute each loop in input software (see, for example, Patent Document 1).
- In that method, the data transfer time to the accelerator is measured, and a win/loss table indicating which of the host processor and the accelerator achieves the shorter execution time is created. A loop to be offloaded is then determined based on the win/loss table, and the input software is converted so that the loop is offloaded.
- Patent Document 1: JP 2011-204209 A
- Patent Document 2: JP-A-10-289116
- With the technique of Patent Document 2, a plurality of programs can be run without any single program monopolizing the resources. However, the technique of Patent Document 2 assumes a plurality of programs of different types; it does not assume that one common program is executed in parallel for each of a plurality of inputs. Therefore, with the technique of Patent Document 2, even if surplus processor resources exist, a single program cannot be executed in parallel by dividing the resources. (Object of the invention) The present invention has been made in view of the technical problems described above, and its object is to provide a program conversion apparatus and method, a process switching method and program, an execution method determination method and program, a processor system, and a parallel execution method that can make maximal use of the processor resources provided in a system and improve its processing capability.
- The program conversion device of the present invention includes: specific process determining means for determining the range of a partial program in a target program that includes a first execution method designating program which, for a specific process, operates with the usage ratio between a first usage amount of a first resource of a first processor and a second usage amount of a second resource of a second processor set to a first ratio; and process conversion means for converting the partial program into a second execution method designating program which operates with the usage ratio set to a second ratio different from the first ratio.
- The program conversion method of the present invention determines the range of a partial program in a target program that includes a first execution method designating program which, for a specific process, operates with the usage ratio between the first usage amount of the first resource of the first processor and the second usage amount of the second resource of the second processor set to a first ratio, and converts the partial program into a second execution method designating program which operates with the usage ratio set to a second ratio different from the first ratio.
- The process switching method of the present invention switches, based on a designation from outside, between first processing means according to a first execution method which, for a specific process, operates with the usage ratio between the first usage amount of the first resource of the first processor and the second usage amount of the second resource of the second processor set to a first ratio, and second processing means according to a second execution method which operates with the usage ratio set to a second ratio different from the first ratio.
- The process switching program storage medium of the present invention stores a program that causes the first processor to switch, for a specific process, between first processing means according to a first execution method which operates with the usage ratio between the first usage amount of the first resource of the first processor and the second usage amount of the second resource of the second processor set to a first ratio, and second processing means according to a second execution method which operates with the usage ratio set to a second ratio different from the first ratio.
- The execution method determination method of the present invention selects, for a specific process, either a first execution method which operates with the usage ratio between the first usage amount of the first resource of the first processor and the second usage amount of the second resource of the second processor set to a first ratio, or a second execution method which operates with the usage ratio set to a second ratio different from the first ratio, and sets the first execution method or the second execution method based on the selection result.
- The execution method determination program storage medium of the present invention stores a program that causes the first processor to operate as execution method determining means for selecting, for a specific process, either a first execution method which operates with the usage ratio between the first usage amount of the first resource of the first processor and the second usage amount of the second resource of the second processor set to a first ratio, or a second execution method which operates with the usage ratio set to a second ratio different from the first ratio, and as execution method setting means for setting the first execution method or the second execution method based on the selection result.
- The processor system of the present invention includes a first processor having a first resource and a second processor having a second resource, and the first processor switches, for a specific process and based on a designation from outside, between first processing means according to a first execution method which operates with the usage ratio between the first usage amount of the first resource and the second usage amount of the second resource set to a first ratio, and second processing means according to a second execution method which operates with the usage ratio set to a second ratio different from the first ratio.
- The parallel execution method of the present invention causes the first processor to execute, for a specific process and based on a designation from outside, either first processing means according to a first execution method which operates with the usage ratio between the first usage amount of the first resource of the first processor and the second usage amount of the second resource of the second processor set to a first ratio, or second processing means according to a second execution method which operates with the usage ratio set to a second ratio different from the first ratio.
- According to the present invention, one type of program can be executed using a combination of a plurality of execution methods that use different amounts of processor resources on each processor. Processor resources can therefore be used up even when a single kind of program is executed in parallel, so the resources provided in the system can be fully utilized and the processing capacity increased.
- The structure of the program targeted by the first embodiment of the present invention is described below.
- FIG. 1 shows the configuration of a computer system targeted by this embodiment.
- the computer system is a system in which a plurality of computers 100 are connected by a connection network 104.
- Each computer includes an arithmetic processing device 101, a storage device 102, and a communication device 103. All the computers 100 are connected by a connection network 104 and can communicate with each other.
- The structure, that is, the so-called architecture, and the processing performance of each arithmetic processing device 101 may be the same or different.
- One or more computers 100 serve as host processors, and the other computers 100 serve as one or more accelerators.
- the number of accelerators is not limited to one. It is not necessary for the accelerators to have the same architecture or processing capability, and it is assumed that there are N types (N is an integer of 1 or more) of architectures or processing capabilities.
- The accelerator performs processing according to instructions received from the host processor. Therefore, "host processor" and "accelerator" correspond to the "main processor" and "sub-processor" described above, respectively. In the present embodiment, the description uses "host processor" and "accelerator" as names that specifically indicate the role played by each computer.
- An example of this embodiment is a computer system in which a plurality of computers are connected by a bus or a network.
- the bus is a general serial or parallel bus used in, for example, a personal computer.
- the network is, for example, a wired or wireless LAN (Local Area Network).
- FIG. 2 shows a configuration of a target program to be processed by this embodiment.
- the target program includes an input processing unit 201, an arithmetic processing unit 202, and an output processing unit 203.
- An example of the target program is a Web server program.
- The Web server program performs processing for each request sent from a client via the network and returns the processing result to the client. For each request, the Web server program generates an execution thread or instance using a standard function of the OS (Operating System). A part of the processing of such a Web server program is designated within the program so that it is processed by an accelerator in the server.
- The Web server program is only one example of a target program; the target program is not limited to this.
- one or more offload units 204 are designated as partial programs executed by the accelerator.
- the offload unit 204 is a range surrounded by the offload start instruction 205 and the offload end instruction 206 in the target program.
- In the target program, M offload parts 204 (M is an integer of 1 or more) are designated.
- On the computer system of FIG. 1, L types (L is an integer of 1 or more) of execution methods are assumed, in which the amounts of processor resources that the target program uses on each processor differ. Examples of such execution methods include the following.
- An execution method that offloads an arbitrary part of the target program, and an execution method that executes a post-conversion program generated by predetermined conversion software so that an arbitrary part of the target program can be offloaded.
- FIG. 3 shows the configuration of the program execution system of this embodiment.
- the program execution control system includes a computer system 300 including a conversion device 310, a host processor 320, and an accelerator 330.
- Each of the host processor 320 and the accelerator 330 may be a single arithmetic device or a part of an integrated computer.
- the number of accelerators 330 should just be one or more, and the number is not limited. If there are a plurality of accelerators 330, their architectures need not be the same.
- the conversion device 310 includes a specific process determination unit 311 and a process conversion unit 312.
- The specific process determination unit 311 determines the portions of the target program 340 that perform a specified predetermined process (hereinafter referred to as "specific processing units").
- M specific processing units are designated in the target program 340.
- the process conversion means 312 converts the M specific processing units of the target program 340 into a post-conversion program 341 so that it can be executed by a plurality of execution methods. Therefore, the post-conversion program 341 includes an execution method designation program 342 corresponding to each of the L types of execution methods and a process switching unit 343 inside.
- The process switching means 343 switches which of the L types of execution methods, that is, which of the execution method designation programs 342, is applied.
- the host processor 320 has an execution method determination unit 321 and an execution method setting unit 322.
- The accelerator 330 receives from the host processor 320 the accelerator designation program 344 corresponding to those execution methods, among the execution method designation programs 342, that involve execution by the accelerator.
- The accelerator 330 does not need to receive the execution method designation program 342 corresponding to the above-described execution method that operates only on the host processor. (Operation of the first embodiment)
- Each of these means operates as follows.
- the specific process determination unit 311 inputs the target program 340, searches the target program 340 for a range surrounded by the specific process, and checks an offloadable range in the target program 340.
- the specific process determination unit 311 notifies the process conversion unit 312 of the positions of the plurality of specific processes of the target program 340.
- The process conversion unit 312 receives the positions of the plurality of specific processes notified by the specific process determination unit 311 and treats each range enclosed by those positions as an offloadable range. Then, for each offloadable range, the process conversion unit 312 creates execution method designation programs 342 for both the offloading execution method and the non-offloading execution method. Each execution method designation program 342 uses a different amount of processor resources on the host processor 320 or the accelerator 330. Furthermore, the process conversion unit 312 creates a post-conversion program 341 having a process switching unit 343 that switches between these programs.
- The execution method determination means 321 of the host processor takes the post-conversion program 341 as input and selects the program execution method for each input data item so that the processor resources of every processor can be used up.
- the execution method setting means 322 notifies the execution method designated by the execution method determination means 321 to the process switching means 343 of the post-conversion program 341 using a communication means (not shown).
- the communication means is a function provided in the computer system 300 such as an OS that manages the operation of the host processor.
- The target program 340 is a program that contains one or more offload parts for which offloading is designated.
- The target program 340 is input to the specific process determination unit 311 and converted into the post-conversion program 341 by the process conversion unit 312.
- the post-conversion program 341 is executed by any execution method for each input data. That is, the post-conversion program 341 operates by switching the execution method by the process switching unit 343 according to the execution method designated by the execution method setting unit 322.
- FIG. 4 is a flowchart showing the operation of the parallel processing control system according to the embodiment of the present invention.
- the specific process determination unit 311 of the conversion apparatus 310 searches for a specific processing unit from the target program 340 in order to examine the range in the target program 340 that can be offloaded.
- the specific process determination unit 311 notifies the process conversion unit 312 of the positions of the plurality of specific processing units (step S401).
- the processing conversion unit 312 of the conversion device 310 recognizes the offloadable range from the positions of the plurality of specific processing units notified from the specific processing determination unit 311. Then, the processing conversion unit 312 creates both a partial program that is offloaded and a partial program that is not offloaded for each offloadable range. That is, a plurality of execution method designating programs having different processor resource usage are prepared. Furthermore, the process conversion unit 312 creates a post-conversion program 341 having a process switching unit 343 that switches between an execution method designation program that performs offloading and an execution method designation program that does not perform offload (step S402).
- The execution method determination means 321 of the host processor 320 determines how many input data items are to be processed by each execution method. Then, for each input data item, the execution method determination means 321 selects an execution method of the post-conversion program 341 such that the processor resources of each processor are used up (step S403).
- the execution method setting means 322 of the host processor 320 notifies the execution method designated by the execution method determination means 321 to the process switching means 343 of the post-conversion program 341 using the communication means (step S404).
- the post-conversion program 341 having a plurality of execution methods in which the amount of processor resources used by each processor is different can be generated by the above flow of the present embodiment. Furthermore, a program execution means for switching the execution method for each input data can be realized. Therefore, unused resources can be reduced as much as possible by selecting an execution method that uses unused processor resources.
- FIG. 5 shows an example in which more input data can be processed using unused processor resources according to this embodiment.
- the range of specific locations where the target program can be offloaded can be set in various ways.
- Examples of setting the offloadable range are shown below. Specifically, among the execution methods described above, 2) an execution method that offloads the range from the input processing to the output processing is described as the second embodiment, and 3) an execution method that offloads only a part of the offload designation parts is described as the third embodiment.
- In the second embodiment, the specific process determination unit 311 determines that the "range from the input processing to the output processing" of the target program 340 is a specific process and treats it as an offloadable range.
- In this case, on the parallel processing control system of the first embodiment, there are (N + 2) execution methods in which the amount of processor resources that the post-conversion program 341 uses on each processor differs.
- The breakdown is: one original execution method of the target program 340, which uses the host processor 320 and the N accelerators 330 in the predetermined procedure; one execution method in which the host processor 320 executes the range from the input processing to the output processing; and N execution methods in which one of the N accelerators 330 executes that range.
- In step S401 of the first embodiment, the specific process determination unit 311 searches the target program 340 for the range enclosed between the input processing and the output processing, and notifies the process conversion unit 312 of that range in order to identify the offloadable range in the target program 340.
- In the third embodiment, the specific process determination unit 311 determines that the "offload designated ranges" of the target program 340 are specific processing units and treats them as offloadable ranges. That is, ranges for which offloading has already been designated are treated as offloadable ranges.
- When the target program 340 includes M offload designation parts, there are 2^M types of execution methods with different amounts of processor resources used on each processor, because the combinations range from offloading all M offload designation parts to offloading none of them.
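- For example, with M = 2 offload designation parts there are 2^2 = 4 execution methods: offload both parts, offload only the first, offload only the second, or offload neither, and each combination uses a different mix of host and accelerator resources.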
- An example of a plurality of execution methods set in this embodiment is shown in FIG.
- In step S401 of the first embodiment, the specific process determination unit 311 checks the ranges in the target program 340 for which offloading is designated, obtains the pairs of positions at which the start and the end of offloading are designated, and notifies the process conversion means 312 of these positions.
- In the third embodiment, it is also possible to disregard some of the M offload designation parts of the target program 340 and not offload them.
- the amount of processor resource used differs between when an offload designation part is actually offloaded and when it is not offloaded. Accordingly, by adjusting the number of offloading portions of the M offload designation portions, it is possible to prepare a plurality of execution methods having different processor resource usage. Therefore, it is possible to improve the throughput of the system by selecting an appropriate resource usage execution method and reducing unused resources as much as possible.
- Furthermore, the original offload execution form of the target program 340 can be combined with execution methods in which only one of the host and the N accelerators performs the processing, and with execution methods in which only some of the M offload designation parts are offloaded.
- Through such combinations the number of execution methods increases, unused processor resources can be used more flexibly, and the throughput can be further improved.
- The number of input data items handled by each execution method (hereinafter referred to as the "distribution number") can also be changed in order to improve the throughput.
- In the fourth to sixth embodiments, examples of methods for determining the distribution number are described.
- the number of distributions to be simultaneously processed in each execution method is determined so that the processor resources of each processor are used up. Then, the execution method of each input data is determined based on the distribution number.
- Examples of the method for determining the number of distributions include designation by a user, calculation using a predetermined algorithm from system specifications and program profile results, and the like.
- the “profile” here is called a runtime profile, and is various information related to program execution. This information includes the required execution time and the amount of processor resources for a partial process of the program.
- the method for determining the distribution number is not particularly limited.
- In step S403 of the first embodiment, the following processing is specifically performed. The execution method determination means 321 determines the distribution number of each execution method in descending order of priority so that the processor resources of each processor are used up and high throughput is obtained. Then, the execution method determination means 321 determines the program execution method for each input data item (step S403).
- FIG. 8 is a flowchart showing an operation when the execution method determination unit 321 determines the execution method for one input data after determining the number of execution method distributions.
- the execution method determination unit 321 acquires the number of processing data (hereinafter referred to as “the number of simultaneously processed data”) that is being processed simultaneously for each execution method (step S801).
- the execution method determination unit 321 determines an execution method in which the number of simultaneously processed data is less than the number of distributions in order from the execution method with the highest priority (step S802).
- In this way, the execution method determination means 321 determines the distribution number of each execution method to obtain high throughput, and determines the execution method for each input from the determined distribution numbers and the number of data items being processed simultaneously by each execution method. The execution method determination means 321 then notifies the execution method setting means 322 of the determined execution method.
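- A minimal sketch of this per-input selection, assuming the execution methods are indexed in priority order and using purely illustrative names, is:

```cpp
#include <cstddef>
#include <vector>

// Sketch of the per-input selection of FIG. 8 (steps S801-S802): the first execution method
// whose count of simultaneously processed inputs is still below its distribution number is chosen.
int select_execution_method(const std::vector<int> &distribution,   // distribution number per method
                            const std::vector<int> &in_flight) {    // S801: simultaneously processed data
    for (std::size_t i = 0; i < distribution.size(); ++i)           // S802: highest priority first
        if (in_flight[i] < distribution[i])
            return static_cast<int>(i);                             // index of the chosen method
    return static_cast<int>(distribution.size()) - 1;               // all full: fall back to the last method
}
```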
- the number of distributions is determined before executing the program. Therefore, it is not necessary to use processor resources to determine the number of distributions during program execution.
- The process of determining the distribution number is not part of the target program itself, yet it affects the throughput of actual operation; in other words, it is overhead. For this reason, the distribution number is determined before program execution. The program can then run in an ideal state, avoiding both the case where the number of data items handled by a high-priority execution method is smaller than ideal and the case where the number handled by a low-priority execution method is larger than ideal. As a result, optimum throughput can be obtained when a program is executed using a combination of a plurality of execution methods.
- the number of distributions may be specified by the user in the post-conversion program so that the throughput is the highest.
- In the fifth embodiment, the distribution number for each execution method is determined from the parameters of the program and the processors.
- FIG. 9 is a flowchart showing the operation when the execution method determination means 321 determines the distribution numbers.
- First, the execution method determination means 321 calculates the priority of each execution method and the processor usage rate of each execution method for one input data item, and initializes the unused rate of each processor (step S901).
- As for the priority of each execution method, one approach is to assign higher priority to execution methods with lower latency, that is, in ascending order of the program execution time for processing a single input data item.
- As the processor usage rate of each execution method for one input data item, the ratio of the execution time on each processor per unit time can be used.
- The unused rate of each processor can be initialized to 100 percent.
- Next, the execution method determination means 321 determines, from the unused rate of each processor and the processor usage rate of each execution method for one input data item, which execution methods can still be executed on the available processors, and selects the executable execution method with the highest priority (step S902).
- The execution method determination means 321 then determines the number of input data items to be handled simultaneously by the selected execution method (step S903). This number is the smallest of the values obtained by dividing the unused rate of each processor by the processor usage rate of the selected execution method for one input data item.
- Next, the execution method determination means 321 updates the unused rate of each processor (step S904). That is, it calculates the processor usage incurred when the number of input data items determined in step S903 is handled by the selected execution method, and subtracts this from the current unused rate of each processor to obtain the updated unused rate.
- Finally, the execution method determination means 321 determines whether the unused rates of all processors are zero (step S905). If the unused rates of all processors are zero, the distribution number determination algorithm ends; otherwise, the process returns to step S902.
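- The following is a minimal sketch of this distribution number determination (steps S901 to S905), assuming the execution methods are already sorted by priority and usage rates are expressed as fractions; all names are illustrative, and each method is visited once in priority order as a simplification of the flowchart's loop.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct ExecutionMethod {
    std::vector<double> usage;   // per-processor usage rate for one input data item (0..1)
    int distribution = 0;        // distribution number decided by the algorithm
};

void determine_distribution(std::vector<ExecutionMethod> &methods, int num_processors) {
    std::vector<double> unused(num_processors, 1.0);            // S901: unused rate = 100 %
    for (auto &m : methods) {                                    // S902: highest priority first
        double fit = std::numeric_limits<double>::infinity();
        for (int p = 0; p < num_processors; ++p)
            if (m.usage[p] > 0.0)
                fit = std::min(fit, unused[p] / m.usage[p]);     // S903: smallest unused/usage ratio
        if (!std::isfinite(fit)) continue;                       // method uses no resources: skip
        m.distribution = static_cast<int>(std::floor(fit));      // inputs handled simultaneously
        for (int p = 0; p < num_processors; ++p)                 // S904: update unused rates
            unused[p] -= m.distribution * m.usage[p];
        if (std::all_of(unused.begin(), unused.end(),
                        [](double u) { return u <= 0.0; }))
            break;                                               // S905: all resources used up
    }
}
```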
- the distribution number is automatically calculated from the processor parameters and the target program parameters.
- the distribution number calculation process can be performed before executing the program. Therefore, it is not necessary to use processor resources to determine the number of distributions during program execution. Therefore, as in the fourth embodiment, there is an effect that an optimum throughput can be obtained when a program is executed using a combination of a plurality of execution methods.
- In the sixth embodiment, a provisional distribution number is first determined, and the execution method is selected from that distribution number while the distribution number is adjusted dynamically.
- FIG. 10 is a block diagram showing the configuration of the execution method determining means 1001 of this embodiment.
- the execution method determination unit 1001 includes a performance measurement unit 1002, an allocation number determination unit 1003, and an execution method selection unit 1004. As information exchanged between each means, there are a performance measurement result 1011, execution method priority information 1012, and a distribution number 1013 which is the number of input data handled for each execution method.
- FIG. 11 is a flowchart showing the operation of the execution method determining means 1001 of this embodiment.
- FIG. 12 is an example of a performance measurement result management table used in the operation when three types of execution methods are prepared.
- the allocation number determining means 1003 determines an initial value of the allocation number (step S1101).
- the execution method selection unit 1004 selects an execution method for the input from the determined current distribution number. Then, the execution of the post-conversion program 341 is started.
- the allocation number determining unit 1003 selects an execution method with an uncertain distribution number from the priority information of the execution method (step S1102).
- the distribution number determination unit 1003 increases the distribution number of the selected execution method (step S1103).
- the execution method selection means 1004 selects an execution method for a certain input from the increased current distribution number. Then, one of the three types of execution method designation programs 342 included in the post-conversion program 341 is executed.
- Next, the performance measurement means 1002 measures the throughput performance per unit time and adds the result to the management table, which records combinations of the distribution numbers of each execution method and the measured throughput performance (step S1104).
- the distribution number determination unit 1003 determines whether or not to determine the distribution number (step S1105).
- In the present embodiment, the condition for fixing the distribution number (hereinafter referred to as the "distribution number determination condition") is that the performance measurement result after increasing the distribution number of the target execution method is worse than the performance measurement result before the increase.
- The distribution number determination condition may instead be based on how much the performance measurement result increases or decreases, and is therefore not limited to the condition above.
- If the distribution number is to be fixed, the process proceeds to step S1106; if not, the process proceeds to step S1103.
- the allocation number determining unit 1003 determines the allocation number of the selected execution method (step S1106).
- Specifically, the distribution number in effect before the last increase, which gave the better performance measurement result, is fixed as the distribution number of the selected execution method.
- the allocation number determining means 1003 determines the end of the flow for dynamically determining the allocation number (step S1107).
- the flow is terminated when the condition for determining the number of distributions of all execution methods is satisfied, and the process proceeds to step S1102 when the condition is not satisfied.
- the above-described operation of the distribution number determining means 1003 determines the distribution number of each execution method, and the program can be executed according to the distribution number.
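- A minimal sketch of this dynamic search (steps S1101 to S1107), assuming a hypothetical measurement hook that runs the workload with the current distribution numbers for a while and returns the throughput, is:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// For each execution method, taken in priority order, keep raising its distribution number
// while the measured throughput improves; when it gets worse, keep the previous value.
void tune_distribution(std::vector<int> &distribution,            // one entry per execution method
                       const std::function<double()> &measure_throughput) {
    for (std::size_t i = 0; i < distribution.size(); ++i) {       // S1102: undecided method, priority order
        double best = measure_throughput();                       // baseline with the current numbers
        for (;;) {
            ++distribution[i];                                     // S1103: increase by one
            double perf = measure_throughput();                    // S1104: record the measurement
            if (perf < best) {                                     // S1105: result got worse
                --distribution[i];                                 // S1106: keep the previous value
                break;
            }
            best = perf;
        }
    }                                                              // S1107: all methods decided
}
```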
- the allocation number determining means 1003 determines an initial value of the allocation number (step S1101). In this embodiment, “15” is set for the execution method 1, “0” for the execution method 2, and “0” for the execution method 3 as the initial value of the distribution number.
- the execution method selection unit 1004 selects an execution method for a certain input from the set distribution number, and the execution of the application is started.
- the allocation number determining unit 1003 selects an execution method with an uncertain distribution number from the priority information of the execution method (step S1102).
- the execution method 1 is selected in order to determine the number of allocations of the execution method 1 having the highest priority.
- the distribution number determination unit 1003 increases the distribution number of the selected execution method 1 (step S1103). In this embodiment, each distribution number is increased by one.
- the execution method selection means 1004 selects an execution method for a certain input from the increased current distribution number. Then, the application is executed.
- Next, the performance measurement means 1002 measures the throughput performance per unit time and adds the result to the management table (step S1104).
- the throughput performance to be measured is defined as the number of data output per second.
- the measured performance is not limited to the throughput.
- In the management table, “150” is recorded as the performance value obtained when the distribution numbers of execution methods 1, 2, and 3 are “15”, “0”, and “0”, respectively.
- the distribution number determination unit 1003 determines whether or not to determine the distribution number (step S1105).
- the distribution number determination condition is that the performance measurement result when the target execution method is increased is worse than the performance measurement result before the increase.
- step S1106 When the confirmation of the number of distributions is determined, the process proceeds to step S1106, and when the determination of the number of distributions is not determined, the process proceeds to step S1103.
- As shown in the management table of FIG. 12, the performance result falls from “200” to “190” between result number 5 and result number 6. Therefore, when result number 6 is processed, the process proceeds to step S1106.
- step S1106 the allocation number determining unit 1003 determines the allocation number of the selected execution method.
- The distribution number that gave the performance measurement result before the last increase is fixed as the distribution number of the selected execution method.
- In this example, the distribution number determining unit 1003 determines from the management table of FIG. 12 that the distribution number of execution method 1 is 20.
- the allocation number determining means 1003 determines the end of the flow for dynamically determining the allocation number (step S1107).
- the flow is terminated when the condition for determining the number of distributions of all execution methods is satisfied, and the process proceeds to step S1102 when the condition is not satisfied.
- In the sixth embodiment, the distribution numbers that yield high throughput are searched for at the time of execution. Accordingly, the distribution numbers can be determined without the parameters of the processors and the target program. (Seventh embodiment)
- a seventh embodiment an example of a specific computer system is shown.
- FIG. 13 shows a block diagram of a computer system composed of a multi-core host processor and a many-core accelerator.
- the computer system of this embodiment includes a host processor 1301, a host memory 1302, a bus controller 1303, and an accelerator 1304.
- The accelerator 1304 contains a many-core processor 1305 and a memory 1306.
- The host processor 1301 and the many-core processor 1305 differ in the architecture and the number of their built-in CPUs (Central Processing Units). Because the structure and number of arithmetic units differ, the host processor 1301 and the many-core processor 1305 are each relatively good at some kinds of processing and relatively poor at others.
- In this embodiment, the host processor 1301 has 8 CPUs and the many-core processor 1305 has 40 CPUs.
- The CPUs of the host processor 1301 and of the many-core processor 1305 have different performance for scalar operations and vector operations. For example, for scalar operations, if the performance of one CPU of the host processor 1301 is 1, the performance of one CPU of the many-core processor 1305 is 0.25, so the host processor's CPU is faster.
- For vector operations, if the performance of one CPU of the host processor 1301 is 1, the performance of one CPU of the many-core processor 1305 is 2, so the many-core processor's CPU is faster.
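- With these assumed figures, the 40 CPUs of the many-core processor 1305 together provide scalar performance equivalent to roughly 40 × 0.25 = 10 host CPUs, but vector performance equivalent to roughly 40 × 2 = 80 host CPUs, compared with the 8 CPUs of the host processor 1301; offloading vector-heavy processing to the accelerator is therefore where the gain is largest.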
- the host processor 1301 and the accelerator 1304 are connected by a bus controller 1303.
- the OS for the host and the OS for the accelerator operate independently.
- Communication between the host processor 1301 and the accelerator 1304 is realized by a general technique such as a socket.
- the host processor 1301 has a function of instructing the accelerator 1304 to start processing.
- Each of the host processor 1301 and the accelerator 1304 has a unique IP (Internet Protocol) address.
- FIG. 14 shows a configuration diagram of an image processing program including offload designation for one input in the present embodiment.
- the target program 340 includes an input processing API (Application Program Interface) 1403, an arithmetic processing A1404, an offload start API 1405, an arithmetic processing B1406, an offload end API 1407, and an output processing API 1408.
- the image data is stored in the input queue 1401 until the execution of the target program 340 is started.
- the configuration file 1402 specifies program execution parameters such as command line arguments.
- the target program 340 reads necessary information from the input queue 1401 and the setting file 1402 and outputs the processing result to the output queue 1409.
- the operation of each component included in the target program 340 will be described.
- the input processing API 1403 sets information for executing the program by acquiring execution parameters from the setting file 1402 and image data from the input queue 1401.
- the mechanism by which the input processing API 1403 acquires data from the input queue is as follows. That is, the management thread of the input queue 1401 on the host processor 320 transmits the data in the input queue 1401 to the program using a socket. Then, the input processing API 1403 executed by the program thread receives data at the socket.
- the input processing API 1403 is defined to receive command line arguments and set data for executing the program. Therefore, only the command line arguments are necessary for execution.
- the arithmetic processing A1404 is a scalar arithmetic processing that constitutes a part of the target program 340, and performs processing using the information set by the input processing API 1403 as input.
- the offload start API 1405 specifies an offload start position from the host processor 320 to the accelerator 330 and data to be transferred from the host processor 320 to the accelerator 330.
- Arithmetic processing B 1406 is a vector arithmetic processing that constitutes a part of the target program 340, and performs processing using the intermediate result obtained from the arithmetic processing A1404.
- The offload end API 1407 designates the end position of offloading from the host processor 320 to the accelerator 330, and the data to be transferred from the accelerator 330 back to the host processor 320.
- the output processing API 1408 adds the result obtained by processing up to the arithmetic processing B 1406 to the output queue 1409.
- the mechanism by which the output processing API 1408 adds data to the output queue 1409 is as follows. That is, the output processing API 1408, which is a program thread, transmits data through the socket. Then, a thread running on the host receives data on the socket and adds the data to the output queue 1409.
- the output processing API 1408 is specified to receive only the result data as an argument and write it to the output queue. Therefore, there is no data generated by the output processing API 1408 itself.
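- The exact signatures of these prescribed APIs are not given at this level of description; as a rough, assumed sketch they could look like the following, with purely illustrative type and function names.

```cpp
#include <vector>

struct InputData  { std::vector<unsigned char> image; /* execution parameters, ... */ };
struct ResultData { std::vector<unsigned char> image; };

// Reads execution parameters named in the command line arguments and fetches one image
// from the input queue over a socket; only the command line arguments are required.
InputData input_processing_api(int argc, char **argv);

// Sends the result over a socket to the host-side thread, which appends it to the output
// queue; only the result data is taken as an argument.
void output_processing_api(const ResultData &result);
```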
- In this embodiment, the ratio of the execution time of each process to the execution time of the entire program is negligibly small for the input processing API and the output processing API, 40% for the arithmetic processing A 1404, and 60% for the arithmetic processing B 1406.
- Both arithmetic processing A and arithmetic processing B can be executed by either the host processor 320 or the accelerator 330.
- both a compiler for the host processor 320 and a compiler for the accelerator 330 are prepared. Therefore, it is possible to generate two types of executable files from different processors that execute the program.
- the input processing API 1403 realizes data acquisition from the input queue 1401 with a socket. Therefore, it can be executed by either the host processor 320 or the accelerator 330.
- the output processing API 1408 realizes data output to the output queue 1409 with a socket. Therefore, it can be executed by either the host processor 320 or the accelerator 330.
- In this embodiment, execution methods whose processor assignment differs from that of the original execution method 1, in which the assignment of processors for executing the program is designated, are prepared.
- Different execution methods have different resource usage for each processor. Therefore, surplus resources can be reduced as much as possible by using a plurality of execution methods in combination with a plurality of input data. Therefore, the throughput of the entire system can be improved.
- Hereinafter, a system that executes a combination of different execution methods is referred to as an "execution method combination system".
- the host processor 320 and the accelerator 330 are simply referred to as “host” and “accelerator”.
- FIG. 15 shows an example of a plurality of program execution methods that can be realized by the execution method combination system.
- Execution method 1 is the execution method of the original program with offload designation: a part of the program is offloaded from the host to the accelerator. In execution method 1, processing proceeds in the following flow.
- the input API and arithmetic processing A are executed on the host side. Then, the host executes an offload start API and transfers data from the host to the accelerator.
- arithmetic processing B is executed on the accelerator side. Then, the accelerator executes an offload end API, and transfers data from the accelerator to the host.
- Execution method 1 uses both host processor resources and accelerator processor resources for one piece of input data.
- “one input data” means a single or a set of input data as a unit to be processed.
- Execution method 2 is an execution method in which the entire program is executed on the host side.
- In execution method 2, the host side executes all of the input API, the arithmetic processing A, the arithmetic processing B, and the output API.
- Execution method 2 is realized by ignoring the offload specification part in the program by a general technique such as a compiler. In execution method 2, only the processor resource of the host is used for one input data.
- Execution method 3 is an execution method in which all processing of the program is executed on the accelerator side; however, the host processor still needs to issue the instruction that starts processing on the accelerator. In execution method 3, the input API, the arithmetic processing A, the arithmetic processing B, and the output API are all executed on the accelerator side.
- Execution method 3 is realized by ignoring the offload designation part using a general technique such as a compiler, and by having the conversion device 310 convert the range from the input processing API to the output processing API into an execution method designation program 342 that is treated as an offload part. Because prescribed APIs are used, it is known that the only data transfer information required to offload the range from the input processing API to the output processing API is the command line arguments. Execution method 3 uses only accelerator processor resources for one input. It can also be realized when there are multiple types of accelerators, by changing the designation of the offload destination accelerator.
- FIG. 16 shows a configuration example of various programs 1600 (hereinafter referred to as “program group”) used in the execution method combination system of this embodiment, and the relationship between the programs included in the program group 1600.
- The six programs included in the program group 1600 are classified into the following four categories.
- 1) Program input as the processing target: The target program 340 is the program to be processed by this execution method combination system.
- the target program 340 is a program that is assumed to be offloaded in the execution method 1 by the host processor 320.
- the target program 340 is input to the conversion device 310 and converted into a post-conversion program 341.
- 2) Program executed by the conversion device 310: The conversion program 1606 is executed by the conversion device 310 to convert the target program 340 into the post-conversion program 341.
- the conversion program 1606 includes a specific process determination unit 311 and a process conversion unit 312.
- the specific process determination unit 311 and the process conversion unit 312 are software functions for converting a program.
- the conversion apparatus 310 executes the conversion program 1606, adds programs of three types of execution methods with different presence / absence or form of offload to the target program 340, and generates a converted program 341. That is, the conversion apparatus 310 generates a post-conversion program 341 including an execution method designation program 342 executed by other execution methods 2 and 3 having different offload forms based on the target program 340.
- the execution method designation program 342 is executed by one or both of the host processor 320 and the accelerator 330.
- The post-conversion program 341 internally includes a process switching unit 343 realized by conditional branching. 3) Programs executed by the host processor 320: The post-conversion program 341, the multiple execution program 1601, and the execution method determination program 1607 are executed by the host processor 320 and control the entire offloading in the execution method combination system.
- the execution method determination program 1607 includes an execution method determination unit 321 and an execution method setting unit 322.
- the execution method determination means 321 and the execution method setting means 322 are daemon thread functions.
- the multiple execution program 1601 includes an input data confirmation unit 1602, an execution method inquiry unit 1603, and a thread execution unit 1604.
- Thread execution means 1604 generates a thread using thread generation means (not shown) possessed by a general OS in order to execute the multiple execution program 1601.
- Information exchange between the multiple execution program 1601 and the execution method determination program 1607 is performed using communication means (not shown) of a general OS.
- The host processor 320 uses the multiple execution program 1601 and the execution method determination program 1607 to select an execution method for each input data item. The host processor 320 then executes, among the execution method designation programs 342 included in the post-conversion program 341, the program corresponding to the selected execution method. When the selected execution method uses the accelerator 330, the host processor 320 causes the accelerator to execute the accelerator designation program 344, which is the part offloaded to the accelerator. 4) Program executed by the accelerator 330: The accelerator designation program 344 is a program that is offloaded by the host processor 320 and executed by the accelerator 330.
- FIG. 17 shows how each means moves as a thread in the host processor 320. The operation of each thread will be described later.
- the operation of the execution method combination system 1600 is divided into two phases: (1) a phase in which a converted program is created from the target program and (2) a phase in which the converted program is executed while switching the execution method.
- (1) Phase for creating post-conversion program from original program The post-conversion program is created from the target program by the specific process determining means and the process conversion means.
- the specific process determination unit 311 searches the program source for the positions of the input process API and the output process API of the target program 340, and notifies the process conversion unit 312 of the positions of the input process API and the output process API (step S401).
- The process conversion unit 312 treats the range from the input processing API to the output processing API, determined from the positions notified by the specific process determination unit 311, as the offloadable range. Then, the process conversion unit 312 generates the execution method designation programs 342 for the three types of execution methods described above by copying the program in the offloadable range and adding or deleting the offload start API and the offload end API.
- The execution method designation program 342 of execution method 1 is realized by deleting the program contained in the range from the process immediately after the offload start API to the process immediately before the offload end API in the target program 340, that is, the offload-designated part.
- The execution method designation program 342 of execution method 2 is realized by using the program of the target program 340 as it is.
- Although the execution method designation program 342 of execution method 2 is not actually offloaded, it is referred to as an offload program without being distinguished from the programs of the other execution methods.
- For the execution method designation program 342 of execution method 3, the offload start API and the offload end API contained in the offloadable range are deleted. An offload start API is then added immediately before the input processing API, together with the designation of the data required by the input processing API, and an offload end API is added immediately after the output processing API, together with the designation of the data output by the output processing API. Execution method 3 is realized by the above processing.
- In this way, the converted program 341 is generated, containing the execution method designation programs 342 for the three types of execution methods described above and the process switching means 343 that switches among the three execution methods by conditional branching (step S402).
- The condition used by the conditional branch is notified from the execution method setting means 322 via socket communication when the converted program 341 is executed.
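- As a concrete illustration of this structure, the following is a minimal C sketch of the converted program 341, assuming hypothetical names (offload_start, offload_end, read_input, write_output, scalar_A, vector_B) that are not defined in the publication; the ordering of the scalar and vector parts and the stub bodies are likewise illustrative only. The switch statement plays the role of the process switching means 343, selecting one of the three execution method designation programs 342 according to the execution method received over the socket and passed in as an argument.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the APIs named in the text; the real offload
 * start/end APIs, input/output processing APIs and the scalar/vector parts
 * of the target program are not shown in the publication, so these stubs
 * only illustrate the structure of the converted program 341. */
typedef struct { double value; } data_t;

static void read_input(data_t *d)          { printf("input processing API\n"); (void)d; }
static void write_output(const data_t *d)  { printf("output processing API\n"); (void)d; }
static void scalar_A(data_t *d)            { printf("arithmetic processing A (scalar)\n"); (void)d; }
static void vector_B(data_t *d)            { printf("arithmetic processing B (vector)\n"); (void)d; }
static void offload_start(const data_t *d) { printf("offload start API: transfer + launch\n"); (void)d; }
static void offload_end(data_t *d)         { printf("offload end API: wait + copy back\n"); (void)d; }

/* Process switching means 343: a conditional branch over the three
 * execution method designation programs 342. */
void converted_program(int execution_method, data_t *d)
{
    switch (execution_method) {
    case 1:                   /* offload only the originally designated range    */
        read_input(d);
        scalar_A(d);          /* scalar part stays on the host                   */
        offload_start(d);     /* designated range executes on the accelerator;   */
        offload_end(d);       /* the host-side copy of that range is deleted     */
        write_output(d);
        break;
    case 2:                   /* offload APIs deleted: everything on the host    */
        read_input(d);
        scalar_A(d);
        vector_B(d);
        write_output(d);
        break;
    case 3:                   /* whole input-to-output range on the accelerator  */
        offload_start(d);     /* added immediately before the input processing   */
        offload_end(d);       /* added immediately after the output processing   */
        break;
    }
}

int main(void)
{
    data_t d = { 0.0 };
    for (int m = 1; m <= 3; ++m) {
        printf("-- execution method %d --\n", m);
        converted_program(m, &d);
    }
    return 0;
}
```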
- (2) Phase in which the converted program is executed while the execution method is switched: the multiple execution program 1601, which includes the converted program 341, is executed while its execution method is switched by the execution method determination means 321 and the execution method setting means 322.
- The thread of the multiple execution program 1601 and the execution method combination thread enter the execution state.
- The multiple execution program 1601 checks whether there is data in the input queue 1401. If there is no data, the multiple execution program 1601 waits until data arrives in the input queue 1401.
- The execution method inquiry unit 1603 queries the execution method determination unit 321 of the execution method combination thread in order to determine the execution method for the input data to be processed. Socket communication is used for this inquiry.
- The execution method determination means 321 determines the allocation number of each execution method from the program and system parameters using an allocation number determination algorithm.
- The execution method determination unit 321 selects, in order of execution method priority, an execution method for which the number of input data currently being processed has not yet reached its determined allocation number, and thereby determines the execution method for one piece of input data.
- The allocation number determination algorithm will be described later.
- The execution method setting unit 322 notifies the execution method inquiry unit 1603 of the execution method indicated by the execution method determination unit 321. Socket communication is used for this notification.
- The execution method inquiry unit 1603 stores the received execution method in a variable.
- The thread execution means 1604 generates a thread to execute the converted program 341 for one piece of input data.
- When generating the thread, the thread execution unit 1604 passes the variable set by the execution method inquiry unit 1603 as an argument.
- The offload program is then executed by the designated execution method, based on the variable indicating the execution method passed as an argument from the thread execution unit 1604 and on the process switching unit 343 realized by the conditional branch.
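- As an illustration of this dispatch flow, here is a minimal sketch using POSIX threads. The helpers dequeue_input(), query_execution_method(), and converted_program() are hypothetical stand-ins for the input queue 1401, the socket inquiry made by the execution method inquiry means 1603, and the converted program 341; their bodies are placeholders only, so the actual protocol and queue handling are not implied.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { int execution_method; int data_id; } work_t;

/* Placeholder for the input queue 1401: yields five dummy items, then -1. */
static int dequeue_input(void)            { static int next; return next < 5 ? next++ : -1; }
/* Placeholder for the socket inquiry to the execution method determination means 321. */
static int query_execution_method(int id) { return (id % 2) ? 3 : 1; }
/* Placeholder for the converted program 341 (see the earlier sketch). */
static void converted_program(int method, int id)
{
    printf("input %d handled by execution method %d\n", id, method);
}

/* Thread body generated by the thread execution means 1604; the execution
 * method chosen for this input data is handed over through the argument. */
static void *run_one_input(void *arg)
{
    work_t *w = arg;
    converted_program(w->execution_method, w->data_id);
    free(w);
    return NULL;
}

int main(void)
{
    pthread_t tid[16];
    int n = 0, id;

    while ((id = dequeue_input()) >= 0) {                   /* data from the input queue   */
        work_t *w = malloc(sizeof *w);
        w->data_id = id;
        w->execution_method = query_execution_method(id);   /* socket inquiry in reality   */
        pthread_create(&tid[n++], NULL, run_one_input, w);  /* one thread per input data   */
    }
    for (int i = 0; i < n; ++i)
        pthread_join(tid[i], NULL);
    return 0;
}
```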
- The allocation of the number of data to be processed is determined for each execution method, and the execution method for each piece of data is determined based on that allocation.
- The flow of the allocation number determination algorithm is shown below.
- FIG. 18 is a flowchart showing the processing flow of this algorithm.
- Priority is set for all execution methods in ascending order of latency, and the unused rate of all processor resources is set to 100 percent (step S1801).
- Priorities are set in ascending order of execution method latency.
- The latency of each execution method is the sum of the latency of arithmetic processing A, a scalar operation that occupies 40% of the total program execution time, and the latency of arithmetic processing B, a vector operation that occupies the remaining 60%. Note that for scalar arithmetic processing, the ratio of the performance of the host processor 320 to that of the accelerator 330 is 4:1, and for vector arithmetic processing, the ratio of the performance of the host processor 320 to that of the accelerator 330 is 1:2.
- The latency of execution method 2 is 1.0 because all processing is performed by the host processor 320.
- In descending order of priority, the execution methods are execution method 1, execution method 2, and execution method 3.
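- As a worked check of this ordering, the normalized latencies implied by the stated ratios are shown below; the 40%/60% scalar/vector split is taken from the description above, data transfer overheads are ignored, and the value 0.7 for execution method 1 is derived here rather than stated explicitly in the publication.

```latex
\[
\begin{aligned}
T_{1} &= 0.4 + 0.6 \times \tfrac{1}{2} = 0.7 \quad \text{(scalar A on the host, vector B on the accelerator)}\\
T_{2} &= 0.4 + 0.6 = 1.0 \quad \text{(all processing on the host)}\\
T_{3} &= 0.4 \times 4 + 0.6 \times \tfrac{1}{2} = 1.9 \quad \text{(all processing on the accelerator)}
\end{aligned}
\]
```

- Ascending latency therefore gives the priority order execution method 1, execution method 2, execution method 3, consistent with the ordering above, and the value 1.9 reappears in the calculation of step S1806 below.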
- The execution method with the highest priority is selected from among the execution methods whose required processor resources still have a non-zero unused rate, that is, from among the execution methods that do not rely on a processor whose resources have all been used up (step S1802).
- Since the processor resource unused rate of both the host and the accelerator is 100%, all execution methods from execution method 1 to execution method 3 are candidates.
- The execution method with the highest priority is execution method 1, so execution method 1 is selected.
- It is determined whether the selected execution method uses both the host processor resources and the accelerator processor resources (step S1803). If the execution method uses both, the process branches to step S1804; if it uses only one, the process branches to step S1806. Since execution method 1 uses the processor resources of both the host and the accelerator, the process proceeds to step S1804.
- The number of input data sufficient to use up the processor resources is obtained (step S1804).
- Since execution method 1 uses both the host and the accelerator, the number of input data sufficient to use up each processor resource (hereinafter referred to as the "necessary data number") is obtained. It is obtained as (number of input data that one core can process per unit time) × (number of cores).
- The unit time is the processing time when the entire program is executed by the host.
- The smaller of the necessary data numbers for using up the unused processor resource amounts of the host and the accelerator is adopted as the data number for the execution method (step S1805). From the result obtained in step S1804, the smaller necessary data number is 20, so execution method 1 is determined to handle 20 pieces of input data.
- The unused amount of processor resources of each computer is then updated (step S1807).
- Since execution method 1 processes 20 pieces of data, all of the processor resources of the host are used, so the unused amount of host processor resources becomes zero. Since 133.33 pieces of data would be required to use up the accelerator, the processor resource usage of the accelerator is 15%, from 20 / 133.33. The unused amounts of processor resources are therefore updated to 0% for the host and 85% for the accelerator.
- In step S1808, since unused processor resources remain, the process returns to step S1802.
- In step S1802, since the accelerator is the only computer whose unused processor resource amount is not 0%, execution method 3, in which all processing is executed by the accelerator, is selected.
- In step S1803, since execution method 3 uses only the accelerator, the process proceeds to step S1806.
- For the program used here, the number of data necessary to use up the processor resources of the accelerator is determined (step S1806).
- The number of data required to use up all the processor resources of the accelerator is 1.0 / 1.9 × 40 ≈ 21.05. Since the unused amount of accelerator processor resources is 85%, the number of data that can be handled by the remaining accelerator processor resources is determined to be 21.05 × 0.85 ≈ 17.9. When the necessary data number has been determined, the process proceeds to step S1807.
- In step S1807, the unused amount of accelerator processor resources becomes zero because the processor resources of the accelerator are used up by processing 17.9 pieces of data with execution method 3.
- The unused amount of processor resources of both the host and the accelerator is now zero, and the execution method allocation number determination algorithm ends.
- The number of data to be processed is thus determined to be 20 for execution method 1, 0 for execution method 2, and 17.9 for execution method 3.
- In total, 37.9 pieces of data per unit time can be processed by using up all the processor resources.
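- The following C sketch walks through the allocation number determination algorithm of FIG. 18 for this example. It is a reconstruction, not the actual implementation: the 40 accelerator cores and the 4:1 / 1:2 performance ratios are stated in the description, while the 8 host cores and the per-data processing times (0.4 on the host and 0.3 on the accelerator for execution method 1, 1.0 for method 2, 1.9 for method 3) are inferred from the worked figures 20, 133.33, and 21.05 above.

```c
#include <stdio.h>

#define N_METHODS 3

/* Per-data processing time on each processor, normalized so that processing
 * one input entirely on the host (execution method 2) takes 1.0.  Host core
 * count and the 40%/60% scalar/vector split are inferred; accelerator core
 * count and the performance ratios are taken from the description. */
static const double host_time[N_METHODS]  = { 0.4, 1.0, 0.0 };  /* methods 1..3 */
static const double accel_time[N_METHODS] = { 0.3, 0.0, 1.9 };
static const double host_cores  = 8.0;
static const double accel_cores = 40.0;

int main(void)
{
    double host_unused = 1.0, accel_unused = 1.0;   /* step S1801: 100 percent unused */
    double alloc[N_METHODS] = { 0.0, 0.0, 0.0 };
    int done[N_METHODS] = { 0, 0, 0 };

    for (;;) {
        /* Step S1802: pick the highest-priority method that does not rely on
         * an exhausted processor resource (methods are stored in ascending
         * latency order, so the first usable one wins). */
        int m = -1;
        for (int i = 0; i < N_METHODS; ++i) {
            if (done[i]) continue;
            if (host_time[i]  > 0.0 && host_unused  <= 0.0) continue;
            if (accel_time[i] > 0.0 && accel_unused <= 0.0) continue;
            m = i;
            break;
        }
        if (m < 0 || (host_unused <= 0.0 && accel_unused <= 0.0))
            break;                                  /* step S1808: algorithm ends */

        /* Steps S1803-S1806: number of input data needed to use up the
         * remaining share of each processor resource this method uses;
         * when both are used, the smaller number is adopted (step S1805). */
        double n = 1e30;
        if (host_time[m] > 0.0) {
            double need = host_unused * host_cores / host_time[m];
            if (need < n) n = need;
        }
        if (accel_time[m] > 0.0) {
            double need = accel_unused * accel_cores / accel_time[m];
            if (need < n) n = need;
        }
        alloc[m] = n;
        done[m] = 1;

        /* Step S1807: update the unused amount of each processor resource. */
        host_unused  -= n * host_time[m]  / host_cores;
        accel_unused -= n * accel_time[m] / accel_cores;
    }

    double total = 0.0;
    for (int i = 0; i < N_METHODS; ++i) {
        printf("execution method %d: %.1f input data per unit time\n", i + 1, alloc[i]);
        total += alloc[i];
    }
    printf("total: %.1f input data per unit time\n", total);  /* about 37.9 */
    return 0;
}
```

- Under these assumed parameters the sketch reproduces the allocation above: 20 for execution method 1, 0 for execution method 2, and about 17.9 for execution method 3.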
- In this embodiment, it is possible to prepare a plurality of execution methods by converting the program as shown in FIG. 17 and to switch the execution method for each piece of input data. Furthermore, by determining the allocation numbers, unused accelerator processor resources can be used up: whereas execution method 1, the normal offloading method, can handle only 20 pieces of data per unit time, this embodiment can handle 37.9 pieces of data per unit time. The throughput of the system can thus be greatly improved.
- Directives such as pragmas indicating the offload start position and the offload end position may be inserted into the program. In that case, when such a directive is found by searching the program, the position of the directive may be used as the offload start position or the offload end position.
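- As a sketch of what such a directive-annotated program might look like, the snippet below uses hypothetical #pragma names (offload_start / offload_end); the publication only says "directives such as pragmas", so these names and the stub input/output functions are illustrative, not an API defined by the patent or by any particular compiler.

```c
#include <stdio.h>

/* Illustrative stand-ins for the input and output processing APIs. */
static void read_input(double *buf, int n)         { for (int i = 0; i < n; ++i) buf[i] = i; }
static void write_output(const double *buf, int n) { for (int i = 0; i < n; ++i) printf("%g\n", buf[i]); }

void process_one_input(double *buf, int n)
{
    read_input(buf, n);               /* input processing API                     */

    #pragma offload_start             /* hypothetical directive: offload start    */
    for (int i = 0; i < n; ++i)       /* vector arithmetic treated as the offload */
        buf[i] *= 2.0;                /* unit between the two directives          */
    #pragma offload_end               /* hypothetical directive: offload end      */

    write_output(buf, n);             /* output processing API                    */
}

int main(void)
{
    double buf[4];
    process_one_input(buf, 4);
    return 0;
}
```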
- The case of a host processor and an accelerator has been shown as an example of a plurality of processors.
- Each of the host processor and the accelerator includes a plurality of CPU cores.
- The host processor and the accelerator are suited to scalar operations and vector operations, respectively.
- However, the architecture, the number of built-in CPU cores, and the suitable applications of the plurality of processors in the present invention are not particularly limited.
- The processors according to the present invention only need to satisfy the following conditions.
- 1) The number of processors is two or more. Since the present invention is directed to a system that offloads all or part of a program, a plurality of processors is required.
- 2) Each processor has resources that affect the performance of a given process. Each processor includes resources that change the processing performance of the entire system depending on whether they are used, for example CPU usage time or a plurality of CPU cores.
- 3) The architecture of each processor is arbitrary. Each processor only needs to be capable of processing the offload unit; its internal structure and processing method are arbitrary.
- 4) The number of built-in CPU cores in each processor is one or more. Since the unused amount of resources of each processor is considered, and unused resources such as CPU usage time can be defined regardless of the number of cores, the number of built-in CPU cores is arbitrary.
- The program may be stored in a non-transitory storage medium such as a ROM (Read Only Memory), a RAM (Random Access Memory), a semiconductor memory device such as a flash memory, an optical disk, a magnetic disk, or a magneto-optical disk.
- The present invention can be applied to a system in which services such as video monitoring, video conversion, image conversion, and financial processing are realized by a server with a sub processor.
- The present invention is also applicable when an application that performs a predetermined process for each input to obtain a result, and for which the offload unit is specified, is realized in a system configured with clusters having different computer performances.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
- Multi Processors (AREA)
Abstract
Description
(Object of the Invention)
The present invention has been made in view of the technical problems described above, and an object of the present invention is to provide a program conversion device and method, a process switching method and program, an execution method determination method and program, a processor system, and a parallel execution method that can use the processor resources of a system to the maximum extent and improve its processing capability.
Next, a first exemplary embodiment of the present invention will be described in detail with reference to the drawings.
This is an execution method in which all processing is performed only by the host processor, by ignoring the offload designations.
This is an execution method in which only the range from the input processing to the output processing is offloaded.
This is an execution method in which only some of the M offload-designated parts are offloaded and the other offload designations are ignored.
This is an execution method for executing a converted program that has been generated with predetermined conversion software so that an arbitrary part of the target program can be offloaded.
(Operation of the First Embodiment)
Each of these means operates as follows.
(Second Embodiment)
In the second embodiment, the specific process determination means 311 determines the "range from the input processing to the output processing" of the target program 340 to be the specific process and treats it as the offloadable range.
(Third Embodiment)
In the third embodiment, the specific process determination means 311 determines the "range designated for offloading" in the target program 340 to be the specific process part and treats it as the offloadable range. That is, a range that has already been designated for offloading is treated as the offloadable range.
(Fourth Embodiment)
In the fourth embodiment, the allocation numbers to be processed simultaneously by the respective execution methods are determined so that the processor resources of each processor are used up. The execution method for each piece of input data is then determined on the basis of the allocation numbers.
(Fifth Embodiment)
In the fifth embodiment, the allocation number of each execution method is determined from the parameters of the program and the processors. FIG. 9 is a flowchart showing the operation of the execution method determination means 311 when it determines the allocation numbers.
(Sixth Embodiment)
In the sixth embodiment, provisional allocation numbers are determined, and the execution method is selected on the basis of those allocation numbers.
(Seventh Embodiment)
Next, an example of a concrete computer system is shown as the seventh embodiment.
1) Program input as the processing target
The target program 340 is the program to be processed by this execution method combination system. The target program 340 is a program that is assumed to be offloaded by the host processor 320 using execution method 1. The target program 340 is input to the conversion device 310 and converted into the converted program 341.
2) Program executed by the conversion device 310
The conversion program 1606 is executed by the conversion device 310 and converts the target program 340 into the converted program 341.
3) Programs executed by the host processor 320
The converted program 341, the multiple execution program 1601, and the execution method determination program 1607 are executed by the host processor 320 and control the entire offloading of this execution method combination system.
4) Program executed by the accelerator 330
The accelerator designation program 344 is a program that is offloaded by the host processor 320 and executed by the accelerator 330.
(1) Phase in which the converted program is created from the original program
The converted program is created from the target program by the specific process determination means and the process conversion means.
(2) Phase in which the converted program is executed while the execution method is switched
The multiple execution program 1601, which includes the converted program 341, is executed while its execution method is switched by the execution method determination means 321 and the execution method setting means 322.
1) The number of processors is two or more.
2) Each processor has resources that affect the performance of a given process.
3) The architecture of each processor is arbitrary.
4) The number of built-in CPU cores in each processor is one or more.
Claims (14)
- A program conversion device comprising: specific process determination means for determining a range of the partial program in a target program including a first execution method designation program that operates, for a specific process, by jointly using a first usage amount of a first resource included in a first processor and a second usage amount of a second resource included in a second processor with the usage ratio between them set to a first ratio; and process conversion means for converting the partial program into a second execution method designation program that operates by joint use with the usage ratio set to a second ratio different from the first ratio, and generating a converted program.
- The program conversion device according to claim 1, wherein the converted program includes the first execution method designation program, the second execution method designation program, and a process switching program that switches between the first execution method designation program and the second execution method designation program on the basis of an external designation.
- A program conversion method comprising: determining a range of the partial program in a target program including a first execution method designation program that operates, for a specific process, by jointly using a first usage amount of a first resource included in a first processor and a second usage amount of a second resource included in a second processor with the usage ratio between them set to a first ratio; and converting the partial program into a second execution method designation program that operates by joint use with the usage ratio set to a second ratio different from the first ratio.
- A process switching method comprising switching, on the basis of an external designation, between first processing means based on a first execution method that operates, for a specific process, by jointly using a first usage amount of a first resource included in a first processor and a second usage amount of a second resource included in a second processor with the usage ratio between them set to a first ratio, and second processing means based on a second execution method that operates by joint use with the usage ratio set to a second ratio different from the first ratio.
- A non-transitory storage medium storing an execution method determination program for causing a first processor to operate as: first processing means based on a first execution method that operates, for a specific process, by jointly using a first usage amount of a first resource included in the first processor and a second usage amount of a second resource included in a second processor with the usage ratio between them set to a first ratio; second processing means based on a second execution method that operates by joint use with the usage ratio set to a second ratio different from the first ratio; and process switching means for switching between the first processing means and the second processing means on the basis of an external designation.
- An execution method determination method comprising: selecting either a first execution method that operates, for a specific process, by jointly using a first usage amount of a first resource included in a first processor and a second usage amount of a second resource included in a second processor with the usage ratio between them set to a first ratio, or a second execution method that operates by joint use with the usage ratio set to a second ratio different from the first ratio; and setting the first execution method or the second execution method on the basis of the result of the selection.
- The execution method determination method according to claim 6, wherein either the first execution method or the second execution method is selected such that an unused amount of the first resource and an unused amount of the second resource become small.
- The execution method determination method according to claim 6 or 7, wherein the first execution method is applied to a first allocation number of input data among a plurality of input data, and the second execution method is applied to a second allocation number of input data among the plurality of input data.
- The execution method determination method according to claim 8, wherein the first allocation number and the second allocation number are determined on the basis of priorities of the first execution method and the second execution method.
- The execution method determination method according to claim 8 or 9, wherein performance is measured while the first allocation number and the second allocation number are varied, the first allocation number and the second allocation number are determined on the basis of the performance, and either the first execution method or the second execution method is selected on the basis of the determined allocation numbers.
- A non-transitory storage medium storing a program for causing a first processor to operate as: execution method determination means for selecting either a first execution method that operates, for a specific process, by jointly using a first usage amount of a first resource included in the first processor and a second usage amount of a second resource included in a second processor with the usage ratio between them set to a first ratio, or a second execution method that operates by joint use with the usage ratio set to a second ratio different from the first ratio; and execution method setting means for setting the first execution method or the second execution method on the basis of the result of the selection.
- A processor system comprising a first processor including a first resource and a second processor including a second resource, wherein the first processor operates, for a specific process and on the basis of an external designation, as process switching means for switching between first processing means based on a first execution method that operates by jointly using a first usage amount of the first resource and a second usage amount of the second resource with the usage ratio between them set to a first ratio and second processing means based on a second execution method that operates by joint use with the usage ratio set to a second ratio different from the first ratio, and also operates as first partial processing means that uses the first resource among the first processing means and the second processing means.
- The processor system according to claim 12, wherein the second processor operates as second partial processing means that uses the second resource among the first processing means and the second processing means.
- A parallel execution method comprising: using a first processor, on the basis of an external designation and for a specific process, to switch between first processing means based on a first execution method that operates by jointly using a first usage amount of a first resource of the first processor and a second usage amount of a second resource of a second processor with the usage ratio between them set to a first ratio and second processing means based on a second execution method that operates by joint use with the usage ratio set to a second ratio different from the first ratio, and to operate as first partial processing means that uses the first resource among the first processing means and the second processing means; and using the second processor to operate as second partial processing means that uses the second resource among the first processing means and the second processing means.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/411,256 US9483324B2 (en) | 2012-06-26 | 2013-06-12 | Program conversion device and method, process switching method, method of determining execution scheme and program storage medium therefor, processor system, and parallel execution scheme |
JP2014522403A JPWO2014002412A1 (ja) | 2012-06-26 | 2013-06-12 | プログラム変換装置及び方法、処理切替方法、実行方式決定方法及びプログラム記憶媒体、プロセッサシステム並びに並列実行方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012142901 | 2012-06-26 | ||
JP2012-142901 | 2012-06-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014002412A1 true WO2014002412A1 (ja) | 2014-01-03 |
Family
ID=49782612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/003684 WO2014002412A1 (ja) | 2012-06-26 | 2013-06-12 | プログラム変換装置及び方法、処理切替方法、実行方式決定方法及びプログラム記憶媒体、プロセッサシステム並びに並列実行方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US9483324B2 (ja) |
JP (1) | JPWO2014002412A1 (ja) |
WO (1) | WO2014002412A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017119098A1 (ja) * | 2016-01-07 | 2017-07-13 | 株式会社日立製作所 | 計算機システム及び計算機の制御方法 |
WO2017135219A1 (ja) * | 2016-02-01 | 2017-08-10 | 日本電気株式会社 | 設計支援装置、設計支援方法、および設計支援プログラムを格納した記録媒体 |
JP2018132981A (ja) * | 2017-02-16 | 2018-08-23 | 日本電気株式会社 | アクセラレータを有する情報処理装置および情報処理方法 |
WO2019216127A1 (ja) * | 2018-05-09 | 2019-11-14 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
WO2020090142A1 (ja) * | 2018-10-30 | 2020-05-07 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
JPWO2020235087A1 (ja) * | 2019-05-23 | 2020-11-26 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6540072B2 (ja) * | 2015-02-16 | 2019-07-10 | 富士通株式会社 | 管理装置、情報処理システム及び管理プログラム |
US11063910B2 (en) * | 2017-07-31 | 2021-07-13 | Fastly, Inc. | Web application firewall for an online service |
US11281474B2 (en) * | 2020-03-31 | 2022-03-22 | International Business Machines Corporation | Partial computer processor core shutoff |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007316710A (ja) * | 2006-05-23 | 2007-12-06 | Nec Corp | マルチプロセッサシステム、ワークロード管理方法 |
JP2011204209A (ja) * | 2010-03-26 | 2011-10-13 | Toshiba Corp | ソフトウェア変換プログラム、および、計算機システム |
US20120154412A1 (en) * | 2010-12-20 | 2012-06-21 | International Business Machines Corporation | Run-time allocation of functions to a hardware accelerator |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6654780B1 (en) | 1997-03-28 | 2003-11-25 | International Business Machines Corporation | System of managing processor resources in a non-dedicated computer system |
US6535971B1 (en) * | 1998-11-24 | 2003-03-18 | Minolta Co., Ltd. | Data processing system having plurality of processors and executing series of processings in prescribed order |
JP3891936B2 (ja) * | 2001-02-28 | 2007-03-14 | 富士通株式会社 | 並列プロセス実行方法、及びマルチプロセッサ型コンピュータ |
US20030004673A1 (en) * | 2001-06-29 | 2003-01-02 | Thurman Robert W. | Routing with signal modifiers in a measurement system |
CN100489783C (zh) * | 2004-06-28 | 2009-05-20 | 李晓波 | 在单计算机上可在同一时刻执行多道程序的方法及系统 |
US7672236B1 (en) * | 2005-12-16 | 2010-03-02 | Nortel Networks Limited | Method and architecture for a scalable application and security switch using multi-level load balancing |
JP2010287213A (ja) * | 2009-05-11 | 2010-12-24 | Nec Corp | ファイル変換装置、ファイル変換方法およびファイル変換プログラム |
US9135048B2 (en) * | 2012-09-20 | 2015-09-15 | Amazon Technologies, Inc. | Automated profiling of resource usage |
-
2013
- 2013-06-12 JP JP2014522403A patent/JPWO2014002412A1/ja active Pending
- 2013-06-12 US US14/411,256 patent/US9483324B2/en active Active
- 2013-06-12 WO PCT/JP2013/003684 patent/WO2014002412A1/ja active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007316710A (ja) * | 2006-05-23 | 2007-12-06 | Nec Corp | マルチプロセッサシステム、ワークロード管理方法 |
JP2011204209A (ja) * | 2010-03-26 | 2011-10-13 | Toshiba Corp | ソフトウェア変換プログラム、および、計算機システム |
US20120154412A1 (en) * | 2010-12-20 | 2012-06-21 | International Business Machines Corporation | Run-time allocation of functions to a hardware accelerator |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017119098A1 (ja) * | 2016-01-07 | 2017-07-13 | 株式会社日立製作所 | 計算機システム及び計算機の制御方法 |
JPWO2017119098A1 (ja) * | 2016-01-07 | 2018-11-08 | 株式会社日立製作所 | 計算機システム及び計算機の制御方法 |
WO2017135219A1 (ja) * | 2016-02-01 | 2017-08-10 | 日本電気株式会社 | 設計支援装置、設計支援方法、および設計支援プログラムを格納した記録媒体 |
JPWO2017135219A1 (ja) * | 2016-02-01 | 2018-11-29 | 日本電気株式会社 | 設計支援装置、設計支援方法、および設計支援プログラム |
US10909021B2 (en) | 2016-02-01 | 2021-02-02 | Nec Corporation | Assistance device, design assistance method, and recording medium storing design assistance program |
JP2018132981A (ja) * | 2017-02-16 | 2018-08-23 | 日本電気株式会社 | アクセラレータを有する情報処理装置および情報処理方法 |
JPWO2019216127A1 (ja) * | 2018-05-09 | 2020-12-10 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
WO2019216127A1 (ja) * | 2018-05-09 | 2019-11-14 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
US11106439B2 (en) | 2018-05-09 | 2021-08-31 | Nippon Telegraph And Telephone Corporation | Offload server and offload program |
WO2020090142A1 (ja) * | 2018-10-30 | 2020-05-07 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
JPWO2020090142A1 (ja) * | 2018-10-30 | 2021-06-10 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
JP6992911B2 (ja) | 2018-10-30 | 2022-01-13 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
JPWO2020235087A1 (ja) * | 2019-05-23 | 2020-11-26 | ||
WO2020235087A1 (ja) * | 2019-05-23 | 2020-11-26 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
JP7184180B2 (ja) | 2019-05-23 | 2022-12-06 | 日本電信電話株式会社 | オフロードサーバおよびオフロードプログラム |
Also Published As
Publication number | Publication date |
---|---|
US20150205643A1 (en) | 2015-07-23 |
US9483324B2 (en) | 2016-11-01 |
JPWO2014002412A1 (ja) | 2016-05-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2014002412A1 (ja) | プログラム変換装置及び方法、処理切替方法、実行方式決定方法及びプログラム記憶媒体、プロセッサシステム並びに並列実行方法 | |
JP4082706B2 (ja) | マルチプロセッサシステム及びマルチグレイン並列化コンパイラ | |
JP5245722B2 (ja) | スケジューラ、プロセッサシステム、プログラム生成装置およびプログラム生成用プログラム | |
US8051412B2 (en) | Global compiler for controlling heterogeneous multiprocessor | |
EP2472398B1 (en) | Memory-aware scheduling for NUMA architectures | |
US20130212594A1 (en) | Method of optimizing performance of hierarchical multi-core processor and multi-core processor system for performing the method | |
US20150309842A1 (en) | Core Resource Allocation Method and Apparatus, and Many-Core System | |
US9471387B2 (en) | Scheduling in job execution | |
JPH0659906A (ja) | 並列計算機の実行制御方法 | |
US9405349B2 (en) | Multi-core apparatus and job scheduling method thereof | |
CN104536937A (zh) | 基于cpu-gpu异构集群的大数据一体机实现方法 | |
EP1365321A2 (en) | Multiprocessor system | |
JP2008152470A (ja) | データ処理システム及び半導体集積回路 | |
CN102662740A (zh) | 非对称多核系统及其实现方法 | |
JP2014078239A (ja) | マルチコアプロセッサで行われるプログラムのコンパイル方法、マルチコアプロセッサのタスクマッピング方法及びタスクスケジューリング方法 | |
JP2013117790A (ja) | 情報処理装置、情報処理方法、及びプログラム | |
US8775767B2 (en) | Method and system for allocating memory to a pipeline | |
JP2007188523A (ja) | タスク実行方法およびマルチプロセッサシステム | |
JP2007305148A (ja) | マルチプロセッサシステム | |
TW202107408A (zh) | 波槽管理之方法及裝置 | |
CN112685174A (zh) | 一种容器创建方法、装置、设备及介质 | |
JP2007102332A (ja) | 負荷分散システム及び負荷分散方法 | |
JP5983623B2 (ja) | タスク配置装置及びタスク配置方法 | |
WO2014188642A1 (ja) | スケジュールシステム、スケジュール方法、及び、記録媒体 | |
US20120137300A1 (en) | Information Processor and Information Processing Method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13810818 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2014522403 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14411256 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 13810818 Country of ref document: EP Kind code of ref document: A1 |