US8407679B2 - Source code processing method, system and program - Google Patents
Source code processing method, system and program
- Publication number
- US8407679B2 (application US 12/603,598)
- Authority
- US
- United States
- Prior art keywords
- source code
- process block
- critical path
- processors
- block groups
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
Definitions
- the present invention relates to a method and system for reducing program execution time. More particularly, the present invention relates to a method and system for reducing program execution time in a multiprocessor system.
- Multiprocessor systems, i.e., computing systems which include multiple processors
- an application program generates multiple processes and assigns the processes to individual processors.
- the processors each perform their processing while communicating with each other by using, for example, a shared memory space.
- Simulation systems use software for simulation in mechatronics systems for robots, vehicles, and airplanes.
- the development of electronic component and software technology has enabled electronic control of machines such as robots, vehicles, or airplanes using a wireless LAN or wired connections spread over the machine.
- Hardware in the loop simulation is a technique that has been conventionally used for such tests.
- an environment for testing the electronic control units (ECUs) of an entire vehicle is called full-vehicle HILS.
- ECUs: electronic control units
- In full-vehicle HILS, actual ECUs are connected to a hardware device for emulating an engine mechanism or a transmission mechanism, for example, in a laboratory. Tests are then carried out for predetermined scenarios. Outputs from the ECUs are input to a monitoring computer and are then displayed. Thus, the test operator checks for abnormal operation while looking at the display.
- a special hardware device is required and physical wiring must be made between the special hardware device and actual ECUs.
- HILS involves much advance preparation.
- the wiring needs to be physically rearranged. This requires time and effort.
- since actual ECUs are used, real-time testing is needed. Accordingly, when a test is performed for many scenarios, a large amount of time is required.
- hardware devices for HILS emulation are generally extremely expensive.
- SILS: software in the loop simulation
- MATLAB®/Simulink® is a simulation modeling system available from Cybernet Systems Co., LTD.
- With MATLAB®/Simulink®, a simulation program can be created by arranging functional blocks A, B, C, . . . J on a display through a graphical interface, and then specifying process flows as shown by arrows in FIG. 1.
- each function can be converted into a C source code describing an equivalent function by a function of Real-Time Workshop®.
- a simulation can be performed as a SILS in a different computer system.
- when the computer system is a multiprocessor system, dividing processing into as many processes as possible and assigning the divided process blocks to individual processors can contribute much to an improvement of processing speed.
- CP scheduling techniques are known.
- the block diagram shown in FIG. 1 is converted into a task graph shown in FIG. 2 .
- the task graph shown in FIG. 2 consists of four vertical rows in which the process lines are assigned to four individual CPUs operating in parallel. With this configuration, the processing speed can be twice as fast as that of the case in which the processing is executed by a single CPU.
- the critical path in FIG. 2 is a path consisting of B-D-F-H-J. The processing time cannot be reduced to be shorter than the time required for the CPU to process this critical path.
- JP-06083608-A2 discloses a technique for detecting, by a critical path analysis, a bottleneck for program execution by a parallel computer.
- JP-07021240-A2 Japanese Unexamined Patent Application Publication No. Hei 7-21240 relates to layout design for a logical circuit, and discloses a system for shortening the critical path and minimizing the number of nets crossing a cut line.
- the above system includes: a critical path extraction unit for extracting a critical path; a cut line generation unit for generating a cut line; a merge pair selection unit for determining a block to be merged with each block on the basis of the connection degrees of the blocks and critical path information; a merging unit for merging each block with the block determined by the merge pair selection unit; and a pair-wise unit for changing pairs so that the number of nets crossing the cut line is minimized.
- JP-08180100-A2 discloses a technique for calculating an optimal solution at a high speed for a job shop scheduling problem including machine assignment. In this technique, an efficient neighborhood is generated and is then combined with an approximate solution.
- Japanese Unexamined Patent Application Publication Hei 6-83608 and Japanese Patent Application Publication Hei 8-180100 each disclose only an overview of task scheduling.
- Japanese Unexamined Patent Application Publication Hei 7-21240 describes a technique for shortening the critical path in the layout design of a logical circuit, but the critical path in question is one in a physical layout. Accordingly, the technique is not applicable to logical critical path processing by software.
- the object of the present invention can be achieved as follows. The critical path of a program to be reduced in execution time is cut appropriately so as to be divided into multiple processes. Then, the resulting processes are assigned to individual processors. Consequently, optimal codes for speculative execution in simulation can be outputted.
- a processing program of the present invention loads the source code of the program to be reduced in execution time, which consists of multiple process blocks. Next, the processing program tests all possible cuts for a critical path, and then finds the cut that makes the processing time of the process blocks after cutting the shortest.
- a phase in which the program is compiled and execution times and other values of the respective process blocks are measured is performed in advance.
- the values measured in this phase include measurement data such as: messaging cost when a process is set to be performed by multiple processors; processing time required for speculative execution; rollback cost when speculation fails; and the degree to which the input to each block is correctly predicted, i.e., the speculation success probability.
- Critical path cutting is recursively applied to paths resulting from the cutting.
- the cutting is stopped before further cuts would lengthen, rather than shorten, the overall processing time because of inter-processor communication times.
- Multiple groups of process blocks are obtained in this manner. In this description, each process block group is called a block chunk.
- the individual chunks are compiled without any linking, and are then assigned to the processors in an execution environment.
- the processing program of the present invention attempts to link some block chunks so as to make the number of block chunks and the number of processors equal. In this case, linking which can minimize the maximum value of processing time of a critical path among the block chunks resulting from the linking is selected.
- the resulting block chunks are each compiled, and are then assigned to the processors in the execution environment.
- each of the block chunks is assigned to a single processor, resulting in optimal parallel processing.
- the present invention can enable, in a multiprocessor environment, high-speed program execution which is improved in terms of both the length of the critical path and processor assignment.
- a source code processing method implemented by a computing apparatus to enable parallel execution of a divided source code in a multiprocessor system.
- the method includes the steps of: inputting an original source code by an input device into the computing apparatus; finding a critical path in the original source code by a critical path cut module; cutting the critical path in the original source code into a plurality of process block groups by the critical path cut module; and dividing the plurality of process block groups among a plurality of processors in the multiprocessor system by a CPU assignment code generation module to produce the divided source code, thereby enabling parallel execution of the divided source code in the multiprocessor system by the computing apparatus.
- a source code processing system for enabling parallel execution of a divided source code in a multiprocessor system.
- the system includes: an input device for inputting an original source code; a critical path cut module for finding a critical path in the original source code and for cutting the critical path in the original source code into a plurality of process block groups; and a CPU assignment code generation unit for dividing the plurality of process block groups among a plurality of processors in the multiprocessor system to produce the divided source code by using processing times of the process blocks; and wherein expected processing time of the divided source code is shorter than processing time of the original source code.
- FIG. 1 is a diagram showing an example of a block diagram created by using a simulation modeling tool.
- FIG. 2 is a diagram showing an example of a CP scheduling technique.
- FIG. 3 is a block diagram of hardware for implementing the present invention.
- FIG. 4 is a functional block diagram according to an embodiment of the present invention.
- FIG. 5 is a flowchart showing the flow of a process according to an embodiment of the present invention.
- FIG. 6 is a flowchart showing the critical path cutting processing.
- FIG. 7 is a flowchart showing the critical path cutting processing.
- FIG. 8 is a schematic diagram showing an example of the critical path cutting processing.
- FIG. 9 is a diagram showing expected execution time in a case including speculation.
- FIG. 10 is a schematic diagram showing an example of block chunk creating.
- FIG. 11 is a flowchart of CPU assignment code generation processing.
- FIG. 12 is a flowchart of the CPU assignment code generation processing.
- FIG. 13 is a schematic diagram showing an example of block chunk linking.
- FIG. 14 is a schematic diagram showing an example of block chunk linking.
- FIG. 15 is a diagram explaining dependency relationships between blocks.
- Computer hardware used for implementing the present invention is described with reference to FIG. 3.
- multiple CPUs, namely CPU 1 304a, CPU 2 304b, CPU 3 304c, . . . CPU n 304n, are connected to a host bus 302.
- a main memory 306 for arithmetic processing of the CPU 1 304a, CPU 2 304b, CPU 3 304c, . . . CPU n 304n is further connected to the host bus 302.
- A keyboard 310, a mouse 312, a display 314 and a hard disk drive 316 are connected to an I/O bus 308.
- the I/O bus 308 is connected to the host bus 302 through an I/O bridge 318 .
- the keyboard 310 and the mouse 312 are used by the operator for operations. For example, the operator inputs a command by using the keyboard 310 , or clicks on a menu by using the mouse 312 .
- the display 314 is used when needed to display a menu for operating a program according to an embodiment of the present invention through a GUI.
- IBM® System X is the preferable computer system to be used for the purpose of implementing the present invention.
- the CPU 1 304a, CPU 2 304b, CPU 3 304c, . . . CPU n 304n are each Intel® Xeon®, for example, and the operating system is Windows™ Server 2003.
- the operating system is stored in the hard disk drive 316 , and is loaded into the main memory 306 from the hard disk drive 316 at the time of starting the computer system.
- the computer system hardware which can be used for implementing the present invention is not limited to IBM® System X, and any computer system capable of running a simulation program according to an embodiment of the present invention can be used.
- the operating system is not limited to Windows®. Another operating system such as Linux® or Mac OS® can be used.
- a computer system such as IBM® System P using AIX™ as the operating system, based on POWER™ 6, can be used.
- the hard disk drive 316 further stores MATLAB®/Simulink®, a C compiler or a C++ compiler, a module for cutting a critical path according to an embodiment of the present invention, a module for generating a code for CPU assignment, and a module for measuring expected execution time for each process block. These are each loaded into the main memory 306 and executed in response to a keyboard operation or a mouse operation by the operator.
- the usable simulation modeling tool is not limited to MATLAB®/Simulink®, and any simulation modeling tool such as an open-source Scilab/Scicos can be used, for example.
- source codes for the simulation system can be directly written in C or C++ without using any simulation modeling tool.
- the embodiment of the present invention is also applicable to such a case.
- FIG. 4 is a functional block diagram according to the embodiment of the present invention. Each block corresponds to a module stored in the hard disk drive 316 .
- a simulation modeling tool 402 may be any existing tool such as MATLAB®/Simulink® or Scilab/Scicos.
- the simulation modeling tool 402 has a function which enables the operator to arrange functional blocks on the display 314 through the GUI, to write required attributes such as expressions, and to describe a block diagram by associating the functional blocks with each other when necessary.
- the simulation modeling tool 402 also has the function of outputting C source codes describing functions equivalent to those of the described block diagram.
- C++ or Fortran for example, can be used in place of C.
- the simulation modeling tool can be installed in a personal computer.
- the source code generated in the personal computer can be downloaded to the hard disk drive 316 through a network, for example.
- Source codes 404 thus outputted are stored in the hard disk drive 316 .
- the source codes 404 are compiled by a compiler module 406 , and a resulting executable program is transmitted to a test module 408 .
- the test module 408 has the function of carrying out an execution test and the function of carrying out a speculative test.
- In the execution test, average processing times of the respective blocks as shown in FIG. 1, inter-processor communication times and speculation success probabilities are measured on the basis of a predetermined scenario. Preferably, a single scenario is executed multiple times in order to obtain an average time.
- Measurement results 410 are stored in the hard disk drive 316 for later use.
- In the speculative test, the resulting executable program is speculatively executed on the basis of a different predetermined scenario, and the following processing times are measured.
- speculation preparation processing time, that is, time required for a process for storing a speculative input value for a case in which speculation fails and rollback is required
- speculation success/failure checking processing time, that is, time required for a process for determining, when receiving actual data, whether the data matches the speculative data
- rollback processing time, that is, time required, when speculation turns out to be a failure because the speculative input and the actual value differ, for post-processes including stopping the processing performed on the basis of the incorrect input and deleting the data, for example
- Such values are also stored in the hard disk drive 316 as the measurement results 410 for later use.
- the speculation success probability can be calculated without actually performing speculative execution.
- In speculative execution, since processing is performed before its actual input is received, the processing is performed by predicting the input. Accordingly, the speculation success probability is equal to the success rate of the input prediction. This means that once the algorithm to be used for input prediction is determined, its speculation success probability can be calculated from actual input data alone, without actually performing speculative execution, that is, without performing block processing based on predicted input data.
- the speculation success probability can be obtained by simply recording the input to each block in an "execution test," and calculating the prediction success probability of the input prediction algorithm from the input data series.
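The idea of estimating the speculation success probability from recorded inputs alone can be sketched in Python as follows. This is only a minimal illustration: the last-value predictor, the function names and the sample input series are assumptions introduced here, since the text does not specify a particular input prediction algorithm.

```python
from typing import Callable, Sequence


def last_value_predictor(history: Sequence[float]) -> float:
    """Hypothetical predictor: assume the next input repeats the last one."""
    return history[-1]


def speculation_success_probability(
    inputs: Sequence[float],
    predict: Callable[[Sequence[float]], float] = last_value_predictor,
    tolerance: float = 0.0,
) -> float:
    """Replay a recorded input series and count how often the predictor
    would have guessed the next input; no block processing is executed."""
    hits = 0
    trials = 0
    for i in range(1, len(inputs)):
        guess = predict(inputs[:i])  # predict from the inputs seen so far
        trials += 1
        if abs(guess - inputs[i]) <= tolerance:
            hits += 1
    return hits / trials if trials else 0.0


# Inputs to one block as recorded during an execution test (made-up data).
recorded = [1.0, 1.0, 1.0, 2.0, 2.0]
print(speculation_success_probability(recorded))
```

For the made-up series above, three of the four predictions match, so the estimate is 0.75.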
- time required for performing speculative execution and time required when the speculative execution failed cannot be obtained without actually performing speculative execution. For this reason, a speculative test is carried out to obtain such information.
- the critical path cut module 412 has the function of processing the source codes 404 in blocks and finding and cutting a critical path, thereby finding a cut resulting in an optimal execution time. For this, information on the measurement results 410 is used. The module 412 further generates subdivided block chunks shown in FIG. 10 by recursively applying the critical path cut function. Block chunk information pieces 414 thus generated are stored in the hard disk drive 316 for later use.
- the critical path cut function will be described later in detail with reference to a flowchart.
- a CPU assignment code generation module 416 generates codes 418 a , 418 b , . . . 418 m to be assigned to the CPU 1 to CPU n, by using the block chunk information pieces 414 and the measurement results 410 .
- the block chunk codes are directly assigned to the CPU 1 to CPU n.
- the block chunks are linked as schematically shown in FIG. 14 so that the number of the block chunks and the number of the CPU 1 to CPU n become equal.
- the links are optimally selected so as to minimize expected execution time of the resulting critical path.
- the codes 418 a , 418 b , . . . 418 m to be assigned to the CPU 1 to CPU n, and dependency relationship information pieces 420 are generated.
- the dependency relationship information pieces 420 are needed for the following reason. Specifically, when an original process flow is divided by the critical path cut function as shown in FIG. 10 , original dependency relationships between the blocks are sometimes cut off. In order to compensate for the cut-off relationships, the module 416 provides the dependency relationship information pieces 420 indicating, for example, which code returns a variable used in which code among codes other than itself. In practice, the dependency relationship information pieces 420 are created by the critical path cut module 412 at the time of cutting, and the CPU assignment code generation module 416 consequently uses the dependency relationship information pieces 420 thus created.
- the codes 418a, 418b, . . . 418m thus generated are individually compiled as executable programs by the compiler 422, and are individually assigned to the CPU 1 to CPU n in an execution environment 424 so as to be executed in parallel by the corresponding CPU 1 to CPU n.
- the dependency relationship information pieces 420 are placed in a shared memory area of the main memory 306 so as to be commonly referred to by the CPU 1 to CPU n.
- the dependency relationship information pieces 420 are referred to by each of the CPU 1 to CPU n to obtain information pieces for codes performed by other CPUs as necessary.
- FIG. 5 shows a flow of the entire processing according to an embodiment of the present invention.
- although the flow in FIG. 5 shows an operation procedure, the individual steps of the operation procedure do not necessarily correspond to those of a computer processing flow.
- In Step 502, the developer or the operator creates a block diagram of a particular simulation target on the system shown in FIG. 3 or a different computer, by using the simulation modeling tool 402 such as MATLAB®/Simulink®.
- In Step 504, the developer or the operator generates the source codes 404 corresponding to the created block diagram by using one of the functions of the simulation modeling tool 402, and then stores the generated source codes 404 in the hard disk drive 316.
- In Step 506, the developer or the operator compiles the source codes 404 by using the compiler 406.
- Resultant executable programs thus compiled are temporarily stored in the hard disk drive 316 , which is not shown in FIG. 5 .
- In Step 508, the developer or the operator carries out an execution test in the test module 408 by using the compiled executable programs. Measurement data on average processing times of the blocks, inter-processor communication times and speculation success probabilities obtained through the execution test are stored in the hard disk drive 316 as the measurement results 410 in Step 510.
- In Step 512, the developer or the operator carries out a speculative test in the test module 408 by using the compiled executable programs. Measurement data on speculation preparation processing time, speculation success/failure checking processing time and rollback processing time obtained through the speculative test are stored in the hard disk drive 316 as the measurement results 410 in Step 514.
- In Step 516, the computer processing is started in response to an operation by the developer or the operator. Basically, the processing from Step 516 to Step 524 proceeds automatically on the computer apparatus.
- the critical path cut module 412 performs processing on the source codes 404 .
- a critical path in the entire processing flow described by the source codes 404 is found by using an algorithm.
- the critical path is optimally cut in terms of processing time, and, in the processing flow after the cutting, processing for cutting the critical path is recursively performed.
- the measurement results 410 are used.
- In Step 518, information pieces on the block chunks are stored in the hard disk drive 316 as the block chunks 414.
- the information pieces on the block chunks can be stored in any data structure such as XML, as long as the structure is computer readable and is capable of describing source code contents, link relationships, and links.
- the CPU assignment code generation module 416 generates codes to be individually assigned to the CPU 1 to CPU n, by using the block chunk information pieces 414 .
- when the number of block chunks is equal to or smaller than the number of CPU 1 to CPU n, each block chunk is assigned to one of the CPU 1 to CPU n.
- otherwise, the block chunks are linked so that the number of the block chunks and the number of the CPU 1 to CPU n become equal and so that execution time is minimized.
- the measurement results 410 are used.
- In Step 522, the codes generated by the module 416 are compiled by the compiler 422. Then, in Step 524, the compiled programs are assigned to and then executed by the processors CPU 1 to CPU n.
- In Step 602 of FIG. 6, processing for finding an optimal cut for the critical path is performed.
- FIG. 8 is referred to for the explanation of the optimal cut.
- FIG. 8 shows a process flow including blocks A to I.
- the path B-C-D-E-F is identified as the critical path by the algorithm for finding the critical path.
- the critical path cut module 412 sequentially tests possible cuts c1, c2, c3 and c4 along the path B-C-D-E-F. For example, testing the cut c3 means that the critical path is cut at the cut c3 and the cut-out flow is logically moved to the side.
- evaluating the cut c3 means that, on the assumption that the speculation success probability is 100 percent, expected execution times of the two resulting side-by-side flows are compared and the longer of the two execution times is taken as the value $T_c$. However, since a speculation success probability is generally lower than 100 percent, the value $T_c$ is evaluated by taking into account the speculation success probability.
- the cut with which the smallest value $T_c$ can be obtained is called the optimal cut. More detailed processing, i.e., a subroutine, for finding the optimal cut will be described later with reference to the flowchart in FIG. 7.
- the expected execution times of the respective blocks are measured in advance in the execution test shown in Step 508 in FIG. 5 , and are then stored in the hard disk drive 316 as the measurement results 410 . It is to be noted that these measurement results 410 are used in the calculation of an expected execution time of the given flow.
- MSCxy: message sending cost from a block X to a block Y when the block X and the block Y are cut apart
- MRCxy: message receiving cost from the block X to the block Y when the block X and the block Y are cut apart
- SCxy: speculation cost from the block X to the block Y
- RBCxy: rollback cost when speculation from the block X to the block Y fails
- the costs of the blocks are also measured in advance in the execution test shown in Step 508 and the speculative test shown in Step 512 in FIG. 5 , and are then stored in the hard disk drive 316 as the measurement results 410 .
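- $T_{cs}$: expected execution time when the speculation succeeds
- $T_{cf}$: expected execution time when the speculation fails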
- a success probability p c of speculation is measured in advance in the execution test shown in Step 508 in FIG. 5 , and is then stored in the hard disk drive 316 as the measurement result 410 .
- the resulting expected execution time is calculated by using this measurement result 410 , as follows.
- $T_c = p_c T_{cs} + (1 - p_c) T_{cf}$
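The weighting of the success and failure cases by the speculation success probability can be written as a one-line function. The sketch below is illustrative only; the function name `expected_time` and the sample values are assumptions, not measured data.

```python
def expected_time(p_c: float, t_cs: float, t_cf: float) -> float:
    """Expected execution time for a cut c: the success-case time t_cs and the
    failure-case time t_cf weighted by the speculation success probability p_c."""
    if not 0.0 <= p_c <= 1.0:
        raise ValueError("p_c must be a probability between 0 and 1")
    return p_c * t_cs + (1.0 - p_c) * t_cf


# Made-up values: speculation succeeds half of the time.
print(expected_time(0.5, 10.0, 20.0))
```

With $p_c = 0.5$, $T_{cs} = 10$ and $T_{cf} = 20$, the function returns 15.0. As $p_c$ approaches 1 the result approaches $T_{cs}$, which matches the 100 percent case used when a cut is first evaluated.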
- the critical path cut module 412 determines whether or not an optimal cut exists in Step 604 . Having an optimal cut means that the expected processing time overall is shortened as a result of the cutting. Cutting does not always result in shortening processing time. Specifically, in consideration of the above-described sending cost, receiving cost and speculation cost, cutting cannot shorten the processing time in some cases. In such cases, in Step 604 , the critical path cut module 412 determines that there is no optimal cut. Then, in Step 606 , block chunk information pieces which are currently under evaluation are preferably stored in the hard disk drive 316 .
- if an optimal cut exists, the critical path cut module 412 moves the cut-out block in Step 608. This is shown, for example, as the processing in FIG. 8.
- Step 610 the processing shown in the flowchart of FIG. 6 is recursively invoked for the entire set of paths resulting from the cutting. This will be explained by using the blocks shown in FIG. 8 .
- the blocks are first divided into the blocks A, B, C, D, E and F and the blocks G, H and I. Then, the processing shown in the flowchart of FIG. 6 is recursively invoked.
- In Step 702, processing for finding a critical path is performed.
- There are conventional methods of processing for finding a critical path in a process flow.
- a method based on Program Evaluation and Review Technique (PERT) can be used. For example, see the web page, http://www.kogures.com/hitoshi/webtext/or-pt-pert/index.html or http://en.wikipedia.org/wiki/Program_Evaluation_and_Review_Technique.
- PERT: Program Evaluation and Review Technique
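As a rough illustration of PERT-style critical path finding, the following Python sketch computes the longest path through a task graph by visiting blocks in topological order. The function name `critical_path`, the block names and the processing times are assumptions made for illustration; they are not taken from the figures.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(times, preds):
    """Return (path, length) of the longest path in a task graph.

    times: block name -> processing time
    preds: block name -> list of predecessor block names
    """
    graph = {b: preds.get(b, ()) for b in times}
    finish = {}      # earliest finish time of each block
    best_pred = {}   # predecessor on the longest path into each block
    for b in TopologicalSorter(graph).static_order():
        p = max(graph[b], key=lambda q: finish[q], default=None)
        best_pred[b] = p
        finish[b] = times[b] + (finish[p] if p is not None else 0.0)
    # The critical path ends at the block that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return path[::-1], length


# Hypothetical five-block flow: A -> C -> E and B -> D -> E.
times = {"A": 1, "B": 3, "C": 2, "D": 4, "E": 1}
preds = {"C": ["A"], "D": ["B"], "E": ["C", "D"]}
print(critical_path(times, preds))
```

In this made-up graph the path B-D-E takes 8 time units and A-C-E only 4, so B-D-E is reported as the critical path.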
- In Step 706, it is determined whether or not the set C is empty. If the determination is NO, the process advances to Step 708, and each cut c is selected from the set C.
- In Step 710, expected execution time resulting from the cutting using the cut c is calculated, and the calculated execution time is substituted into $t_c$.
- the calculation of this execution time is also based on the case of speculative execution explained above in relation to FIG. 9 .
- Steps 708 , 710 , 712 and 714 are performed on each of the cuts included in the set C, and the resulting c min is returned in Step 602 of FIG. 6 .
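The loop over the candidate cuts can be sketched as follows. Steps 712 and 714 are not spelled out in this text, so the sketch assumes they compare $t_c$ with the best time found so far and update $c_{min}$ accordingly. The function name `find_optimal_cut`, the choice to start from the uncut time and the sample values are likewise assumptions.

```python
def find_optimal_cut(cuts, expected_time_of, uncut_time):
    """Return (c_min, t_min) for the cut with the smallest expected time.

    cuts: iterable of candidate cuts (the set C)
    expected_time_of: function mapping a cut c to its expected time t_c
    uncut_time: expected time when the critical path is left uncut

    c_min is None when no cut beats the uncut time, which corresponds to
    the "no optimal cut" outcome of Step 604.
    """
    c_min = None
    t_min = uncut_time
    for c in cuts:                # Step 708: take the next cut from C
        t_c = expected_time_of(c)  # Step 710: expected time with this cut
        if t_c < t_min:            # assumed compare-and-update step
            c_min, t_min = c, t_c
    return c_min, t_min


# Made-up expected times for four candidate cuts.
t = {"c1": 12.0, "c2": 9.5, "c3": 8.0, "c4": 10.0}
print(find_optimal_cut(t, t.get, uncut_time=11.0))
print(find_optimal_cut(t, t.get, uncut_time=7.0))
```

The first call selects `c3`; the second returns `None` as the cut because every candidate is slower than leaving the path uncut.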
- FIG. 10 schematically shows a result of such processing.
- the block processing flow shown on the left side of FIG. 10 is cut at multiple positions by the processing shown in the flowchart of FIG. 6 performed recursively. Consequently, multiple block chunks subdivided as shown on the right side of FIG. 10 are obtained.
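The recursive subdivision into block chunks, including the point at which cutting stops, can be illustrated with a deliberately simplified model. The sketch below treats a flow as a single chain of block times and uses a toy cost model (one communication cost per cut, plus a rollback cost on speculation failure). The function name, the cost model and all numbers are assumptions; the actual processing works on the full set of paths and on measured costs.

```python
def cut_recursively(chain, p, comm, rollback):
    """Split a chain of block times into chunks while cutting still helps.

    chain: list of block processing times along one path
    p: speculation success probability for a cut
    comm: assumed message cost added by a cut
    rollback: assumed extra cost when speculation fails
    """
    best_i = None
    best_t = sum(chain)  # expected time if the chain is left uncut
    for i in range(1, len(chain)):
        head, tail = sum(chain[:i]), sum(chain[i:])
        t_cs = max(head, tail) + comm        # both halves run in parallel
        t_cf = head + tail + comm + rollback  # tail is redone after rollback
        t_c = p * t_cs + (1.0 - p) * t_cf
        if t_c < best_t:
            best_i, best_t = i, t_c
    if best_i is None:
        return [chain]  # no cut shortens the time: this chain is one chunk
    return (cut_recursively(chain[:best_i], p, comm, rollback)
            + cut_recursively(chain[best_i:], p, comm, rollback))


print(cut_recursively([4, 4, 4, 4], p=1.0, comm=1, rollback=0))
print(cut_recursively([4, 4, 4, 4], p=1.0, comm=5, rollback=0))
```

With a small communication cost the chain is cut down to four single-block chunks; with the larger cost the recursion stops at two chunks, because a further cut would lengthen rather than shorten the expected time.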
- With reference to the flowcharts shown in FIG. 11 and FIG. 12, the CPU assignment code generation processing corresponding to Step 520 of FIG. 5 will be described. This processing is performed by the CPU assignment code generation module 416 shown in FIG. 4.
- In Step 1104, it is determined whether or not $p < b$ is satisfied, where $p$ denotes the number of processors and $b$ the number of block chunks. If the determination results in NO, that is, $p \ge b$, the number of processors is large enough for the assignment of the block chunks to the processors without linking any block chunks. Accordingly, in Step 1106, the block chunks are individually assigned to the processors as appropriate, and the processing is then terminated.
- If the determination results in YES, in Step 1108, processing is performed in which two of the block chunks are linked to each other to reduce the number of block chunks by one.
- in Step 1108, an optimal combination is found which minimizes the expected processing time resulting from the linking of two block chunks.
- FIG. 14 schematically shows such processing.
- when the linking has reduced the number of block chunks far enough, the determination in Step 1104 results in NO. This indicates that the number of processors is enough for the assignment of the block chunks without linking any more block chunks. Accordingly, in Step 1106, the resulting block chunks stored at this time are assigned to the processors, and the process is then terminated. When it is desired that some CPUs are to be reserved for different processing, the number of block chunks may be reduced until $b < p$ is satisfied.
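The outer loop of FIG. 11, which keeps linking two block chunks until the chunks fit on the available processors, might look like the following sketch. It assumes a simplified cost model in which linking two chunks on one processor simply adds their expected times, ignoring communication and speculation costs. The function name `link_until_fits` and the chunk times are hypothetical.

```python
from itertools import combinations


def link_until_fits(chunk_times, processors):
    """Greedily link pairs of chunks until at most `processors` chunks remain.

    chunk_times: chunk name -> expected execution time
    Each round links the pair that minimizes the largest chunk time after
    linking (ties broken by the smaller linked time).
    """
    if processors < 1:
        raise ValueError("at least one processor is required")
    chunks = dict(chunk_times)
    while len(chunks) > processors:  # Step 1104: more chunks than processors?
        best = None
        for a, b in combinations(sorted(chunks), 2):
            linked = chunks[a] + chunks[b]  # assumed: serial execution
            others = [t for n, t in chunks.items() if n not in (a, b)]
            key = (max([linked] + others), linked)
            if best is None or key < best[0]:
                best = (key, a, b)
        _, a, b = best
        chunks[a + "+" + b] = chunks.pop(a) + chunks.pop(b)  # Step 1108
    return chunks


# Four hypothetical chunks to be placed on two processors.
print(link_until_fits({"bc1": 5, "bc2": 2, "bc3": 3, "bc4": 4}, processors=2))
```

For these made-up times the sketch first links bc2 with bc3 and then bc1 with bc4, leaving two chunks with times 5 and 9. A greedy pairwise choice of this kind does not guarantee a globally optimal assignment.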
- FIG. 12 is a flowchart describing the processing in Step 1108 of FIG. 11 further in detail.
- here, $\infty$ indicates an appropriate constant that is larger than any number actually calculated in the corresponding state.
- In Step 1204, it is determined whether or not the set $S_1$ is empty. If the set $S_1$ is empty, the processing is completed, and the process returns to Step 1108 in the flowchart of FIG. 11. If the set $S_1$ is not empty, a single block chunk $s_1$ is taken from the set $S_1$ in Step 1206.
- In Step 1210, it is determined whether or not the set $S_2$ is empty. If the set $S_2$ is empty, the process returns to Step 1204. If the set $S_2$ is not empty, a single block chunk $s_2$ is taken from the set $S_2$ in Step 1212.
- In Step 1214, the execution time when the block chunk $s_2$ is linked below the block chunk $s_1$ is calculated by using the measurement results 410 of the blocks shown in FIG. 4, and the calculated value is substituted into $T_{s1s2}$.
- In Step 1216, it is determined whether $T_{s1s2}$ is equal to $T_{min}$. If $T_{s1s2}$ is equal to $T_{min}$, the cost expected when the block chunk $s_2$ is linked below the block chunk $s_1$ is calculated, and the calculated value is substituted into $U_{s1s2}$.
- the cost is the expected value of the entire CPU time expended, and is calculated such that the speculation success probability for each of the two cases where possible speculation succeeds or fails is assigned as a weight to execution times of the respective blocks. Also included are message sending and receiving costs between blocks performed by different processors, speculation costs, speculation checking costs, and rollback costs at the time of speculation failure.
- Step 1224 it is determined whether T s1s2 ⁇ T min is satisfied. If T S1S2 ⁇ T min is satisfied, Step 1222 is performed, and the process then returns to Step 1210 to make the determination whether or not the set S 2 is empty. If T S1S2 ⁇ T min is not satisfied, the process returns from Step 1224 immediately to Step 1210 to make a determination whether or not the set S 2 is empty.
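The search of FIG. 12 can be sketched as a pair of nested loops over ordered pairs (s1, s2). The sketch below is an assumption-laden illustration: `find_best_link`, `exec_time` and `cost` are hypothetical names, the callbacks stand in for values derived from the measurement results 410, and the tie-break on cost is assumed from the comparison of T_s1s2 with T_min, since the intermediate steps are not reproduced in this text.

```python
import math
from itertools import permutations


def find_best_link(chunks, exec_time, cost):
    """Return the ordered pair (s1, s2) with the smallest execution time."""
    t_min, u_min, best = math.inf, math.inf, None   # sentinels larger than any value
    for s1, s2 in permutations(chunks, 2):          # s2 is linked below s1
        t = exec_time(s1, s2)                       # Step 1214
        if t < t_min:                               # Step 1224
            t_min, u_min, best = t, cost(s1, s2), (s1, s2)
        elif t == t_min:                            # Step 1216
            u = cost(s1, s2)
            if u < u_min:                           # assumed tie-break on cost
                u_min, best = u, (s1, s2)
    return best, t_min, u_min


# Illustrative tables only; real values would come from measurements.
chunks = ["bc1", "bc2", "bc3", "bc4"]
times = {("bc2", "bc3"): 7, ("bc1", "bc4"): 7}
costs = {("bc2", "bc3"): 15, ("bc1", "bc4"): 12}
best, t, u = find_best_link(
    chunks,
    lambda a, b: times.get((a, b), 10),
    lambda a, b: costs.get((a, b), 20),
)
```

With four block chunks, `permutations(chunks, 2)` enumerates twelve ordered pairs.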
- FIG. 13 shows an example of block chunk linking. As shown in FIG. 13, this example includes four block chunks bc1, bc2, bc3 and bc4. When the order of the block chunks in each link need not follow the upstream/downstream relationship in the original flow, the block chunks bc1, bc2, bc3 and bc4 can be linked in twelve ways (4 × 3 ordered pairs).
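- For example, the expected execution times t_bc2bc3 and t_bc1bc4 and the expected execution costs u_bc2bc3 and u_bc1bc4 are calculated as follows.
  $$t_{bc2bc3} = |B|+|C|+|D|+|E|+|F|+\mathrm{MRC}_{ac}+\mathrm{MRC}_{if}$$
  $$u_{bc2bc3} = |A|+|B|+|C|+|D|+|E|+|F|+|G|+|H|+|I|+\mathrm{MRC}_{ac}+\mathrm{MRC}_{if}+\mathrm{MSC}_{ac}+\mathrm{MSC}_{if}$$
  $$t_{bc1bc4} = p_1 p_2 \left(|D|+|E|+|F|+\mathrm{SC}_{cd}+\mathrm{SC}_{if}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}+\mathrm{MRC}_{if}+\mathrm{SCC}_{if}\right) + p_1 (1-p_2) \left(|A|+|G|+|H|+|I|+|F|+\mathrm{MSC}_{ac}+\mathrm{MSC}_{if}+\mathrm{MRC}_{if}+\mathrm{SCC}_{if}+\mathrm{RBC}_{if}\right) + (1-p_1) \left(|B|+|C|+|D|+|E|+|F|+\mathrm{MRC}_{ac}+\mathrm{MSC}_{cd}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}+\mathrm{RBC}_{cd}+\mathrm{MRC}_{if}\right)$$
  $$u_{bc1bc4} = |A|+|B|+|C|+|D|+|E|+|F|+|G|+|H|+|I| + p_1 p_2 \left(\mathrm{SC}_{cd}+\mathrm{SC}_{if}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}+\mathrm{MRC}_{if}+\mathrm{SCC}_{if}\right) + p_1 (1-p_2) \left(\mathrm{MSC}_{ac}+\mathrm{MSC}_{if}+\mathrm{MRC}_{if}+\mathrm{SCC}_{if}+\mathrm{RBC}_{if}\right) + (1-p_1) \left(\mathrm{MRC}_{ac}+\mathrm{MSC}_{cd}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}+\mathrm{RBC}_{cd}+\mathrm{MRC}_{if}\right)$$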
- Here, p1 and p2 denote the speculation success probabilities of the respective paths shown in FIG. 13. All the individual values in the expressions are obtained from the measurement results 410.
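The expected execution time for linking bc4 below bc1 is a probability-weighted sum over three outcomes: both speculations succeed, the first succeeds while the second fails, and the first fails. A minimal sketch of that weighting follows; the function name `expected_time` and all numeric values are hypothetical and are not taken from the measurement results 410.

```python
import math


def expected_time(p1, p2, t_both_ok, t_second_fails, t_first_fails):
    """Weight the three outcome times by their probabilities."""
    weights = (p1 * p2, p1 * (1 - p2), 1 - p1)
    assert math.isclose(sum(weights), 1.0)   # the three cases are exhaustive
    times = (t_both_ok, t_second_fails, t_first_fails)
    return sum(w * t for w, t in zip(weights, times))


# Hypothetical probabilities and per-case times, for illustration only.
t = expected_time(p1=0.9, p2=0.8,
                  t_both_ok=10.0, t_second_fails=20.0, t_first_fails=30.0)
```

The expected cost follows the same pattern, with a fixed term for the block execution times and weighted terms for the per-case overheads.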
- FIG. 14 shows processing in a case where six block chunks bc1, bc2, bc3, bc4, bc5 and bc6 are present while only five CPUs are provided, so that the CPU assignment code generation module 416 links two block chunks in order to reduce the number of block chunks by one.
- In one case, the block chunk bc6 is linked below the block chunk bc4, and the execution time of the block chunk bc3 becomes the longest execution time t_s1s2.
- In another case, the block chunk bc5 is linked below the block chunk bc1, and the execution time of the block chunk bc1 becomes the longest execution time t_s1s2.
- The CPU assignment code generation module 416 calculates the longest execution time t_s1s2 for every combination of block chunks, and then selects the link of block chunks whose longest execution time t_s1s2 is the shortest.
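The selection rule illustrated by FIG. 14, which takes the longest execution time among all block chunks after each candidate link and keeps the link for which that maximum is smallest, can be sketched as follows. The sketch makes a loud simplifying assumption: a linked chunk simply takes the sum of the two execution times, with no speculation or messaging terms. The names `longest_time_after_link` and `pick_link` and the numbers are hypothetical.

```python
from itertools import combinations


def longest_time_after_link(times, s1, s2):
    """Longest chunk time once s2 is linked to s1 (additive model, assumed)."""
    merged = times[s1] + times[s2]
    others = [t for name, t in times.items() if name not in (s1, s2)]
    return max([merged] + others)


def pick_link(times):
    """Choose the pair whose link yields the smallest longest time."""
    return min(combinations(times, 2),
               key=lambda pair: longest_time_after_link(times, *pair))


# Six chunks for five CPUs; illustrative execution times only.
times = {"bc1": 8, "bc2": 7, "bc3": 9, "bc4": 6, "bc5": 5, "bc6": 2}
chosen = pick_link(times)
```

In this toy data, linking bc6 to bc4 leaves bc3 as the longest chunk, whereas linking bc5 to bc1 makes the linked chunk itself the longest, so the former is the better of the two.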
- The generated codes for the respective CPUs are individually compiled by the compiler 422 into executable codes, which are then temporarily stored in the hard disk drive 316.
- FIG. 15 is a schematic view for explaining such a dependency relationship.
- A code formed of the block A and the block C is denoted by Code 1, a code formed of the block B and the block D is denoted by Code 2, a code formed of the block F, the block H and the block J is denoted by Code 3, and a code formed of the block E, the block G and the block I is denoted by Code 4. Code 1, Code 2, Code 3 and Code 4 are as shown in FIG. 15.
- The arguments of Code 3 use the first return values of Code 1, Code 2 and Code 4. This is described as follows: the 1st output of Code 1 is included in the 1st argument of Code 3; the 1st output of Code 2 is included in the 2nd argument of Code 3; and the 1st output of Code 4 is included in the 3rd argument of Code 3.
- The CPU assignment code generation module 416 generates these information pieces together with the corresponding CPU assignment codes.
- The compiler 422 can be notified of the dependency relationship information pieces by including them in the corresponding CPU assignment codes.
- The dependency relationship information pieces are, for example, stored directly in the shared memory of the execution environment 424 so that CPU 1 to CPU n can refer to them when executing the assigned codes.
- The compiled executable programs for the CPUs are sequentially loaded into the main memory 306 by the execution environment 424, and the execution environment 424 assigns the processes generated in association with the executable programs to the individual processors.
- The simulation program is divided into multiple executable programs, which are executed in parallel by the respective processors.
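One possible in-memory form of the dependency relationship information for Code 3 is a table mapping each producer output to a consumer argument. The layout below is an assumption made for illustration; the names `DEPENDENCIES` and `gather_arguments` are hypothetical, and the text does not specify how the information pieces are actually encoded.

```python
# (producer, output index) -> (consumer, argument index); indices are 1-based
# to match the wording "1st output" and "1st argument".
DEPENDENCIES = {
    ("Code1", 1): ("Code3", 1),
    ("Code2", 1): ("Code3", 2),
    ("Code4", 1): ("Code3", 3),
}


def gather_arguments(consumer, outputs):
    """Build the consumer's argument list from the producers' outputs."""
    slots = {}
    for (producer, out_idx), (target, arg_idx) in DEPENDENCIES.items():
        if target == consumer:
            slots[arg_idx] = outputs[producer][out_idx - 1]
    return [slots[i] for i in sorted(slots)]


# Illustrative return values of Code 1, Code 2 and Code 4.
outputs = {"Code1": [1.0], "Code2": [2.0], "Code4": [4.0]}
args = gather_arguments("Code3", outputs)
```

A table of this kind could be placed in shared memory and consulted by each processor before it starts the code assigned to it.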
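The resulting execution pattern, in which independent codes run concurrently and a dependent code runs once its inputs are available, can be sketched with worker threads standing in for processors. This is only an analogy under stated assumptions: the functions `code1` to `code4` and `run_all` and their return values are hypothetical, and the execution environment 424 assigns processes to processors rather than threads to a pool.

```python
from concurrent.futures import ThreadPoolExecutor


def code1():
    return 1        # stand-in for the first return value of Code 1


def code2():
    return 2        # stand-in for the first return value of Code 2


def code4():
    return 4        # stand-in for the first return value of Code 4


def code3(a, b, c):
    return a + b + c    # consumes the outputs of Code 1, Code 2 and Code 4


def run_all():
    """Run the independent codes concurrently, then the dependent one."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        f1, f2, f4 = pool.submit(code1), pool.submit(code2), pool.submit(code4)
        # result() blocks until the producing code has finished.
        return code3(f1.result(), f2.result(), f4.result())


total = run_all()
```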
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Devices For Executing Special Programs (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
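$$T_{cs} = |D|+|E|+|F|+\mathrm{SC}_{cd}+\mathrm{MRC}_{if}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}$$
$$T_{cf} = |B|+|C|+|D|+|E|+|F|+\mathrm{MRC}_{ac}+\mathrm{MSC}_{cd}+\mathrm{MRC}_{cd}+\mathrm{RBC}_{cd}+\mathrm{MRC}_{if}$$
$$T_{c} = p_c T_{cs} + (1-p_c) T_{cf}$$
$$t_{bc2bc3} = |B|+|C|+|D|+|E|+|F|+\mathrm{MRC}_{ac}+\mathrm{MRC}_{if}$$
$$u_{bc2bc3} = |A|+|B|+|C|+|D|+|E|+|F|+|G|+|H|+|I|+\mathrm{MRC}_{ac}+\mathrm{MRC}_{if}+\mathrm{MSC}_{ac}+\mathrm{MSC}_{if}$$
$$t_{bc1bc4} = p_1 p_2 \left(|D|+|E|+|F|+\mathrm{SC}_{cd}+\mathrm{SC}_{if}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}+\mathrm{MRC}_{if}+\mathrm{SCC}_{if}\right) + p_1 (1-p_2) \left(|A|+|G|+|H|+|I|+|F|+\mathrm{MSC}_{ac}+\mathrm{MSC}_{if}+\mathrm{MRC}_{if}+\mathrm{SCC}_{if}+\mathrm{RBC}_{if}\right) + (1-p_1) \left(|B|+|C|+|D|+|E|+|F|+\mathrm{MRC}_{ac}+\mathrm{MSC}_{cd}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}+\mathrm{RBC}_{cd}+\mathrm{MRC}_{if}\right)$$
$$u_{bc1bc4} = |A|+|B|+|C|+|D|+|E|+|F|+|G|+|H|+|I| + p_1 p_2 \left(\mathrm{SC}_{cd}+\mathrm{SC}_{if}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}+\mathrm{MRC}_{if}+\mathrm{SCC}_{if}\right) + p_1 (1-p_2) \left(\mathrm{MSC}_{ac}+\mathrm{MSC}_{if}+\mathrm{MRC}_{if}+\mathrm{SCC}_{if}+\mathrm{RBC}_{if}\right) + (1-p_1) \left(\mathrm{MRC}_{ac}+\mathrm{MSC}_{cd}+\mathrm{MRC}_{cd}+\mathrm{SCC}_{cd}+\mathrm{RBC}_{cd}+\mathrm{MRC}_{if}\right)$$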
Claims (18)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/750,112 US8595712B2 (en) | 2008-10-24 | 2013-01-25 | Source code processing method, system and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008274686 | 2008-10-24 | ||
JP2008-274686 | 2008-10-24 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/750,112 Continuation US8595712B2 (en) | 2008-10-24 | 2013-01-25 | Source code processing method, system and program |
Publications (2)
Publication Number | Publication Date |
---|---|
US20100106949A1 | 2010-04-29 |
US8407679B2 | 2013-03-26 |
Family
ID=42118628
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/603,598 Active 2032-01-25 US8407679B2 (en) | 2008-10-24 | 2009-10-22 | Source code processing method, system and program |
US13/750,112 Active US8595712B2 (en) | 2008-10-24 | 2013-01-25 | Source code processing method, system and program |
Country Status (6)
Country | Link |
---|---|
US (2) | US8407679B2 (en) |
EP (1) | EP2352087A4 (en) |
JP (1) | JP5209059B2 (en) |
KR (1) | KR101522444B1 (en) |
CN (1) | CN102197376B (en) |
WO (1) | WO2010047174A1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9053239B2 (en) * | 2003-08-07 | 2015-06-09 | International Business Machines Corporation | Systems and methods for synchronizing software execution across data processing systems and platforms |
JP4988789B2 (en) * | 2009-05-19 | 2012-08-01 | International Business Machines Corporation | Simulation system, method and program |
US8863128B2 (en) * | 2010-09-30 | 2014-10-14 | Autodesk, Inc | System and method for optimizing the evaluation of task dependency graphs |
US8752035B2 (en) * | 2011-05-31 | 2014-06-10 | Microsoft Corporation | Transforming dynamic source code based on semantic analysis |
US9256401B2 (en) | 2011-05-31 | 2016-02-09 | Microsoft Technology Licensing, Llc | Editor visualization of symbolic relationships |
US8789018B2 (en) | 2011-05-31 | 2014-07-22 | Microsoft Corporation | Statically derived symbolic references for dynamic languages |
JP5775386B2 (en) * | 2011-07-14 | 2015-09-09 | International Business Machines Corporation | Parallelization method, system, and program |
JP6021342B2 (en) * | 2012-02-09 | 2016-11-09 | International Business Machines Corporation | Parallelization method, system, and program |
CN108717387B (en) | 2012-11-09 | 2021-09-07 | 相干逻辑公司 | Real-time analysis and control for multiprocessor systems |
US8954939B2 (en) | 2012-12-31 | 2015-02-10 | Microsoft Corporation | Extending a development environment |
US20150363230A1 (en) * | 2013-01-23 | 2015-12-17 | Waseda University | Parallelism extraction method and method for making program |
KR101883475B1 (en) | 2013-02-28 | 2018-07-31 | 한화지상방산 주식회사 | Mini Integrated-control device |
US10592278B2 (en) * | 2013-03-15 | 2020-03-17 | Facebook, Inc. | Defer heavy operations while scrolling |
US9698791B2 (en) | 2013-11-15 | 2017-07-04 | Scientific Concepts International Corporation | Programmable forwarding plane |
US9294097B1 (en) | 2013-11-15 | 2016-03-22 | Scientific Concepts International Corporation | Device array topology configuration and source code partitioning for device arrays |
US10326448B2 (en) | 2013-11-15 | 2019-06-18 | Scientific Concepts International Corporation | Code partitioning for the array of devices |
CN104678775A (en) * | 2013-11-27 | 2015-06-03 | 联创汽车电子有限公司 | HILS (hardware-in-the-loop simulation) system and synchronous deviation correction method thereof |
JP6427054B2 (en) * | 2015-03-31 | 2018-11-21 | 株式会社デンソー | Parallelizing compilation method and parallelizing compiler |
US10282498B2 (en) * | 2015-08-24 | 2019-05-07 | Ansys, Inc. | Processor-implemented systems and methods for time domain decomposition transient simulation in parallel |
JP2018124605A (en) * | 2017-01-30 | 2018-08-09 | オムロン株式会社 | Image processing system, information processing apparatus, information processing method, and information processing program |
DE112019006739B4 (en) * | 2019-02-26 | 2023-04-06 | Mitsubishi Electric Corporation | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND INFORMATION PROCESSING PROGRAM |
US11645219B2 (en) * | 2021-02-02 | 2023-05-09 | American Megatrends International, Llc | Method for generating a hybrid BMC system and hybrid BMC system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5325525A (en) * | 1991-04-04 | 1994-06-28 | Hewlett-Packard Company | Method of automatically controlling the allocation of resources of a parallel processor computer system by calculating a minimum execution time of a task and scheduling subtasks against resources to execute the task in the minimum time |
JP2008059304A (en) * | 2006-08-31 | 2008-03-13 | Sony Corp | Communication device, method, and program |
Date | Country | Application Number | Publication Number | Status |
---|---|---|---|---|
2009-08-24 | JP | JP2010534746A | JP5209059B2 | Active |
2009-08-24 | WO | PCT/JP2009/064698 | WO2010047174A1 | Application Filing |
2009-08-24 | KR | KR1020117009462A | KR101522444B1 | IP Right Grant |
2009-08-24 | EP | EP09821874A | EP2352087A4 | Withdrawn |
2009-08-24 | CN | CN200980142515.2A | CN102197376B | Active |
2009-10-22 | US | US12/603,598 | US8407679B2 | Active |
2013-01-25 | US | US13/750,112 | US8595712B2 | Active |
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0380337A (en) | 1989-04-28 | 1991-04-05 | Hitachi Ltd | Parallel form producing device |
JPH0683608A (en) | 1992-09-04 | 1994-03-25 | Fujitsu Ltd | Program analysis support device |
JPH0721240A (en) | 1993-06-23 | 1995-01-24 | Nec Corp | Device for timing considering arrangement |
JPH08180100A (en) | 1994-12-27 | 1996-07-12 | Fujitsu Ltd | Method and device for scheduling |
US5774728A (en) * | 1995-12-27 | 1998-06-30 | International Business Machines Corporation | Method and system for compiling sections of a computer program for multiple execution environments |
US20040230770A1 (en) * | 1999-01-12 | 2004-11-18 | Matsushita Electric Industrial Co., Ltd. | Method and system for processing program for parallel processing purposes, storage medium having stored thereon program getting program processing executed for parallel processing purposes, and storage medium having stored thereon instruction set to be executed in parallel |
US20020056078A1 (en) * | 2000-10-30 | 2002-05-09 | International Business Machines Corporation | Program optimization |
JP2002149416A (en) | 2000-10-30 | 2002-05-24 | Internatl Business Mach Corp <Ibm> | Method for optimizing program and compiler using the same |
US7197747B2 (en) * | 2002-03-13 | 2007-03-27 | International Business Machines Corporation | Compiling method, apparatus, and program |
US20050144602A1 (en) | 2003-12-12 | 2005-06-30 | Tin-Fook Ngai | Methods and apparatus to compile programs to use speculative parallel threads |
JP2008515051A (en) | 2004-09-28 | 2008-05-08 | Intel Corporation | System, method and apparatus for dependency chain processing |
US20060123401A1 (en) * | 2004-12-02 | 2006-06-08 | International Business Machines Corporation | Method and system for exploiting parallelism on a heterogeneous multiprocessor computer system |
US20070011684A1 (en) | 2005-06-27 | 2007-01-11 | Du Zhao H | Mechanism to optimize speculative parallel threading |
JP2007048052A (en) | 2005-08-10 | 2007-02-22 | Internatl Business Mach Corp <Ibm> | Compiler, control method and compiler program |
US20090254892A1 (en) * | 2006-12-14 | 2009-10-08 | Fujitsu Limited | Compiling method and compiler |
US20080184011A1 (en) | 2007-01-30 | 2008-07-31 | Nema Labs Ab | Speculative Throughput Computing |
JP2009129179A (en) | 2007-11-22 | 2009-06-11 | Toshiba Corp | Program parallelization support device and program parallelization support method |
US20120079467A1 (en) * | 2010-09-27 | 2012-03-29 | Nobuaki Tojo | Program parallelization device and program product |
Non-Patent Citations (1)
Title |
---|
Program Evaluation and Review Technique (PERT), Sep. 17, 2009, http://en.wikipedia.org/wiki/Program-Evaluation-and-Review-Technique. |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140258998A1 (en) * | 2013-03-08 | 2014-09-11 | Facebook, Inc. | Enlarging control regions to optimize script code compilation |
US9552195B2 (en) * | 2013-03-08 | 2017-01-24 | Facebook, Inc. | Enlarging control regions to optimize script code compilation |
US10289469B2 (en) * | 2016-10-28 | 2019-05-14 | Nvidia Corporation | Reliability enhancement utilizing speculative execution systems and methods |
US11915149B2 (en) | 2018-11-08 | 2024-02-27 | Samsung Electronics Co., Ltd. | System for managing calculation processing graph of artificial neural network and method of managing calculation processing graph by using the same |
Also Published As
Publication number | Publication date |
---|---|
JPWO2010047174A1 (en) | 2012-03-22 |
JP5209059B2 (en) | 2013-06-12 |
WO2010047174A1 (en) | 2010-04-29 |
US20100106949A1 (en) | 2010-04-29 |
CN102197376B (en) | 2014-01-15 |
KR101522444B1 (en) | 2015-05-21 |
EP2352087A1 (en) | 2011-08-03 |
US20130139131A1 (en) | 2013-05-30 |
CN102197376A (en) | 2011-09-21 |
KR20110071097A (en) | 2011-06-28 |
EP2352087A4 (en) | 2012-08-08 |
US8595712B2 (en) | 2013-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8407679B2 (en) | Source code processing method, system and program | |
JP4629768B2 (en) | Parallelization processing method, system, and program | |
JP4931978B2 (en) | Parallelization processing method, system, and program | |
US8677334B2 (en) | Parallelization method, system and program | |
JP4988789B2 (en) | Simulation system, method and program | |
US8868381B2 (en) | Control system design simulation using switched linearization | |
JP6021342B2 (en) | Parallelization method, system, and program | |
Jannach et al. | Parallelized hitting set computation for model-based diagnosis | |
CN114139475A (en) | Chip verification method, system, device and storage medium | |
JP5479942B2 (en) | Parallelization method, system, and program | |
US20150331787A1 (en) | Software verification | |
US9218317B2 (en) | Parallelization method, system, and program | |
US8661424B2 (en) | Auto-generation of concurrent code for multi-core applications | |
Fritzsch et al. | Experiences from Large-Scale Model Checking: Verifying a Vehicle Control System with NuSMV | |
Schaefer et al. | Future Automotive Embedded Systems Enabled by Efficient Model-Based Software Development | |
KR101137034B1 (en) | System and method for distributed runtime diagnostics in hierarchical parallel environments | |
US10088834B2 (en) | Control system having function for optimizing control software of numerical controller in accordance with machining program | |
CN114153750B (en) | Code checking method and device, code compiling method and electronic equipment | |
US11294647B1 (en) | Support apparatus and design support method | |
Haberl et al. | Seamless model-driven development put into practice | |
WO2022196219A1 (en) | Program analysis device, program analysis method, and tracing process addition device | |
JPH0916642A (en) | Method for evaluating architecture in data processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION,NEW YO Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOMATSU, HIDEAKI;YOSHIZAWA, TAKEO;REEL/FRAME:023406/0110 Effective date: 20090928 Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KOMATSU, HIDEAKI;YOSHIZAWA, TAKEO;REEL/FRAME:023406/0110 Effective date: 20090928 |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| FPAY | Fee payment | Year of fee payment: 4 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |