US9760354B2 - Information processing apparatus and compiling method - Google Patents

Information processing apparatus and compiling method

Publication number
US9760354B2
Authority
US
United States
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US15/070,048
Other languages
English (en)
Other versions
US20160321048A1 (en)
Inventor
Takayuki Matsuura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Assigned to FUJITSU LIMITED (assignment of assignors interest; see document for details). Assignors: MATSUURA, TAKAYUKI
Publication of US20160321048A1
Application granted
Publication of US9760354B2

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/44: Encoding
    • G06F8/443: Optimisation
    • G06F8/4441: Reducing the execution time required by the program code
    • G06F8/4443: Inlining
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/44: Encoding
    • G06F8/443: Optimisation
    • G06F8/4441: Reducing the execution time required by the program code
    • G06F8/4442: Reducing the number of cache misses; Data prefetching

Definitions

  • the embodiments discussed herein are related to an information processing apparatus and a compiling method.
  • source code is often written in a high-level language easily understood by humans, and is converted into machine-readable object code by a compiler.
  • a set of operations is defined as a function in order to increase the reusability.
  • the function is used repeatedly by executing a function call.
  • Inline expansion of function calls is an optimization technique that replaces a function call instruction with instructions included in the called function, and thereby reduces the number of function calls in the object code. Inline expansion of function calls often improves the performance of object code.
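  • As a hand-written illustration only (a compiler would perform this replacement on intermediate code, not on source text), the following Python sketch shows a loop before and after inline expansion. The function names are hypothetical and do not appear in the embodiments.

```python
def square(x):
    """Hypothetical callee with a one-statement body."""
    return x * x


def sum_of_squares_with_calls(values):
    """Original form: one function call is executed per loop iteration."""
    total = 0
    for v in values:
        total += square(v)
    return total


def sum_of_squares_inlined(values):
    """Inlined form: the call is replaced by the body of square()."""
    total = 0
    for v in values:
        total += v * v
    return total
```

  • Both forms compute the same result; the inlined form executes no calls to square, at the cost of a larger caller body.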
  • the proposed compiler estimates, for a function call, the number of times the loop to which the function call belongs is executed.
  • the compiler calculates a determination value of the function call, based on the estimated loop count, the object code size of the called function, and the properties of the target processor. When the calculated determination value is greater than a predetermined threshold, the compiler determines to inline the function call.
  • the proposed program conversion apparatus counts, for each of a plurality of functions, instructions in the function that access variables.
  • the program conversion apparatus selects a function with the highest count of instructions, and inlines a function call that calls the selected function.
  • the proposed compiler described above is configured to determine for each function call whether to perform inline expansion, and is not configured to select a function call to be inlined, from a plurality of function calls.
  • the proposed program conversion apparatus described above is configured to inline a function call that calls a function with the greatest number of instructions involving access to a variable. That is, there is still room for improvement in terms of performance.
  • an information processing apparatus includes: a memory configured to store code including a plurality of functions and a plurality of function calls, each of the plurality of function calls calling one of the plurality of functions; and a processor configured to perform a procedure including: calculating, for each of the plurality of functions included in the code, a plurality of index values including a first index value and a second index value, the first index value indicating an iteration status of a loop in the function, the second index value indicating a code size of the function; calculating, for each of the plurality of function calls included in the code, an evaluation value based on the plurality of index values that are calculated for the function called by the function call; and selecting one or more of the plurality of function calls, based on the evaluation value, and inlining the selected one or more function calls.
  • FIG. 1 illustrates an example of an information processing apparatus
  • FIG. 2 is a block diagram illustrating exemplary hardware of a compiling apparatus
  • FIG. 3 is a block diagram illustrating exemplary functions of the compiling apparatus
  • FIG. 4 illustrates an example of inline expansion
  • FIG. 5 illustrates an example of inline expansion of hierarchical function calls
  • FIG. 6 illustrates an example of a function call graph
  • FIG. 7 illustrates an example of a queue and a stack
  • FIG. 8 illustrates an example of function data
  • FIG. 9 illustrates an example of function call index data and function index data
  • FIG. 10 illustrates an example of function call index extraction
  • FIG. 11 illustrates an example of function index extraction
  • FIG. 12 illustrates an example of an evaluation criteria table
  • FIG. 13 illustrates an example of an evaluation value table
  • FIG. 14 illustrates an example of function data update
  • FIG. 15 illustrates an example of evaluation value recalculation
  • FIG. 16 is a flowchart illustrating an example of the procedure of compilation
  • FIG. 17 is a flowchart illustrating an example of the procedure of forward function scan
  • FIG. 18 is a flowchart illustrating the procedure of function call index extraction
  • FIG. 19 is a flowchart illustrating an example of the procedure of backward function scan
  • FIG. 20 is a flowchart illustrating an example of the procedure of function index extraction.
  • FIG. 21 illustrates an example of the procedure of inline expansion.
  • FIG. 1 illustrates an example of an information processing apparatus 10 .
  • the information processing apparatus 10 of the first embodiment compiles source code written in a high-level language so as to generate machine-readable object code.
  • the information processing apparatus 10 may be referred to as a “compiling apparatus”.
  • the information processing apparatus 10 may be a computer.
  • the information processing apparatus 10 executes a compiler as software.
  • the information processing apparatus 10 may be a terminal apparatus (such as a client computer and the like) that is operated by the user, or may be a server apparatus (such as a server computer and the like) that is accessed by a terminal apparatus.
  • the information processing apparatus 10 includes a storage unit 11 and a conversion unit 12 .
  • examples of the storage unit 11 include volatile storage devices such as a random access memory (RAM), and non-volatile storage devices such as a hard disk drive (HDD) and a flash memory.
  • examples of the conversion unit 12 include processors such as a central processing unit (CPU) and a digital signal processor (DSP). However, the conversion unit 12 may include an application specific electronic circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
  • the processor executes a program stored in a memory such as a RAM or the like.
  • the processor executes a compiler program that compiles source code, for example.
  • a set of multiple processors (a multiprocessor) may also be referred to as a “processor”.
  • the storage unit 11 stores code 13 .
  • the code 13 is source code or intermediate code converted from source code.
  • the code 13 includes a plurality of functions including functions 14 a , 14 b , and 14 c .
  • the function 14 a executes a process A when called; the function 14 b executes a process B when called; and the function 14 c executes a process C when called.
  • the code 13 also includes a plurality of function calls including function calls 15 a , 15 b , and 15 c . Each function call calls a function.
  • the function call 15 a calls the function 14 a ;
  • the function call 15 b calls the function 14 b ;
  • the function call 15 c calls the function 14 c.
  • the conversion unit 12 calculates, for each function in the code 13 , a plurality of index values including an index value 16 a (a first index value) and an index value 16 b (a second index value).
  • the index value 16 a indicates the iteration status of a loop in the function (for example, the loop count).
  • the index value 16 b indicates the code size of the function (for example, the number of lines of the source code, the number of instructions in the intermediate code, or the like).
  • the plurality of index values calculated by the conversion unit 12 may further include at least one of a third index value, a fourth index value, and a fifth index value.
  • the third index value indicates whether additional information indicating inline expansion of the function is added. The additional information is added to the source code by the user, for example.
  • the fourth index value indicates the number of other function calls included in the function.
  • the fifth index value indicates the number of instructions that are not pipelined, among instructions included in the function. The type of instructions that are not pipelined depends on the target processor. Examples of such instructions include single instruction multiple data (SIMD) instructions and the like.
  • the conversion unit 12 calculates an evaluation value 17 for each function call in the code 13 .
  • the evaluation value 17 uses the above-described plurality of index values that are calculated for the called function. For example, the conversion unit 12 calculates the evaluation value 17 by weighting the index values 16 a and 16 b with respective predetermined weights, and adding together the weighted index values 16 a and 16 b . The weight may be changed in accordance with the target processor.
  • the evaluation value of the function call 15 a is calculated based on the index values of the function 14 a .
  • the evaluation value of the function call 15 b is calculated based on the index values of the function 14 b .
  • the evaluation value of the function call 15 c is calculated based on the index values of the function 14 c.
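  • The weighted addition described above may be sketched as follows. This is a minimal sketch under stated assumptions: the weight values, the negative sign given to the code size weight, and all names are hypothetical, since the description only says that predetermined weights are applied and the weighted index values are added together.

```python
from dataclasses import dataclass


@dataclass
class FunctionIndex:
    loop_count: int  # first index value 16a (iteration status of a loop in the function)
    code_size: int   # second index value 16b (for example, number of instructions)


# Hypothetical weights; the description notes that weights may depend on the target processor.
WEIGHT_LOOP_COUNT = 10
WEIGHT_CODE_SIZE = -1  # assumption: a larger callee lowers the evaluation value


def evaluation_value(index: FunctionIndex) -> int:
    """Weight each index value and add the weighted values together."""
    return (WEIGHT_LOOP_COUNT * index.loop_count
            + WEIGHT_CODE_SIZE * index.code_size)


# The evaluation value of a function call is computed from the indexes of its callee.
small_hot_callee = FunctionIndex(loop_count=100, code_size=20)
large_cold_callee = FunctionIndex(loop_count=0, code_size=1990)
```

  • With these assumed weights, a call to the small, loop-heavy callee receives a higher evaluation value than a call to the large callee without loops.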
  • the conversion unit 12 may calculate another index value for a module including the function call (for example, a function including the function call), and calculate the evaluation value 17 based on the other index value in addition to the index values of the called function.
  • the other index value may be a sixth index value, a seventh index value, or the like, for example.
  • the sixth index value indicates the iteration status of a loop to which the function call belongs (for example, the loop count).
  • the seventh index value indicates the number of instructions that are not pipelined, among instructions included in the module to which the function call belongs.
  • the conversion unit 12 selects one or more of the plurality of function calls, based on the evaluation value 17 , and inlines the selected function calls. That is, the conversion unit 12 replaces the instruction of each selected function call with instructions included in the called function. Function calls with higher evaluation values 17 are preferentially selected. Note that the selection is performed under the condition that the code size of each module after inline expansion does not exceed the size of the instruction cache (for example, Layer 1 (L1) instruction cache) of the target processor.
  • the function calls 15 a , 15 b , and 15 c are included in the main function.
  • the evaluation value of the function call 15 a is “60”; the evaluation value of the function call 15 b is “100”; and the evaluation value of the function call 15 c is “80”.
  • the function call 15 b is preferentially selected, and then the function call 15 c is selected, under the condition that the code size of the main function does not exceed the size of the instruction cache.
  • the function calls 15 b and 15 c are inlined, while the function call 15 a at the top is not likely to be inlined.
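  • The selection in this example may be sketched as a greedy procedure. The evaluation values 60, 100, and 80 are taken from the example above; the callee sizes, the size of the main function, the cache limit, and the rule of stopping at the first function call that does not fit are assumptions made only for illustration.

```python
def select_calls_to_inline(calls, caller_size, cache_limit):
    """Pick calls in descending order of evaluation value while the caller fits.

    calls: list of (call name, evaluation value, callee size) tuples.
    Inlining replaces one call instruction with the callee body, so the
    caller grows by (callee size - 1).
    """
    selected = []
    size = caller_size
    for name, _value, callee_size in sorted(calls, key=lambda c: c[1], reverse=True):
        new_size = size - 1 + callee_size
        if new_size > cache_limit:
            break  # assumption: selection ends at the first call that does not fit
        selected.append(name)
        size = new_size
    return selected, size


# Evaluation values from the example; sizes (in instructions) are hypothetical.
calls = [("15a", 60, 50), ("15b", 100, 40), ("15c", 80, 30)]
selected, final_size = select_calls_to_inline(calls, caller_size=10, cache_limit=100)
```

  • Under these assumed sizes, the function calls 15 b and 15 c are selected in that order and the function call 15 a is left as a call.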
  • a plurality of index values including the index value 16 a indicating the iteration status of a loop in the function and the index value 16 b indicating the code size of the function are calculated.
  • the evaluation value 17 is calculated based on the plurality of index values that are calculated for the called function. Then, one or more of the plurality of function calls 15 a , 15 b , and 15 c are selected based on the evaluation value 17 , and are inlined.
  • a compiling apparatus 100 of the second embodiment compiles source code written in a high-level language so as to generate machine-readable object code.
  • the compiling apparatus 100 may be a terminal apparatus that is operated by the user, or may be a server apparatus that is accessed by a terminal apparatus.
  • the compiling apparatus 100 is implemented by a computer, for example. In this case, the compiling apparatus 100 executes a compiler and a linker as software.
  • FIG. 2 is a block diagram illustrating exemplary hardware of the compiling apparatus 100 .
  • the compiling apparatus 100 includes a CPU 101 , a RAM 102 , an HDD 103 , an image signal processing unit 104 , an input signal processing unit 105 , a media reader 106 , and a communication interface 107 . These units are connected to a bus 108 .
  • the CPU 101 is a processor including an arithmetic circuit that executes instructions of a program.
  • the CPU 101 loads at least part of a program and data stored in the HDD 103 into the RAM 102 , and executes the program.
  • the CPU 101 may include multiple processor cores, and the compiling apparatus 100 may include multiple processors. Thus, processes described below may be executed in parallel by using multiple processors or processor cores.
  • a set of multiple processors (a multiprocessor) may be referred to as a “processor”.
  • the RAM 102 is a volatile semiconductor memory that temporarily stores a program executed by the CPU 101 and data used for processing by the CPU 101 .
  • the compiling apparatus 100 may include other types of memories than a RAM, and may include a plurality of memories.
  • the HDD 103 is a non-volatile storage device that stores programs of software (such as an operating system (OS), middleware, application software, and the like) and data.
  • the programs include a compiler program and a linker program.
  • the compiling apparatus 100 may include other types of storage devices such as a flash memory, a solid state drive (SSD), and the like, and may include a plurality of non-volatile storage devices.
  • the image signal processing unit 104 outputs an image to a display 111 connected to the compiling apparatus 100 , in accordance with an instruction from the CPU 101 .
  • Examples of the display 111 include cathode ray tube (CRT) displays, liquid crystal displays (LCDs), plasma display panels (PDPs), organic electro-luminescence (OEL) displays, and the like.
  • the input signal processing unit 105 obtains an input signal from an input device 112 connected to the compiling apparatus 100 , and outputs the input signal to the CPU 101 .
  • examples of the input device 112 include pointing devices (such as a mouse, a touch panel, a touch pad, a trackball, and the like), a keyboard, a remote controller, a button switch, and the like.
  • a plurality of types of input devices may be connected to the compiling apparatus 100 .
  • the media reader 106 is a reading device that reads a program and data stored in a storage medium 113 .
  • examples of the storage medium 113 include magnetic disks (such as a flexible disk (FD), an HDD, and the like), optical discs (such as a compact disc (CD), a digital versatile disc (DVD), and the like), magneto-optical discs (MOs), semiconductor memories, and the like.
  • the media reader 106 reads, for example, a program and data from the storage medium 113 , and stores the read program and data in the RAM 102 or the HDD 103 .
  • the communication interface 107 is connected to a network 114 , and is an interface that communicates with another computer via the network 114 .
  • the communication interface 107 may be a wired communication interface connected to a communication apparatus such as a switch with a cable, or may be a radio communication interface connected to a base station via a radio link.
  • the compiling apparatus 100 does not have to include the media reader 106 . If the compiling apparatus 100 is controllable from a terminal apparatus operated by the user, the compiling apparatus 100 does not have to include the image signal processing unit 104 or the input signal processing unit 105 . Further, the display 111 and the input device 112 may be integrally formed with the housing of the compiling apparatus 100 .
  • FIG. 3 is a block diagram illustrating exemplary functions of the compiling apparatus 100 .
  • the compiling apparatus 100 includes a file storage unit 120 , a compiler 130 , and a linker 150 .
  • the file storage unit 120 is implemented as a storage area reserved in the RAM 102 or the HDD 103 , for example.
  • the compiler 130 and the linker 150 are implemented as modules of programs (a compiler program and a linker program) executed by the CPU 101 , for example. Some or all of the functions of the compiler 130 and the linker 150 may be implemented as electronic circuits instead of as software.
  • the file storage unit 120 stores a source file 121 , an object file 122 , and an executable file 123 .
  • the source file 121 includes source code written in a high-level language such as C++ and the like.
  • the object file 122 includes machine-readable object code.
  • the executable file 123 is in a format executable by the target processor. Note that the executable file 123 may be executed by the CPU 101 , another CPU of the compiling apparatus 100 , or a CPU of a computer other than the compiling apparatus 100 .
  • the compiler 130 reads the source file 121 from the file storage unit 120 , converts source code into object code, and stores the object file 122 in the file storage unit 120 .
  • the compiler 130 includes an input and output control unit 131 , a file input unit 132 , an intermediate code generation unit 133 , an intermediate code storage unit 134 , an assembly code generation unit 135 , a file output unit 136 , an optimization unit 140 , and a control information storage unit 143 .
  • the input and output control unit 131 selects an input and output method corresponding to the type of files, and controls the file input unit 132 and the file output unit 136 .
  • the file input unit 132 opens the source file 121 , and reads the source code from the source file 121 , in accordance with an instruction from the input and output control unit 131 .
  • the intermediate code generation unit 133 analyzes the source code read by the file input unit 132 , converts the source code into intermediate code written in an intermediate language, which is used in the compiler 130 , and stores the intermediate code in the intermediate code storage unit 134 .
  • the analysis of source code includes lexical analysis, syntactic analysis, semantic analysis, and so on.
  • the intermediate code storage unit 134 is a storage area reserved in the RAM 102 , and stores the intermediate code.
  • the assembly code generation unit 135 converts the intermediate code optimized by the optimization unit 140 into assembly code written in an assembly language, which is a low-level language.
  • the file output unit 136 generates an object file 122 in accordance with an instruction from the input and output control unit 131 . Then, the file output unit 136 converts the assembly code generated by the assembly code generation unit 135 into object code, and writes the object code to the object file 122 .
  • the optimization unit 140 optimizes the intermediate code stored in the intermediate code storage unit 134 in order to improve the execution speed.
  • the optimization unit 140 includes an analysis unit 141 and an optimization execution unit 142 .
  • the analysis unit 141 analyzes intermediate code so as to determine an optimization method.
  • the optimization method determined by the analysis unit 141 includes inline expansion of function calls, which replaces a function call instruction with instructions included in the called function and thereby reduces the number of function calls.
  • the optimization execution unit 142 optimizes the intermediate code with the optimization method selected by the analysis unit 141 . Optimization performed by the optimization execution unit 142 includes inline expansion.
  • the control information storage unit 143 is a storage area reserved in the RAM 102 or the HDD 103 , and stores various types of control information that is generated or referred to by the optimization unit 140 during an optimization process. The details of the control information will be described below.
  • the linker 150 reads the object file 122 from the file storage unit 120 , analyzes the object code, and detects other object files and libraries that are referred to. Then, the linker 150 links the object file 122 with the detected other object files and libraries so as to generate the executable file 123 . Note that the functions of the linker 150 may be integrated into the compiler 130 .
  • FIG. 4 illustrates an example of inline expansion.
  • Source code 21 is an example of source code included in the source file 121 .
  • the source code 21 includes a function main, a function big_subA, a function big_subB, and a function inline_sub.
  • a process A is defined by 1,990 lines of statements.
  • a process B is defined by 1,990 lines of statements.
  • a process C is defined by 20 lines of statements.
  • the function main includes a function call 21 a that calls the function big_subA, a function call 21 b that calls the function big_subB, and a function call 21 c that calls the function inline_sub.
  • the function call 21 c is inside a loop that iterates 100 times.
  • the function calls 21 a and 21 b are outside the loop.
  • inline expansion optimization is performed for the source code 21 .
  • the optimization is performed under the condition that the number of lines of the function main does not exceed 4,000 lines.
  • the term “the number of lines” as used herein indicates the number of actual statements that end in a semicolon.
  • the number of lines of the function main in the source code 21 in FIG. 4 is 5.
  • the first method is one that inlines function calls in order of nearest to the top of the source code 21 .
  • the source code 21 is converted into source code 22 . More specifically, the function call 21 a is first selected. If the function call 21 a is inlined, the function main will have 1,994 lines. Accordingly, the selected function call 21 a is inlined. Then, the function call 21 b is selected. If the function call 21 b is inlined, the function main will have 3,983 lines. Accordingly, the selected function call 21 b is inlined. Then, the function call 21 c is selected. If the function call 21 c is inlined, the function main will have 4,002 lines. Accordingly, the optimization ends without inlining the selected function call 21 c.
  • the second method evaluates the function calls 21 a , 21 b , and 21 c , and preferentially selects function calls with higher evaluation values.
  • the function calls 21 c , 21 a , and 21 b are selected in this order based on the evaluation values.
  • the source code 21 is converted into source code 23 . More specifically, the function call 21 c is first selected. If the function call 21 c is inlined, the function main will have 24 lines. Accordingly, the selected function call 21 c is inlined. Then, the function call 21 a is selected. If the function call 21 a is inlined, the function main will have 2,013 lines. Accordingly, the selected function call 21 a is inlined. Then, the function call 21 b is selected. If the function call 21 b is inlined, the function main will have 4,002 lines. Accordingly, the optimization ends without inlining the selected function call 21 b.
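  • Object code corresponding to the source code 23 generated by the second method executes fewer function calls than object code corresponding to the source code 22 generated by the first method: the source code 23 retains only the function call 21 b, which is outside the loop, whereas the source code 22 retains the function call 21 c, which is executed in each of the 100 loop iterations. This indicates that the performance is improved.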
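  • The line counts of the two methods can be reproduced with a short calculation. The sketch below assumes that inlining replaces the one-line call statement with the statements of the called function, so the function main grows by the callee size minus one; it also assumes that the function main is executed once. The names are hypothetical.

```python
LIMIT = 4000        # maximum number of lines allowed for the function main
MAIN_LINES = 5      # lines of the function main before inline expansion
CALLEE_LINES = {"21a": 1990, "21b": 1990, "21c": 20}
CALLS_EXECUTED = {"21a": 1, "21b": 1, "21c": 100}  # 21c sits in a 100-iteration loop


def inline_in_order(order):
    """Try to inline calls in the given order; stop at the first that does not fit."""
    lines = MAIN_LINES
    inlined, trace = [], []
    for call in order:
        candidate = lines - 1 + CALLEE_LINES[call]  # call line replaced by callee body
        trace.append(candidate)
        if candidate > LIMIT:
            break
        lines = candidate
        inlined.append(call)
    remaining = sum(n for c, n in CALLS_EXECUTED.items() if c not in inlined)
    return inlined, trace, remaining


first_method = inline_in_order(["21a", "21b", "21c"])   # order of appearance
second_method = inline_in_order(["21c", "21a", "21b"])  # order of evaluation value
```

  • Under these assumptions, the first method leaves 100 executed function calls in the function main, while the second method leaves one.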
  • FIG. 5 illustrates an example of inline expansion of hierarchical function calls.
  • a function 31 includes a function call that calls a function 32 (subA), a function call that calls a function 33 (subB), and a function call that calls a function 34 (subC).
  • the function 32 includes a function call that calls a function 35 (subX) and a function call that calls a function 36 (subY).
  • the function 34 includes a function call that calls a function 37 (subZ).
  • FIG. 6 illustrates an example of a function call graph 40 .
  • the analysis unit 141 analyzes intermediate code stored in the intermediate code storage unit 134 , and thereby generates the function call graph 40 .
  • the function call graph 40 includes nodes representing functions and links representing function calls.
  • the function call graph is a graph representing hierarchical function calls, and has a tree structure or a structure similar to a tree. More specifically, the function call graph 40 is the same as a tree in having a single root node. However, the function call graph 40 is different from a tree in that multiple links may be created between the same two nodes, and in that different parent nodes may be connected to the same child node.
  • the function call graph 40 includes nodes corresponding to functions # 1 through # 12 and links corresponding to function calls #A through #M.
  • # 1 through # 12 are IDs assigned to the functions in the intermediate code by the analysis unit 141 .
  • #A through #M are IDs assigned to the function calls in the intermediate code by the analysis unit 141 .
  • the function # 12 includes the function call #A that calls the function # 9 , the function call #B that calls the function # 10 , and the function call #C that calls the function # 11 .
  • the function # 11 includes the function call #D that calls the function # 7 and the function call #E that calls the function # 8 .
  • the function # 9 includes the function call #F that calls the function # 5 and the function call #G that calls the function # 6 .
  • the function # 8 includes the function call #H that calls the function # 3 , the function call #I that calls the function # 3 , and the function call #J that calls the function # 4 .
  • the function # 6 includes the function call #K that calls the function # 1 and the function call #L that calls the function # 2 .
  • the function # 7 includes the function call #M that calls the function # 2 .
  • the evaluation values of the function calls #A through #M may be calculated by scanning all the functions # 1 through # 12 twice in accordance with the function call graph 40 .
  • the first scan is for scanning the functions in breadth-first order from the root to the leaves of the function call graph 40 , and may be regarded as a forward function scan. More specifically, in the first scan, the analysis unit 141 scans the functions # 1 through # 12 in the order of the functions # 12 , # 11 , # 10 , # 9 , # 8 , # 7 , # 6 , # 5 , # 4 , # 3 , # 2 , and # 1 .
  • the second scan is for scanning the functions in reverse order to the order of the first scan, and may be regarded as a backward function scan.
  • the analysis unit 141 scans the functions # 1 through # 12 in the order of the functions # 1 , # 2 , # 3 , # 4 , # 5 , # 6 , # 7 , # 8 , # 9 , # 10 , # 11 , and # 12 .
  • FIG. 7 illustrates an example of a queue 161 and a stack 162 .
  • the queue 161 and the stack 162 are storage areas provided in the control information storage unit 143 . Each of the queue 161 and the stack 162 stores function IDs that identify the functions # 1 through # 12 .
  • the queue 161 has a First In First Out (FIFO) data structure, and allows the first inserted function ID to be extracted first.
  • the stack 162 has a Last In First Out (LIFO) data structure, and allows the last inserted function ID to be extracted first.
  • the analysis unit 141 inserts the function ID of the detected function into the queue 161 and the stack 162 .
  • the function ID inserted in the queue 161 is used in the subsequent steps of the forward function scan.
  • the analysis unit 141 extracts a function ID from the end of the queue 161 (from the opposite side of the entrance).
  • the function ID inserted in the stack 162 is used in the second scan (backward function scan) described above.
  • the analysis unit 141 extracts a function ID from the top (entrance) of the stack 162 .
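  • The forward and backward function scans may be sketched with a queue and a stack as follows. The function calls are those listed for the function call graph 40. Two details are assumptions chosen so that the sketch reproduces the scan orders given above: the callees of each function are visited in reverse order of their function call IDs, and a function that has already been detected is not inserted again.

```python
from collections import deque

# (caller function ID, function call ID, callee function ID) for the function call graph 40
CALLS = [
    (12, "A", 9), (12, "B", 10), (12, "C", 11),
    (11, "D", 7), (11, "E", 8),
    (9, "F", 5), (9, "G", 6),
    (8, "H", 3), (8, "I", 3), (8, "J", 4),
    (6, "K", 1), (6, "L", 2),
    (7, "M", 2),
]


def scan_orders(root, calls):
    """Return (forward order, backward order) of function IDs."""
    callees = {}
    for caller, _call_id, callee in calls:
        callees.setdefault(caller, []).append(callee)

    queue = deque([root])  # FIFO: drives the breadth-first forward scan
    stack = [root]         # LIFO: replayed afterwards for the backward scan
    detected = {root}
    forward = []
    while queue:
        func = queue.popleft()
        forward.append(func)
        for callee in reversed(callees.get(func, [])):  # assumed visiting order
            if callee not in detected:                   # assumed de-duplication
                detected.add(callee)
                queue.append(callee)
                stack.append(callee)

    backward = []
    while stack:
        backward.append(stack.pop())
    return forward, backward


forward_order, backward_order = scan_orders(12, CALLS)
```

  • Because every detected function ID is pushed onto the stack in detection order, popping the stack yields exactly the reverse of the forward scan.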
  • FIG. 8 illustrates an example of function data 163 .
  • upon inserting a function ID into the queue 161 and the stack 162, the analysis unit 141 generates the function data 163 .
  • the function data 163 is stored in the control information storage unit 143 .
  • the function data 163 includes records corresponding to the respective functions. Each record includes the following items: function ID, address, caller, and callee.
  • the item “address” indicates the start position of the function.
  • the item “caller” indicates the address of another function that calls the function.
  • the item “caller” may include addresses of a plurality of other functions. However, in the record corresponding to the function at the root (the function # 12 in the example of FIG. 6 ), the item “caller” is empty.
  • the item “callee” indicates the address of another function that is called by the function, and a function call ID that identifies the function call.
  • the item “callee” may include addresses of a plurality of other functions and a plurality of function call IDs. However, in the records corresponding to the functions at the leaf nodes (the functions # 1 , # 2 , # 3 , # 4 , # 5 , and # 10 in the example of FIG. 6 ), the item “callee” is empty.
  • a record corresponding to the function # 8 includes a function ID “8”, an address “0x0888”, a caller “0x1111”, and callees “0x0333, H”, “0x0333, I”, and “0x0444, J”.
  • “0x1111” is the address of the function # 11 ;
  • “0x0333” is the address of the function # 3 ;
  • “0x0444” is the address of the function # 4 .
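  • One possible in-memory shape for a record of the function data 163 is sketched below, using the record of the function # 8 as the example. The class name, the field names, and the helper functions are hypothetical; only the item names and the example values come from the description above.

```python
from dataclasses import dataclass, field

@dataclass
class FunctionRecord:
    function_id: int
    address: int
    # Addresses of the functions that call this function (empty at the root).
    callers: list = field(default_factory=list)
    # (callee address, function call ID) pairs (empty at a leaf).
    callees: list = field(default_factory=list)

record_8 = FunctionRecord(
    function_id=8,
    address=0x0888,
    callers=[0x1111],  # the function # 11
    callees=[(0x0333, "H"), (0x0333, "I"), (0x0444, "J")],
)

def is_root(record):
    return not record.callers

def is_leaf(record):
    return not record.callees
```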
  • FIG. 9 illustrates an example of function call index data 164 and function index data 165 .
  • the analysis unit 141 extracts index values for each function call from the intermediate code, and generates the function call index data 164 including the extracted index values.
  • the function call index data 164 is stored in the control information storage unit 143 .
  • the function call index data 164 includes records corresponding to the respective function calls. Each record includes the following items: function call ID, loop count, innermost loop flag, and number of non-pipelined instructions.
  • the item “loop count” indicates how many times the loop to which the function call belongs iterates. If there is no loop in the block (a unit of compilation) to which the function call belongs, the loop count is set to 0. In the case where the iteration count is not known from the intermediate code (for example, in the case where the iteration count is determined dynamically during execution), the loop count may be set to a predetermined value such as 0 or other values.
  • the item “innermost loop flag” indicates whether the function call belongs to the innermost loop (whether there is no loop in the loop to which the function call belongs). If there is no loop in the block to which the function call belongs or if there is no loop in the loop to which the function call belongs, the innermost loop flag is set to True.
  • the item “number of non-pipelined instructions” indicates how many instructions are not pipelined, among instructions included in the block to which the function call belongs. The type of instructions that are not pipelined depends on the architecture of the target processor. An example of a non-pipelined instruction is a SIMD instruction.
  • the analysis unit 141 extracts index values for each function from the intermediate code, and generates the function index data 165 including the extracted index values.
  • the function index data 165 is stored in the control information storage unit 143 .
  • the function index data 165 includes records corresponding to the respective functions. Each record includes the following items: function ID, loop count, number of source code lines, number of intermediate code instructions, user directive flag, number of function calls, and number of non-pipelined instructions.
  • the item “loop count” indicates how many times the loop included in the function iterates. If there is no loop in the function, the loop count is set to 0. In the case where the iteration count is not known from the intermediate code (for example, in the case where the iteration count is determined dynamically during execution), the loop count may be set to a predetermined value such as 0 or other values.
  • the item “number of source code lines” indicates how many lines of the source code define the function. Note that “the number of lines” includes only the number of lines of actual statements, and does not include the number of lines of function names, brackets, and comments.
  • the item “number of intermediate code instructions” indicates the number of instructions in the intermediate code defining the function.
  • the user directive flag indicates whether a directive for inline expansion of the function is added.
  • the directive for inline expansion is written in the source code by the user. If a directive for inline expansion is added, the user directive flag is set to True.
  • the item “number of function calls” indicates how many function call instructions are included in the function.
  • the item “number of non-pipelined instructions” indicates how many instructions are not pipelined, among instructions included in the function.
  • FIG. 10 illustrates an example of function call index extraction.
  • Source code 24 is an example of source code included in the source file 121 .
  • the source code 24 includes the function # 11 (represented as “func 11 ”).
  • the function # 11 includes the function call #E that calls the function # 8 (represented as “func 8 ”).
  • the function call #E belongs to a loop that iterates 100 times. That is, the function # 8 is repeatedly called 100 times.
  • the source code 24 is converted into intermediate code 51 by the intermediate code generation unit 133 .
  • the intermediate code 51 is stored in the intermediate code storage unit 134 .
  • In a forward function scan, the analysis unit 141 generates a record 164 a corresponding to the function call #E, and adds the record 164 a to the function call index data 164 .
  • the record 164 a includes a function call ID “E”. Further, since the loop to which the function call #E belongs iterates 100 times, the record 164 a includes the loop count “100”. The loop count may be extracted from the intermediate code 51 by detecting an assignment statement for the loop variable. Further, since the function call #E belongs to the innermost loop, the record 164 a includes an innermost loop flag “True”. Further, since the function # 11 does not include any instruction that is not pipelined, the record 164 a includes the number of non-pipelined instructions “0”.
  • FIG. 11 illustrates an example of function index extraction.
  • Source code 25 is an example of source code included in the source file 121 .
  • the source code 25 includes the function # 8 (represented as “func 8 ”).
  • the function # 8 includes the function call #J that calls the function # 4 (represented as “func 4 ”), the function call #I that calls the function # 3 (represented as “func 3 ”), and the function call #H that calls the function # 3 .
  • the function calls #J and #I belong to a loop that iterates 10 times. That is, the functions # 3 and # 4 are called alternately 10 times each.
  • the source code 25 is converted into intermediate code 52 by the intermediate code generation unit 133 .
  • the intermediate code 52 is stored in the intermediate code storage unit 134 .
  • In a backward function scan, the analysis unit 141 generates a record 165 a corresponding to the function # 8 , and adds the record 165 a to the function index data 165 .
  • the record 165 a includes the function ID “8”.
  • Since the function # 8 includes a loop that iterates 10 times, the record 165 a includes the loop count “10”.
  • the loop count may be extracted from the intermediate code 52 by detecting an assignment statement for the loop variable.
  • Since the source code 25 includes four statements that end in a semicolon, the record 165 a includes the number of source code lines “4”.
  • Since the intermediate code 52 includes two “move” instructions, three “callpe” instructions, one “add” instruction, and one “bct” instruction, the record 165 a includes the number of intermediate code instructions “7”.
  • Since a directive for inline expansion is not added to the source code 25 , the record 165 a includes a user directive flag “False”. Further, since the function # 8 includes three function call instructions (“callpe” instructions), the record 165 a includes the number of function calls “3”. Further, since the function # 8 does not include any instruction that is not pipelined, the record 165 a includes the number of non-pipelined instructions “0”.
  • FIG. 12 illustrates an example of an evaluation criteria table 166 .
  • the evaluation criteria table 166 indicates a calculation method for calculating an evaluation value of each function call from the function call index data 164 and the function index data 165 .
  • the evaluation criteria table 166 is prepared in advance for each processor architecture, and is stored in the control information storage unit 143 .
  • the evaluation criteria table 166 is prepared for each architecture because the instruction cache size and the instruction length vary from one architecture to another, and therefore the criteria for determining whether the performance improves depend on the architecture.
  • the evaluation criteria table 166 includes the following items: architecture name, L1 instruction cache, instruction length, loop count, number of source code lines, number of intermediate code instructions, innermost loop flag, user directive flag, number of function calls, and number of non-pipelined instructions.
  • the item “architecture name” indicates the name of the processor architecture, that is, the type of processor.
  • the item “L1 instruction cache” indicates the size of an L1 instruction cache.
  • the item “instruction length” indicates the size per instruction in the object code. If the size varies from one instruction to another, the item “instruction length” indicates the average size.
  • each index value is converted into an evaluation value using a factor A.
  • the item “loop count” indicates a conversion method for converting the loop count in the function call index data 164 and the function index data 165 into an evaluation value. For example, if the sum of the loop count of a function call and the loop count of a function called by the function call is N, then 10×A×N is added to the evaluation value of the function call. As the loop count increases, the evaluation value increases, because the execution cost increases and inline expansion therefore provides greater benefits.
  • the item “number of source code lines” indicates a conversion method for converting the number of source code lines in the function index data 165 into an evaluation value. For example, if the number of source code lines of a function that is called by a function call is N, then 10 ⁇ A ⁇ N is added to the evaluation value of the function call.
  • the item “number of intermediate code instructions” indicates a conversion method for converting the number of intermediate code instructions in the function index data 165 into an evaluation value. For example, if the number of intermediate code instructions in a function that is called by a function call is N, then 100×A÷N is added to the evaluation value of the function call. As the number of instructions decreases, the evaluation value increases, because the relative overhead of the function call increases and inline expansion therefore provides greater benefits.
  • the item “innermost loop flag” indicates a conversion method for converting the innermost loop flag in the function call index data 164 into an evaluation value. For example, if the innermost loop flag of a function call is True, then A is added to the evaluation value of the function call. If the innermost loop flag is False, the evaluation value of the function call is not increased. Since optimization of the innermost loop is often very beneficial, the evaluation value of the function call belonging to the innermost loop is increased.
  • the item “user directive flag” indicates a conversion method for converting the user directive flag in the function index data 165 into an evaluation value. For example, if the user directive flag of a function that is called by a function call is True, then 20×A is added to the evaluation value of the function call. If the user directive flag is False, the evaluation value of the function call is not increased. This is because when there is a directive from the user, inline expansion is often very beneficial.
  • the item “number of function calls” indicates a conversion method for converting the number of function calls in the function index data 165 into an evaluation value. For example, if the number of function calls (child function calls) included in a function that is called by a function call is N, then A×N is subtracted from the evaluation value of the function call. As the number of child function calls increases, the evaluation value decreases, because inline expansion becomes less effective in reducing the number of function calls.
  • the item “number of non-pipelined instructions” indicates a conversion method for converting the number of non-pipelined instructions in the function call index data 164 and the function index data 165 into an evaluation value. For example, if the sum of the number of non-pipelined instructions of a function call and the number of non-pipelined instructions of a function called by the function call is N, then A×N is subtracted from the evaluation value of the function call. As the number of non-pipelined instructions increases, the evaluation value decreases, because it becomes more difficult to execute instructions in parallel and the performance is therefore more likely to decrease.
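  • The conversion rules listed above could be combined as in the following Python sketch. It is an illustration under stated assumptions, not the table of FIG. 12 itself: the dictionary keys and the function name are hypothetical, the loop count term is taken as 10·A·N and the instruction count term as 100·A/N (the reading consistent with the stated rationale), and the source line term is assumed to be inversely proportional to N by analogy with the instruction count term.

```python
def evaluation_value(call_index, func_index, a=1.0):
    """Convert the index values of a call and of its callee into one value."""
    value = 0.0
    # Loop count: 10 * A * N, N = loop count of the call + loop count of the callee.
    value += 10 * a * (call_index["loop_count"] + func_index["loop_count"])
    # Source lines: ASSUMED to be 10 * A / N (smaller callees score higher).
    if func_index["source_lines"] > 0:
        value += 10 * a / func_index["source_lines"]
    # Intermediate code instructions: 100 * A / N.
    if func_index["ir_instructions"] > 0:
        value += 100 * a / func_index["ir_instructions"]
    # Innermost loop flag: A is added when True.
    if call_index["innermost_loop"]:
        value += a
    # User directive flag: 20 * A is added when True.
    if func_index["user_directive"]:
        value += 20 * a
    # Child function calls in the callee: A * N is subtracted.
    value -= a * func_index["function_calls"]
    # Non-pipelined instructions (call + callee): A * N is subtracted.
    value -= a * (call_index["non_pipelined"] + func_index["non_pipelined"])
    return value

# Index values of the function call #E and of the function # 8 (FIGS. 10 and 11).
call_e = {"loop_count": 100, "innermost_loop": True, "non_pipelined": 0}
func_8 = {"loop_count": 10, "source_lines": 4, "ir_instructions": 7,
          "user_directive": False, "function_calls": 3, "non_pipelined": 0}
value_e = evaluation_value(call_e, func_8)
```

  • The weights here are the example weights from the description; the result is not meant to reproduce the evaluation values shown for FIG. 13, which depend on the actual table and factor A.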
  • FIG. 13 illustrates an example of an evaluation value table 167 .
  • the analysis unit 141 calculates the evaluation value of each function call based on the function call index data 164 , the function index data 165 , and the evaluation criteria table 166 described above so as to generate the evaluation value table 167 .
  • the evaluation value table 167 is stored in the control information storage unit 143 .
  • the evaluation value table 167 includes the following items: function call ID and evaluation value.
  • the item “function call ID” identifies a function call.
  • the item “evaluation value” indicates an evaluation value calculated for the function call.
  • the analysis unit 141 sorts the function calls #A through #M in descending order of evaluation value, and preferentially selects function calls with higher evaluation values as candidates for inline expansion.
  • the analysis unit 141 inlines a selected function call if the number of instructions per function after inline expansion does not exceed a threshold. For example, assume that the evaluation values of the function calls #A through #M are calculated to be 10, 30, 50, 40, 100, 20, 60, 70, 30, 90, 30, 20, and 10, respectively. In this case, the analysis unit 141 selects the function call #E with the highest evaluation value as the first candidate for inline expansion.
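  • The selection of the first candidate can be reproduced with a few lines of Python, using the example evaluation values given above. The variable names are illustrative assumptions.

```python
# Example evaluation values of the function calls #A through #M.
evaluation_values = dict(zip(
    "ABCDEFGHIJKLM",
    [10, 30, 50, 40, 100, 20, 60, 70, 30, 90, 30, 20, 10],
))

# Sort the function call IDs in descending order of evaluation value.
ranked = sorted(evaluation_values, key=evaluation_values.get, reverse=True)
first_candidate = ranked[0]
```

  • Whether the selected candidate is actually inlined still depends on the instruction count threshold, which this sketch does not model.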
  • the analysis unit 141 updates the function data 163 . Further, when a function call is inlined, the index values of one or more functions and the index values of one or more of the other function calls are changed. Then, the analysis unit 141 updates the function call index data 164 and the function index data 165 , and recalculates the evaluation values. However, only the index values of the functions and function calls that are affected by the inline expansion need to be changed, and there is no need to update the index values of all the functions and function calls. Further, only the evaluation values of the function calls that are affected by the updated index values need to be recalculated, and there is no need to recalculate all the evaluation values. The analysis unit 141 sorts the function calls by the recalculated evaluation values, and selects the next candidate for inline expansion.
  • FIG. 14 illustrates an example of function data update.
  • a record 163 a is a record of the function data 163 corresponding to the function # 11 .
  • a record 163 b is a record of the function data 163 corresponding to the function # 8 . If the function call #E that calls the function # 8 from the function # 11 is inlined, the instructions of the function # 8 are inserted into the function # 11 . The function calls #H, #I, and #J included in the function # 8 are also inserted into the function # 11 . Thus, when determining to inline the function call #E, the analysis unit 141 updates the record 163 a as illustrated in FIG. 14 .
  • the function ID and the address in the record 163 a remain the same. Further, since the function call #C that calls the function # 11 is not changed, the caller information in the record 163 a remains the same. On the other hand, since the function call #E is eliminated by inline expansion, the function call #E is deleted from the callee information in the record 163 a . Further, since the function calls #H, #I, and #J included in the function # 8 are taken over to the function # 11 by inline expansion, the function calls #H, #I, and #J are added to the callee information in the record 163 a . Note that in the case where no caller calling the function # 8 exists anymore, the record 163 b may be deleted.
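  • The update of the records 163 a and 163 b can be sketched in Python as follows. The dictionary layout and the function name `inline_call` are hypothetical, the caller address 0x1212 is a placeholder because the address of the function containing the function call #C is not given here, and the record of the function # 11 is simplified so that #E is its only callee.

```python
function_data = {
    0x1111: {"id": 11, "callers": [0x1212],  # 0x1212: placeholder address
             "callees": [(0x0888, "E")]},
    0x0888: {"id": 8, "callers": [0x1111],
             "callees": [(0x0333, "H"), (0x0333, "I"), (0x0444, "J")]},
}

def inline_call(function_data, caller_addr, call_id):
    """Update the function data after the call `call_id` has been inlined."""
    caller = function_data[caller_addr]
    callee_addr = next(addr for addr, cid in caller["callees"] if cid == call_id)
    callee = function_data[callee_addr]
    # Delete the inlined call and take over the calls contained in the callee.
    kept = [entry for entry in caller["callees"] if entry[1] != call_id]
    caller["callees"] = kept + list(callee["callees"])
    # If the caller no longer calls the callee, drop it from the callee's callers.
    if all(addr != callee_addr for addr, _ in caller["callees"]):
        callee["callers"] = [a for a in callee["callers"] if a != caller_addr]
    # A callee without any remaining caller may be deleted.
    if not callee["callers"]:
        del function_data[callee_addr]

inline_call(function_data, 0x1111, "E")
```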
  • FIG. 15 illustrates an example of evaluation value recalculation.
  • the function call #E is inlined in the function call graph 40 .
  • the code of the function # 11 is changed, so that the index values of the function # 11 are changed.
  • the analysis unit 141 updates a record of the function # 11 in the function index data 165 .
  • the surrounding code of the function calls #H, #I, and #J is changed, so that the index values of the function calls #H, #I, and #J are changed.
  • the surrounding code of the function call #D is changed, so that the index values of the function call #D are also changed.
  • the analysis unit 141 updates records of the function calls #D, #H, #I, and #J in the function call index data 164 .
  • the analysis unit 141 recalculates the evaluation values that are affected by the changes. More specifically, since the index values of the function # 11 are changed, the analysis unit 141 recalculates the evaluation value of the function call #C that calls the function # 11 . Further, since the index values of the function calls #D, #H, #I, and #J are changed, the analysis unit 141 recalculates the evaluation values of the function calls #D, #H, #I, and #J. Further, since the function call #E is eliminated, the analysis unit 141 deletes the evaluation value thereof. The function calls #A, #B, #F, #G, #K, #L, and #M are not affected by the inline expansion of the function call #E, and therefore their evaluation values do not need to be recalculated.
  • the evaluation values of the function calls #A through #D and #F through #M are calculated to be 10, 30, 20, 30, 20, 60, 90, 40, 50, 30, 20, and 10, respectively.
  • the analysis unit 141 sorts the function calls #A through #D and #F through #M in descending order of evaluation value, and selects the function call #H with the highest evaluation value as the next candidate for inline expansion.
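  • The partial recalculation described for FIG. 15 can be sketched in Python as follows. Only the entries of the affected function calls are touched; the recalculated numbers are the example values from the text, and the variable names are illustrative assumptions.

```python
# Evaluation values before the function call #E is inlined.
values = dict(zip(
    "ABCDEFGHIJKLM",
    [10, 30, 50, 40, 100, 20, 60, 70, 30, 90, 30, 20, 10],
))

# The function call #E is eliminated, so its evaluation value is deleted.
del values["E"]

# Only #C (calls the changed function # 11) and #D, #H, #I, #J (their
# surrounding code changed) are recalculated; the others stay as they are.
recalculated = {"C": 20, "D": 30, "H": 90, "I": 40, "J": 50}
values.update(recalculated)

next_candidate = max(values, key=values.get)
```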
  • the following describes the procedure of compilation by the compiling apparatus 100 .
  • FIG. 16 is a flowchart illustrating an example of the procedure of compilation.
  • the intermediate code generation unit 133 reads the source code from the source file 121 , and analyzes the source code.
  • the analysis of source code includes lexical analysis, syntactic analysis, and semantic analysis. Then, the intermediate code generation unit 133 converts the source code into intermediate code, and stores the intermediate code in the intermediate code storage unit 134 .
  • the analysis unit 141 extracts functions from the intermediate code stored in the intermediate code storage unit 134 , and scans the functions from the caller to the callee (in the forward direction). In the forward function scan, the analysis unit 141 extracts, for each function call, index values of the function call. The details of the forward function scan will be described below.
  • the analysis unit 141 scans the functions extracted in step S 2 , from the callee to the caller (in the backward direction opposite to that in step S 2 ). In the backward function scan, the analysis unit 141 extracts, for each function, index values of the function. Further, the analysis unit 141 calculates, for each function call, an evaluation value based on the index values of the function call and the index values of the called function. The details of the backward function scan will be described below.
  • the analysis unit 141 selects a function call to be inlined, based on the evaluation values calculated in step S 3 .
  • the optimization execution unit 142 updates the intermediate code stored in the intermediate code storage unit 134 such that the function call selected by the analysis unit 141 is inlined. The details of the inline expansion will be described below.
  • the file output unit 136 converts the assembly code generated by the assembly code generation unit 135 into object code, and writes the object code to the object file 122 .
  • FIG. 17 is a flowchart illustrating an example of the procedure of forward function scan.
  • a forward function scan is executed in step S 2 described above.
  • the analysis unit 141 detects the first function (for example, a main function) from the intermediate code, and inserts the function ID of the first function into the queue 161 and the stack 162 .
  • the analysis unit 141 determines whether the queue 161 is empty. If the queue 161 is empty, the forward function scan ends. If the queue 161 is not empty, the process proceeds to step S 12 .
  • the analysis unit 141 extracts a function ID from the queue 161 .
  • the function ID to be extracted is one that is inserted first among the function IDs stored in the queue 161 .
  • a function indicated by the function ID that is extracted in this step is referred to as a function F 1 .
  • the analysis unit 141 generates a record corresponding to the function F 1 , and adds the record to the function data 163 .
  • the function ID in the generated record is identification information assigned to the function F 1 .
  • the address in the generated record is the start address of the function F 1 in the intermediate code.
  • the analysis unit 141 refers to the intermediate code, and determines whether the function F 1 includes a function call. If the function F 1 includes a function call, the process proceeds to step S 15 . If the function F 1 does not include any function call, the process returns to step S 11 . In the former case, in FIG. 17 , the function call included in the function F 1 is referred to as a function call C 1 . Note that in the case where a plurality of function calls are included in the function F 1 , the operations in steps S 15 through S 19 (described below) are performed for each of the function calls included in the function F 1 .
  • the analysis unit 141 registers the information on the function call C 1 as a callee, in the record generated in step S 13 . More specifically, the analysis unit 141 registers the address of a function that is called by the function call C 1 , and the identification information assigned to the function call C 1 .
  • the analysis unit 141 extracts index values for the function call C 1 , and adds the index values to the function call index data 164 .
  • the details of the function call index extraction will be described below.
  • the analysis unit 141 determines whether the function that is called by the function call C 1 has been detected, that is, whether the function ID of the called function is in the stack 162 . If the function ID has been detected, the process proceeds to step S 19 . If the function ID has not been detected, the process proceeds to step S 18 . In FIG. 17 , the function that is called by the function call C 1 is referred to as a function F 2 .
  • the analysis unit 141 inserts the function ID of the function F 2 (the function ID of a child function in the function call graph 40 ) into the queue 161 and the stack 162 .
  • the analysis unit 141 stores the address of the function F 1 as the caller calling the function F 2 . If a record corresponding to the function F 2 is present in the function data 163 , the analysis unit 141 registers the address of the function F 1 in the record. If a record corresponding to the function F 2 is not present in the function data 163 , the analysis unit 141 stores the address of the function F 1 separately such that when the record is generated, the address of the function F 1 is registered in step S 13 . Then, the process returns to step S 11 .
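  • The forward function scan described above amounts to a breadth-first traversal of the function call graph. The following Python sketch follows the steps under several assumptions: the call graph, the function names, the call IDs, and the addresses are hypothetical, the intermediate code is replaced by a ready-made mapping from each function to its calls, and the index extraction of step S 16 is left out.

```python
from collections import deque

def forward_scan(root, calls_in, address_of):
    queue, stack = deque([root]), [root]      # initial step: register the first function
    function_data, pending_callers = {}, {}
    while queue:                              # S11: stop when the queue is empty
        f1 = queue.popleft()                  # S12: the entry inserted first
        record = {"address": address_of[f1],  # S13: new record for F1
                  "callers": pending_callers.pop(f1, []),
                  "callees": []}
        function_data[f1] = record
        for call_id, f2 in calls_in[f1]:      # S14: every function call C1 in F1
            record["callees"].append((address_of[f2], call_id))  # S15
            # S16 (index extraction for C1) is omitted in this sketch.
            if f2 not in stack:               # S17: has F2 been detected already?
                queue.append(f2)              # S18
                stack.append(f2)
            # S19: remember F1 as a caller of F2, directly or for later.
            if f2 in function_data:
                callers = function_data[f2]["callers"]
            else:
                callers = pending_callers.setdefault(f2, [])
            if address_of[f1] not in callers:
                callers.append(address_of[f1])
    return function_data, stack

# Hypothetical call graph: function -> list of (function call ID, callee).
calls_in = {"main": [("c1", "a"), ("c2", "b")],
            "a": [("c3", "c")], "b": [("c4", "c")], "c": []}
address_of = {"main": 0x0100, "a": 0x0200, "b": 0x0300, "c": 0x0400}
function_data, stack = forward_scan("main", calls_in, address_of)
```

  • The stack returned by the sketch still holds every detected function, ready to be consumed from the top by the backward function scan.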
  • FIG. 18 is a flowchart illustrating the procedure of function call index extraction.
  • the function call index extraction is executed in step S 16 described above.
  • the analysis unit 141 specifies a block to which the function call C 1 (the function call included in the function F 1 in step S 14 described above) belongs.
  • a block is a unit of intermediate code representing a set of operations, and is a unit of compilation processing.
  • the block specified in this step is referred to as a block B 1 .
  • In step S 21 , the analysis unit 141 determines whether the block B 1 includes a loop. If a loop is included, the process proceeds to step S 22 . If no loop is included, the process proceeds to step S 23 .
  • the analysis unit 141 extracts the loop count from the intermediate code.
  • the analysis unit 141 determines whether the function call C 1 is inside the innermost loop. If the function call C 1 does not belong to any loop, the determination is False. If the function call C 1 belongs to a loop (a single loop) that is not a multiple loop, the determination is True. If the block B 1 includes a multiple loop, and the function call C 1 is outside the innermost loop thereof, the determination is False. If the block B 1 includes a multiple loop, and the function call C 1 is inside the innermost loop thereof, the determination is True.
  • the analysis unit 141 selects an instruction in the block B 1 in the intermediate code.
  • the instruction selected in this step is referred to as an instruction I 1 .
  • the analysis unit 141 determines whether the instruction I 1 is allowed to be pipelined. Whether the instruction I 1 is allowed to be pipelined depends on the architecture of the processor that executes the instruction. Examples of instructions allowed to be pipelined include arithmetic instructions, logical instructions, memory access instructions, and so on. Examples of instructions not allowed to be pipelined include complex instructions such as SIMD instructions and so on. If the instruction I 1 is allowed to be pipelined, the process proceeds to step S 27 . If not, the process proceeds to step S 26 .
  • the analysis unit 141 increments the number of non-pipelined instructions by 1.
  • In step S 27 , the analysis unit 141 determines whether all the instructions in the block B 1 have been selected in step S 24 . If all the instructions in the block B 1 have been selected, the process proceeds to step S 28 . If there is any unselected instruction, the process returns to step S 24 .
  • the analysis unit 141 generates a record corresponding to the function call C 1 .
  • the analysis unit 141 registers, in the record, the loop count extracted in step S 22 , the innermost loop flag indicating the determination result of step S 23 , and the number of non-pipelined instructions that is counted in step S 26 .
  • the analysis unit 141 adds the record to the function call index data 164 .
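  • The function call index extraction can be sketched in Python as follows. The block representation is an assumption: `loop_counts` lists the iteration counts of the nested loops in the block from the outermost to the innermost, `call_loop_depth` says how many of them enclose the function call, and `instructions` is a list of opcode names. The opcode set `NON_PIPELINED_OPCODES` is hypothetical, and taking the count of the innermost enclosing loop as the loop count is likewise an assumption. The innermost loop determination follows the four cases of the flowchart description.

```python
NON_PIPELINED_OPCODES = {"simd"}  # hypothetical; depends on the target processor

def extract_call_index(call_id, block):
    loops = block["loop_counts"]
    depth = block["call_loop_depth"]
    # S21, S22: the loop count is 0 when the call does not belong to a loop.
    loop_count = loops[depth - 1] if loops and depth > 0 else 0
    # S23: True only when the call is inside the innermost loop of the block.
    innermost = depth > 0 and depth == len(loops)
    # S24 through S27: count the instructions that are not pipelined.
    non_pipelined = 0
    for opcode in block["instructions"]:
        if opcode in NON_PIPELINED_OPCODES:
            non_pipelined += 1  # S26
    # S28: build the record for the function call.
    return {"call_id": call_id, "loop_count": loop_count,
            "innermost_loop": innermost, "non_pipelined": non_pipelined}

# Block of the function call #E: a single loop of 100 iterations (FIG. 10).
# The opcode list is a placeholder, not the intermediate code 51.
block_e = {"loop_counts": [100], "call_loop_depth": 1,
           "instructions": ["move", "callpe", "add", "bct"]}
record_164a = extract_call_index("E", block_e)
```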
  • FIG. 19 is a flowchart illustrating an example of the procedure of backward function scan.
  • a backward function scan is executed in step S 3 described above.
  • In step S 30 , the analysis unit 141 determines whether the stack 162 is empty. If the stack 162 is empty, the backward function scan ends. If the stack 162 is not empty, the process proceeds to step S 31 .
  • the analysis unit 141 extracts a function ID from the stack 162 .
  • the function ID to be extracted is one that is inserted last among the function IDs stored in the stack 162 .
  • a function indicated by the function ID that is extracted in this step is referred to as a function F 1 .
  • the analysis unit 141 refers to a record of the function data 163 corresponding to the function F 1 , and determines whether there is a function that calls the function F 1 . If there is a function that calls the function F 1 , the process proceeds to step S 33 . If not, the process returns to step S 30 .
  • the analysis unit 141 extracts index values for the function F 1 , and adds the index values to the function index data 165 .
  • the details of the function index extraction will be described below.
  • the analysis unit 141 refers to the record of the function data 163 corresponding to the function F 1 , and determines whether the function F 1 includes a function call. If the function F 1 includes a function call, the process proceeds to step S 35 . If the function F 1 does not include any function call, the process returns to step S 30 . In the former case, in FIG. 19 , the function call included in the function F 1 is referred to as a function call C 1 . Note that in the case where a plurality of function calls are included in the function F 1 , the operations in steps S 35 through S 37 (described below) are performed for each of the function calls included in the function F 1 .
  • the analysis unit 141 retrieves index values of the function call C 1 from the function call index data 164 .
  • the retrieved index values include the loop count, an innermost loop flag, and the number of non-pipelined instructions. In FIG. 19 , the retrieved index values are referred to as index values P 1 .
  • the analysis unit 141 specifies a function that is called by the function call C 1 , and retrieves index values of the called function from the function index data 165 .
  • the retrieved index values include the loop count, the number of source code lines, the number of intermediate code instructions, a user directive flag, the number of function calls, and the number of non-pipelined instructions. In FIG. 19 , the retrieved index values are referred to as index values P 2 .
  • the analysis unit 141 calculates an evaluation value of the function call C 1 from the retrieved index values P 1 and P 2 . That is, the analysis unit 141 converts the retrieved index values P 1 and P 2 into an evaluation value, based on the evaluation criteria table 166 . In the case where a plurality of evaluation criteria tables are stored in the control information storage unit 143 , the analysis unit 141 selects an evaluation criteria table corresponding to the architecture of the target processor. The analysis unit 141 registers the calculated evaluation value in the evaluation value table 167 . Then, the process returns to step S 30 .
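  • The backward function scan can be sketched in Python as follows. The sketch departs from the description in two labeled ways, both assumptions: a function without callers only skips the index extraction but still has its function calls evaluated, so that calls issued by the root function receive an evaluation value, and the index values of a callee are extracted on demand if the callee has not yet been taken from the stack. The data layout, the function names, and the simplified evaluation rule (loop count term only, with A = 1) are hypothetical.

```python
def backward_scan(stack, function_data, call_index, extract_index, evaluate):
    function_index, evaluation = {}, {}

    def index_of(function_id):
        # Extract the index values of a function once and reuse them afterwards.
        if function_id not in function_index:
            function_index[function_id] = extract_index(function_id)
        return function_index[function_id]

    while stack:                                # S30: stop when the stack is empty
        f1 = stack.pop()                        # S31: the entry inserted last
        record = function_data[f1]
        if record["callers"]:                   # S32: is F1 called by any function?
            index_of(f1)                        # S33: function index extraction
        for f2, call_id in record["callees"]:   # S34: every function call C1 in F1
            p1 = call_index[call_id]            # S35: index values of the call
            p2 = index_of(f2)                   # S36: index values of the callee
            evaluation[call_id] = evaluate(p1, p2)  # S37
    return function_index, evaluation

# Hypothetical data: main calls f (call c1), f calls g (call c2, in a loop).
function_data = {"main": {"callers": [], "callees": [("f", "c1")]},
                 "f": {"callers": ["main"], "callees": [("g", "c2")]},
                 "g": {"callers": ["f"], "callees": []}}
call_index = {"c1": {"loop_count": 0}, "c2": {"loop_count": 100}}
known_indexes = {"f": {"loop_count": 10}, "g": {"loop_count": 0}}

function_index, evaluation = backward_scan(
    ["main", "f", "g"], function_data, call_index,
    extract_index=lambda fid: known_indexes[fid],
    evaluate=lambda p1, p2: 10 * (p1["loop_count"] + p2["loop_count"]),
)
```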
  • FIG. 20 is a flowchart illustrating an example of the procedure of function index extraction.
  • the function index extraction is executed in step S 33 described above.
  • the analysis unit 141 retrieves source code of the function F 1 (the function in step S 31 described above). The analysis unit 141 calculates the number of source code lines of the function F 1 by counting the actual statements (lines that end in a semicolon) included in the retrieved source code.
  • the analysis unit 141 refers to the record of the function data 163 corresponding to the function F 1 , and retrieves intermediate code of the function F 1 .
  • the analysis unit 141 calculates the number of intermediate code instructions of the function F 1 by counting the instructions included in the retrieved intermediate code.
  • the analysis unit 141 determines whether an inlining directive (additional information provided for control purposes and indicating inline expansion) is added to the source code of the function F 1 .
  • In step S 43 , the analysis unit 141 determines whether the function F 1 includes a loop. If a loop is included, the process proceeds to step S 44 . If no loop is included, the process proceeds to step S 45 .
  • the analysis unit 141 extracts the loop count from the intermediate code.
  • the analysis unit 141 selects an instruction in the function F 1 in the intermediate code.
  • the instruction selected in this step is referred to as an instruction I 1 .
  • In step S 46 , the analysis unit 141 determines whether the instruction I 1 is allowed to be pipelined. If the instruction I 1 is allowed to be pipelined, the process proceeds to step S 48 . If not, the process proceeds to step S 47 .
  • the analysis unit 141 determines whether the instruction I 1 is a function call instruction (corresponding to the “callpe” instruction in FIG. 11 ). If the instruction I 1 is a function call instruction, the process proceeds to step S 49 . If not, the process proceeds to step S 50 .
  • step S 50 The analysis unit 141 determines whether all the instructions in the function F 1 have been selected in step S 45 . If all the instructions in the function F 1 have been selected, the process proceeds to step S 51 . If there is any unselected instruction, the process returns to step S 45 .
  • step S 51 The analysis unit 141 generates a record corresponding to the function F 1 .
  • The analysis unit 141 registers, in the record, the loop count extracted in step S 44 , the number of source code lines and the number of intermediate code instructions calculated in steps S 40 and S 41 , and the user directive flag indicating the determination result of step S 42 . Further, the analysis unit 141 registers, in the record, the number of function calls counted in step S 49 and the number of non-pipelined instructions counted in step S 47 .
  • the analysis unit 141 adds the record to the function index data 165 .
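
The function index extraction of FIG. 20 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the analysis unit 141: the `Instr` and `FunctionIndex` classes, the `loop` opcode with its `trip_count` field, and the `pipelinable` flag are hypothetical stand-ins for the intermediate code, and the sketch assumes a function contains at most one loop. Only the `callpe` opcode name is taken from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """One intermediate-code instruction (hypothetical toy representation)."""
    opcode: str
    pipelinable: bool = True
    trip_count: Optional[int] = None  # only meaningful for the assumed "loop" opcode


@dataclass
class FunctionIndex:
    """Record added to the function index data for one function."""
    loop_count: int
    source_lines: int
    ir_instructions: int
    user_directive: bool
    function_calls: int
    non_pipelined: int


def extract_function_index(source: str, ir: List[Instr],
                           has_inline_directive: bool = False) -> FunctionIndex:
    # S40: count source lines, here as statements ending in a semicolon.
    source_lines = sum(1 for line in source.splitlines()
                       if line.rstrip().endswith(";"))
    # S41: count intermediate code instructions.
    ir_instructions = len(ir)
    # S43/S44: take the loop count of the first loop, or 0 if there is none.
    loop_count = next((i.trip_count or 0 for i in ir if i.opcode == "loop"), 0)
    # S45-S50: visit every instruction once and update the two counters.
    function_calls = 0
    non_pipelined = 0
    for instr in ir:
        if not instr.pipelinable:      # S46 -> S47
            non_pipelined += 1
        if instr.opcode == "callpe":   # S48 -> S49
            function_calls += 1
    # S51: build the record (S42 result is passed in as has_inline_directive).
    return FunctionIndex(loop_count, source_lines, ir_instructions,
                         has_inline_directive, function_calls, non_pipelined)
```

Passing the directive flag in as an argument keeps the sketch independent of any particular directive syntax, which the description does not fix at this point.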
  • FIG. 21 illustrates an example of the procedure of inline expansion.
  • the inline expansion is executed in step S 4 described above.
  • step S 60 The analysis unit 141 sorts the function calls in descending order of evaluation value, based on the evaluation value table 167 storing the calculated evaluation values.
  • step S 61 The analysis unit 141 selects the function call with the highest evaluation value, from the unselected function calls. However, function calls having been inlined are excluded. Further, the state of having been selected is cancelled when the function calls that are not inlined are sorted again in step S 69 (described below).
  • The function call selected in this step is referred to as a function call C 1 ; the function that contains the function call C 1 (the caller) is referred to as a function F 1 ; and the function that is called by the function call C 1 (the callee) is referred to as a function F 2 .
  • step S 62 The analysis unit 141 determines whether the sum of the number of instructions in the function F 1 and the number of instructions in the function F 2 is equal to or less than a threshold.
  • the number of instructions in the function F 1 and the number of instructions in the function F 2 may be specified by referring to records of the function index data 165 corresponding to the functions F 1 and F 2 .
  • step S 63 The analysis unit 141 determines to inline the function call C 1 .
  • The optimization execution unit 142 makes an update to the intermediate code stored in the intermediate code storage unit 134 so as to inline the function call C 1 in accordance with the determination by the analysis unit 141 .
  • step S 64 The analysis unit 141 updates the record of the function data 163 corresponding to the function F 1 . That is, the analysis unit 141 deletes information (address and function call ID) on the function call C 1 from the record corresponding to the function F 1 . Further, the analysis unit 141 registers, in the record corresponding to the function F 1 , information on a function call included in the function F 2 .
  • step S 65 The analysis unit 141 extracts index values for the function F 1 again, based on the intermediate code of the updated function F 1 .
  • the index values that are extracted again include the loop count, the number of source code lines, the number of intermediate code instructions, a user directive flag, the number of function calls, and the number of non-pipelined instructions.
  • the analysis unit 141 updates the record of the function index data 165 corresponding to the function F 1 .
  • step S 66 The analysis unit 141 refers to the record of the function data 163 corresponding to the function F 1 that is updated in step S 64 , and determines whether the function F 1 includes a function call. If the function F 1 includes a function call, the process proceeds to step S 67 . If the function F 1 does not include any function call, the process proceeds to step S 69 . In the former case, in FIG. 21 , the function call included in the function F 1 is referred to as a function call C 2 . Note that in the case where a plurality of function calls are included in the function F 1 , the operations in steps S 67 and S 68 (described below) are performed for each of the function calls.
  • step S 67 The analysis unit 141 extracts index values for the function call C 2 , based on the updated intermediate code of the function F 1 .
  • The extracted index values include the loop count, an innermost loop flag, and the number of non-pipelined instructions.
  • The analysis unit 141 updates a record of the function call index data 164 corresponding to the function call C 2 .
  • step S 68 The analysis unit 141 retrieves the index values of the function call C 2 from the function call index data 164 . Further, the analysis unit 141 retrieves index values of the called function from the function index data 165 . The analysis unit 141 calculates an evaluation value of the function call C 2 , based on the retrieved index values and the evaluation criteria table 166 . The analysis unit 141 updates the evaluation value of the function call C 2 in the evaluation value table 167 .
  • step S 69 The analysis unit 141 refers to the function data 163 , and detects a function call that calls the function F 1 .
  • the function call detected in this step is referred to as a function call C 3 .
  • the analysis unit 141 retrieves index values of the function call C 3 from the function call index data 164 . Further, the analysis unit 141 retrieves the index values of the function F 1 from the function index data 165 .
  • the analysis unit 141 recalculates the evaluation value of the function call C 3 , based on the retrieved index values and the evaluation criteria table 166 .
  • the analysis unit 141 updates the evaluation value of the function call C 3 in the evaluation value table 167 . Then, the analysis unit 141 sorts again the function calls in descending order of evaluation value, based on the evaluation value table 167 .
  • step S 70 The analysis unit 141 determines whether all the selectable function calls have been selected in step S 61 . If all the function calls have been selected, the inline expansion ends. If there is any unselected function call, the process returns to step S 61 .
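
The loop of FIG. 21 can be sketched as a greedy selection with re-evaluation after each inlining. This is a minimal sketch under stated assumptions, not the apparatus itself: `Call`, `Function`, `inline_expansion`, and the `evaluate` callback are hypothetical names, the size of the function F 1 after inlining is modelled simply as the sum of the two instruction counts, and the actual rewriting of the intermediate code by the optimization execution unit 142 is omitted.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, List


@dataclass
class Call:
    call_id: int
    caller: str   # name of function F1
    callee: str   # name of function F2
    inlined: bool = False


@dataclass
class Function:
    name: str
    instr_count: int
    calls: List[int] = field(default_factory=list)  # IDs of calls not yet inlined


def inline_expansion(functions: Dict[str, Function], calls: Dict[int, Call],
                     evaluate: Callable[[Call, Dict[str, Function]], float],
                     threshold: int) -> List[int]:
    """Return the IDs of the inlined calls, in the order they were inlined."""
    new_ids = count(max(calls, default=0) + 1)
    scores = {cid: evaluate(c, functions) for cid, c in calls.items()}
    selected = set()
    order = []
    while True:
        candidates = [cid for cid, c in calls.items()
                      if not c.inlined and cid not in selected]
        if not candidates:                              # S70: nothing left to select
            return order
        cid = max(candidates, key=lambda k: scores[k])  # S60/S61: best unselected call
        selected.add(cid)
        c1 = calls[cid]
        f1, f2 = functions[c1.caller], functions[c1.callee]
        if f1.instr_count + f2.instr_count > threshold: # S62: too large, skip this call
            continue
        c1.inlined = True                               # S63
        order.append(cid)
        callee_targets = [calls[k].callee for k in f2.calls]
        f1.calls.remove(cid)                            # S64: drop C1 from F1's record
        for target in callee_targets:                   #      and register F2's calls
            nid = next(new_ids)
            calls[nid] = Call(nid, f1.name, target)
            f1.calls.append(nid)
        f1.instr_count += f2.instr_count                # S65 (assumed size model)
        for k, c in calls.items():                      # S66-S69: re-evaluate calls
            if not c.inlined and f1.name in (c.caller, c.callee):
                scores[k] = evaluate(c, functions)
        selected.clear()                                # S69: selection state cancelled
```

Because every inlining strictly increases the instruction count of some function and the threshold caps that count, the loop in this sketch terminates even when the call graph contains recursion.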
  • the loop count, an innermost loop flag, and the number of non-pipelined instructions are extracted from the code of each calling function. Further, the loop count, the number of source code lines, the number of intermediate code instructions, a user directive flag, the number of function calls, and the number of non-pipelined instructions are extracted from the code of each called function. Then, an evaluation value of each function call is calculated based on these index values, and function calls with higher evaluation values are preferentially inlined. Thus, it is possible to preferentially select function calls whose inline expansion provides greater benefits. Accordingly, it is possible to improve the performance of the object code, compared to the case of using a method that selects function calls sequentially from the top of the code or a method that selects function calls sequentially from the bottom of the hierarchical structure.
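
How index values from the call side and from the called function might be combined into one evaluation value can be illustrated as follows. The criteria and point values below are purely hypothetical and are not the contents of the evaluation criteria table 166; `CRITERIA`, `evaluation_value`, and `rank_calls` are names introduced only for this sketch.

```python
from typing import Any, Callable, Dict, List, Tuple

Index = Dict[str, Any]
Criterion = Tuple[str, Callable[[Index, Index], bool], int]

# Hypothetical criteria: (label, condition on (call index, callee index), points).
CRITERIA: List[Criterion] = [
    ("call sits in an innermost loop",
     lambda call, fn: bool(call["innermost_loop"]), 3),
    ("user directive requests inlining",
     lambda call, fn: bool(fn["user_directive"]), 5),
    ("callee is small",
     lambda call, fn: fn["ir_instructions"] <= 20, 2),
    ("callee makes no further calls",
     lambda call, fn: fn["function_calls"] == 0, 1),
]


def evaluation_value(call_index: Index, callee_index: Index,
                     criteria: List[Criterion] = CRITERIA) -> int:
    """Sum the points of every criterion that the function call satisfies."""
    return sum(points for _, cond, points in criteria
               if cond(call_index, callee_index))


def rank_calls(calls: Dict[str, Tuple[Index, Index]]) -> List[str]:
    """Order call IDs by descending evaluation value (ties keep input order)."""
    return sorted(calls, key=lambda cid: evaluation_value(*calls[cid]),
                  reverse=True)
```

Keeping the criteria in a table separate from the scoring function mirrors the description, where the criteria table and the evaluation value table are distinct data.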
  • the information processing in the first embodiment may be implemented by causing the information processing apparatus 10 to execute a program. Further, the information processing in the second embodiment may be implemented by causing the compiling apparatus 100 to execute a program.
  • the program may be recorded in a computer-readable storage medium (for example, the storage medium 113 ).
  • storage media include magnetic disks, optical discs, magneto-optical disks, semiconductor memories, and the like.
  • Magnetic disks include FD and HDD.
  • Optical discs include CD, CD-Recordable (CD-R), CD-Rewritable (CD-RW), DVD, DVD-R, and DVD-RW.
  • the program may be stored in a portable storage medium and distributed. In this case, the program may be copied (installed) from the portable storage medium to another storage medium such as an HDD or the like (for example, the HDD 103 ) so as to be executed.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
US15/070,048 2015-04-28 2016-03-15 Information processing apparatus and compiling method Active 2036-04-29 US9760354B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015091855A JP6572610B2 (ja) 2015-04-28 2015-04-28 Information processing apparatus, compiling method, and compiling program
JP2015-091855 2015-04-28

Publications (2)

Publication Number Publication Date
US20160321048A1 US20160321048A1 (en) 2016-11-03
US9760354B2 US9760354B2 (en) 2017-09-12

Family

ID=57204045

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/070,048 Active 2036-04-29 US9760354B2 (en) 2015-04-28 2016-03-15 Information processing apparatus and compiling method

Country Status (2)

Country Link
US (1) US9760354B2 (en)
JP (1) JP6572610B2 (ja)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102063966B1 (ko) * 2015-10-21 2020-01-09 엘에스산전 주식회사 Method for optimizing PLC instruction compilation
US10783146B2 (en) 2016-07-19 2020-09-22 Sap Se Join operations in hybrid main memory systems
US10474557B2 (en) 2016-07-19 2019-11-12 Sap Se Source code profiling for line-level latency and energy consumption estimation
US10452539B2 (en) 2016-07-19 2019-10-22 Sap Se Simulator for enterprise-scale simulations on hybrid main memory systems
US11977484B2 (en) 2016-07-19 2024-05-07 Sap Se Adapting in-memory database in hybrid memory systems and operating system interface
US10540098B2 (en) 2016-07-19 2020-01-21 Sap Se Workload-aware page management for in-memory databases in hybrid main memory systems
US10698732B2 (en) 2016-07-19 2020-06-30 Sap Se Page ranking in operating system virtual pages in hybrid memory systems
US10387127B2 (en) 2016-07-19 2019-08-20 Sap Se Detecting sequential access data and random access data for placement on hybrid main memory for in-memory databases
US10083183B2 (en) * 2016-07-19 2018-09-25 Sap Se Full system simulator and memory-aware splay tree for in-memory databases in hybrid memory systems
US10437798B2 (en) 2016-07-19 2019-10-08 Sap Se Full system simulator and memory-aware splay tree for in-memory databases in hybrid memory systems
US10261763B2 (en) * 2016-12-13 2019-04-16 Palantir Technologies Inc. Extensible data transformation authoring and validation system
US11010379B2 (en) 2017-08-15 2021-05-18 Sap Se Increasing performance of in-memory databases using re-ordered query execution plans
US10754763B2 (en) * 2018-07-09 2020-08-25 International Business Machines Corporation Bypassing user-selected functions during test case generation
US11327802B2 (en) * 2019-07-31 2022-05-10 Microsoft Technology Licensing, Llc System and method for exporting logical object metadata

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05120029A (ja) 1991-10-29 1993-05-18 Hitachi Ltd Optimization method
JPH06202875A (ja) 1992-12-28 1994-07-22 Nec Corp Compiler that performs optimization by inline expansion
US6292940B1 (en) * 1998-01-26 2001-09-18 Nec Corporation Program complete system and its compile method for efficiently compiling a source program including an indirect call for a procedure
JP2001282546A (ja) 2000-03-30 2001-10-12 Matsushita Electric Ind Co Ltd Program conversion device, program conversion method, and program recording medium
US6367071B1 (en) * 1999-03-02 2002-04-02 Lucent Technologies Inc. Compiler optimization techniques for exploiting a zero overhead loop mechanism
US20050076172A1 (en) * 2003-07-16 2005-04-07 Infology Pty Limited Dba/Muvium.Com Architecture for static frames in a stack machine for an embedded device
US20050193359A1 (en) * 2004-02-13 2005-09-01 The Regents Of The University Of California Method and apparatus for designing circuits using high-level synthesis
US20110238954A1 (en) * 2010-03-25 2011-09-29 Fuji Xerox Co., Ltd. Data processing apparatus
US20140245274A1 (en) * 2013-02-22 2014-08-28 International Business Machines Corporation Determining a method to inline using an actual footprint calculation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04336333A (ja) * 1991-05-13 1992-11-24 Nec Corp Automatic inline expansion system for source programs
US5740443A (en) * 1995-08-14 1998-04-14 International Business Machines Corporation Call-site specific selective automatic inlining
JP2004102597A (ja) * 2002-09-09 2004-04-02 Fujitsu Ltd Compile processing program, compile processing method, and compile processing program recording medium
JP2006065682A (ja) * 2004-08-27 2006-03-09 Fujitsu Ltd Compiler program, compiling method, and compiler device

Also Published As

Publication number Publication date
US20160321048A1 (en) 2016-11-03
JP2016207161A (ja) 2016-12-08
JP6572610B2 (ja) 2019-09-11

Legal Events

Date Code Title Description
AS Assignment

Owner name: FUJITSU LIMITED, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MATSUURA, TAKAYUKI;REEL/FRAME:037995/0327

Effective date: 20160218

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4