WO2009091493A2 - Block count based procedure layout and splitting - Google Patents

Block count based procedure layout and splitting

Info

Publication number
WO2009091493A2
Authority
WO
WIPO (PCT)
Prior art keywords
code block
chains
act
ordering
code
Prior art date
Application number
PCT/US2008/088483
Other languages
English (en)
French (fr)
Other versions
WO2009091493A3 (en)
Inventor
Grant A. Richins
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation
Priority to BRPI0821772-6A (BRPI0821772A2, pt)
Priority to CN200880125324.0A (CN101918917B, zh)
Priority to EP08870630A (EP2250551A4, de)
Publication of WO2009091493A2 (WO2009091493A2, en)
Publication of WO2009091493A3 (WO2009091493A3, en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854 Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3858 Result writeback, i.e. updating the architectural state or memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44557 Code layout in executable memory

Definitions

  • Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g., word processing, scheduling, accounting, etc.) that prior to the advent of the computer system were performed manually. More recently, computer systems have been coupled to one another and to other electronic devices to form both wired and wireless computer networks over which the computer systems and other electronic devices can transfer electronic data. Accordingly, the performance of many computing tasks is distributed across a number of different computer systems and/or a number of different computing components.
  • As computerized systems have increased in popularity, so has the complexity of the software and hardware employed within such systems.
  • typical processors have a natural preference for where to look for a next instruction when encountering a conditional branch.
  • many processors have algorithms for identifying the next instruction to execute upon reaching a conditional branch. These algorithms typically include various assumptions and/or predictions on how instructions are typically executed. For example, many such algorithms assume that if a conditional branch is forward (i.e., to a code block placed after the current code block in system memory) that the branch is not to be taken. On the other hand, these algorithms can also assume that if a branch is backwards (i.e., to a code block placed before the current code block in system memory) that the branch is to be taken.
  • executable code can include forward branches to frequently utilized code blocks and/or backward branches to less frequently utilized code.
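  • The static prediction assumption described above can be sketched as follows. This is a minimal illustration only; the function name `predict_taken` and the example addresses are hypothetical and are not taken from the source, and real processors may combine this heuristic with other predictors.

```python
def predict_taken(branch_address: int, target_address: int) -> bool:
    """Static heuristic: a backward branch (target placed before the
    branch in memory) is predicted taken; a forward branch is not."""
    return target_address < branch_address


# Hypothetical addresses: a loop back-edge and a forward jump.
loop_back_edge = predict_taken(branch_address=0x40, target_address=0x10)
forward_jump = predict_taken(branch_address=0x40, target_address=0x80)
```

  Under this assumption, a layout that reaches a frequently executed block only through a forward branch would be mispredicted each time that branch is taken, which is the situation the preceding text describes.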
  • calculation of block counts does not necessarily provide an accurate representation of how many times particular arcs were taken. That is, an indication of code block execution for a code block does not necessarily indicate which other code block was executed before or after the code block. Further, it may be difficult to gather accurate arc counts when code blocks call themselves recursively.
  • Nonetheless, some mechanisms gather block counts and attempt to compute accurate arc counts. However, it is very difficult if not impossible to derive accurate arc counts from block counts for all possible combinations of blocks and arcs. As such, arc counts computed using these mechanisms are approximations or have some degree of inaccuracy.
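  • A small worked example may help show why arc counts cannot always be recovered from block counts. The graph, block names, and counts below are hypothetical and chosen only for illustration: two different arc profiles over the same six arcs produce identical block counts.

```python
def block_counts_from_arcs(entry, entry_count, arcs):
    """Derive per-block execution counts from per-arc counts.

    `arcs` maps (source, destination) pairs to the number of times
    that arc was taken; the entry block's count is given directly.
    """
    counts = {entry: entry_count}
    for (_, dst), taken in arcs.items():
        counts[dst] = counts.get(dst, 0) + taken
    return counts


# Hypothetical graph: A branches to B or C; B and C each branch to D or E.
profile_1 = {("A", "B"): 5, ("A", "C"): 5,
             ("B", "D"): 5, ("B", "E"): 0,
             ("C", "D"): 0, ("C", "E"): 5}
profile_2 = {("A", "B"): 5, ("A", "C"): 5,
             ("B", "D"): 0, ("B", "E"): 5,
             ("C", "D"): 5, ("C", "E"): 0}

counts_1 = block_counts_from_arcs("A", 10, profile_1)
counts_2 = block_counts_from_arcs("A", 10, profile_2)
```

  Both profiles give A a count of 10 and every other block a count of 5, so block counts alone cannot tell the two arc profiles apart.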
  • the present invention extends to methods, systems, and computer program products for block count based procedure layout and splitting.
  • a computer system accesses an executable procedure that includes a plurality of different code blocks.
  • the executable procedure has a starting code block and one or more conditional statements that separate execution of the procedure into other code blocks.
  • the computer system accesses block count data for the executable procedure.
  • the block count data includes a block count for each code block indicating how many times the code block was executed during one or more prior executions of the executable procedure.
  • the computer system creates one or more code block chains of one or more code blocks for the executable procedure based on the accessed block count.
  • code block chain creation for each of the one or more code block chains includes accessing a code block with the highest block count that is not already included in a code block chain. For each accessed code block, it is determined if the accessed code block is a successor to the last code block in an existing chain. For each accessed code block, it is also determined if the accessed code block is a predecessor to the first code block in an existing chain. The accessed code block is assigned to the one or more code block chains based on the results of determining if the accessed block is a successor and/or predecessor.
  • Assigning a code block to the one or more code block chains can include: A) creating a new code block chain and assigning the accessed code block to the new code block chain or B) assigning the accessed code block to an existing code block chain.
  • two existing code block chains can also be merged into a single code block chain.
  • at least an after set (and possibly also a before set) of chains is collected for each code block chain.
  • an after set indicates a preference that the specified code block chain succeeds a set of other code block chains within an ordering of the one or more code block chains.
  • a before set of chains indicates a preference that the specified code block chain precede a set of the other code block chains within an ordering of the one or more code block chains.
  • Ordering code block chains can include creating an ordering for any non-zero count code block chains including connecting any non-zero count code block chains to one another in accordance with precedence relationships. Ordering code block chains can also include creating an ordering for any zero count code block chains including connecting any zero count code block chains to one another in accordance with precedence relationships. Ordering code block chains can include subsequently appending the ordering of zero count code block chains to the end of the ordering of non-zero count code block chains to create a total ordering for the one or more code block chains. Accordingly, any non-zero count blocks are placed before all zero count blocks within system memory when the total ordering is placed into system memory.
  • Code block chain ordering can be based on precedence relationships. Precedence relationships can include giving priority to an ordering that minimizes after set violations.
  • Figure 1 illustrates an example computer architecture that facilitates block count based procedure layout and splitting.
  • Figure 2A illustrates an example block graph for an executable procedure.
  • Figure 2B illustrates chains of code blocks from the block graph of Figure
  • Figure 2C illustrates after sets and before sets for the chains of code blocks from Figure 2B without transitive closure
  • Figure 2E illustrates a total chain ordering for the chains of code blocks from Figure 2B.
  • Figure 3 illustrates a flow chart of an example method for optimizing the execution of a software procedure.
  • Figure 4 illustrates a flow chart of an example method for optimizing the execution of a software procedure.
  • Figure 5 illustrates an example of connections between code block chains.
  • DETAILED DESCRIPTION
  • the present invention extends to methods, systems, and computer program products for block count based procedure layout and splitting.
  • a computer system accesses an executable procedure that includes a plurality of different code blocks.
  • the executable procedure has a starting code block and one or more conditional statements that separate execution of the procedure into other code blocks.
  • the computer system accesses block count data for the executable procedure.
  • the block count data includes a block count for each code block indicating how many times the code block was executed during one or more prior executions of the executable procedure.
  • the computer system creates one or more code block chains of one or more code blocks for the executable procedure based on the accessed block count.
  • code block chain creation for each of the one or more code block chains includes accessing a code block with the highest block count that is not already included in a code block chain. For each accessed code block, it is determined if the accessed code block is a successor to the last code block in an existing chain. For each accessed code block, it is also determined if the accessed code block is a predecessor to the first code block in an existing chain. The accessed code block is assigned to the one or more code block chains based on the results of determining if the accessed block is a successor and/or predecessor.
  • Assigning a code block to the one or more code block chains can include: A) creating a new code block chain and assigning the accessed code block to the new code block chain or B) assigning the accessed code block to an existing code block chain.
  • two existing code block chains can also be merged into a single code block chain, for example, when a code block is a predecessor to one code block chain and a successor to another code block chain.
  • at least an after set (and possibly also a before set) of chains is collected for each code block chain.
  • an after set indicates a preference that the specified code block chain succeeds a set of other code block chains within an ordering of the one or more code block chains.
  • a before set of chains indicates a preference that the specified code block chain precede a set of the other code block chains within an ordering of the one or more code block chains.
  • Ordering code block chains can include creating an ordering for any non-zero count code block chains including connecting any non-zero count code block chains to one another in accordance with precedence relationships. Ordering code block chains can also include creating an ordering for any zero count code block chains including connecting any zero count code block chains to one another in accordance with precedence relationships. Ordering code block chains can include subsequently appending the ordering of zero count code block chains to the end of the ordering of non-zero count code block chains to create a total ordering for the one or more code block chains. Accordingly, any non-zero count blocks are placed before all zero count blocks within system memory when the total ordering is placed into system memory.
  • Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media.
  • embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: physical storage media and transmission media.
  • Physical storage media includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
  • a "network" is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices.
  • a network or another communications connection can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
  • program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to physical storage media (or vice versa).
  • program code means in the form of computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a "NIC"), and then eventually transferred to computer system RAM and/or to less volatile physical storage media at a computer system.
  • physical storage media can be included in computer system components that also (or even primarily) utilize transmission media.
  • Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
  • the computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
  • the invention may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like.
  • the invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
  • program modules may be located in both local and remote memory storage devices.
  • Figure 1 illustrates an example computer architecture 100 that facilitates block count based procedure layout and splitting.
  • computer architecture 100 includes code arrangement module 101.
  • code arrangement module 101 is configured to receive an executable procedure and block count data for the executable procedure. From the block count data, code arrangement module 101 optimizes the arrangement of code blocks within the executable procedure for placement in system memory. Optimization can include positioning code blocks for more efficient execution, reducing memory overhead, etc.
  • code arrangement module 101 includes chain creation module 102, set collection module 103, and chain ordering module 104.
  • Each of the depicted components can be connected to one another over a system bus and/or over (or be part of) a network, such as, for example, a Local Area Network ("LAN"), a Wide Area Network ("WAN"), and even the Internet. Accordingly, each of the depicted components, as well as any other connected components, can create message related data and exchange message related data (e.g., Internet Protocol ("IP") datagrams and other higher layer protocols that utilize IP datagrams, such as Transmission Control Protocol ("TCP"), Hypertext Transfer Protocol ("HTTP"), Simple Mail Transfer Protocol ("SMTP"), etc.) over the network.
  • chain creation module 102 is configured to receive an executable procedure and block count data, such as, for example, method 111 including code blocks 201, 202, etc. (e.g., portions of a larger software application) and block count data 112.
  • Executable procedures can include conditional and other statements that cause various different code block paths to be taken during execution.
  • Block count data is data indicating how many times each code block was executed during previous (e.g., non-optimized) executions of the procedure.
  • An executable procedure can be profiled to generate block count data. For example, code blocks can be instrumented with probes that count each time the code block is executed. Subsequently, the instrumented executable procedure can be executed a number of times using varied input values.
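  • The probe-based profiling described above might look like the following minimal sketch. The `probe` helper, the block labels, and the sample procedure are all hypothetical and stand in for whatever instrumentation a real profiler would insert.

```python
from collections import Counter

# Hypothetical store for block count data, keyed by block label.
block_counts = Counter()


def probe(block_id):
    """Record one execution of the code block labelled `block_id`."""
    block_counts[block_id] += 1


def absolute_value(x):
    """Illustrative procedure with one probe per code block."""
    probe("entry")
    if x < 0:
        probe("negative")
        x = -x
    else:
        probe("non_negative")
    probe("exit")
    return x


# Execute the instrumented procedure several times with varied inputs.
for value in (3, -1, 7, 0):
    absolute_value(value)
```

  After these four runs, `block_counts` holds one count per block, which is the kind of data the chain creation step consumes.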
  • chain creation module 102 can formulate a block graph for the executable procedure. For example, as depicted in Figure 2A, chain creation module 102 can formulate block graph 200 for method 111.
  • Block graph 200 indicates the relative position of code blocks 201 through 213 to one another and a "count" of how many times each code block was executed during profiling.
  • the "count" values indicate how many times a corresponding probe detected that its corresponding code block was executed. Accordingly, it may be that chain creation module 102 formulates block graph 200 from method 111 and block count data 112 to facilitate subsequent identification of code block chains within method 111.
  • chain creation module 102 can identify one or more code block chains, such as, for example, code block chains 121.
  • a code block chain includes one or more connected code blocks (i.e., a chain of code blocks) that are to be grouped together to optimize placement of the executable procedure in system memory.
  • chain creation module 102 attempts to chain code blocks with higher counts together (so that higher count code blocks are close together in system memory).
  • Chain creation module 102 can follow a path between various adjacent higher count code blocks until reaching the end of a block graph or until no further adjacent code blocks are available (e.g., the next adjacent code block has already been included in another code block chain or is a zero count code block).
  • Figure 2B depicts an example of code block chains 121 identified from code blocks in block graph 200.
  • Code block 201 has the highest count and was before any other blocks with the same count in the original executable code order, so the algorithm selects it first. Since there are no chains to add it to, it is added to a new empty chain 292.
  • Code block 202 has the next highest count. Because it is the successor of the last block of chain 292, it is appended to chain 292.
  • Code block 213 has the next highest count. It is not the successor or predecessor of any blocks in any existing chains, and so it is added to a new (and ultimately temporary) chain 298. Next, similar to code block 202, code block 203 is appended to chain 292.
  • Code block 212 is selected next because it has the highest count of the remaining blocks. Because it is the predecessor of code block 213, the first block of chain 298, it is prepended to the start of chain 298. Next code block 205 is selected. It is the successor to chain 292 and the predecessor to chain 298, so code block 205 is appended to chain 292 and then the blocks of chain 298 are appended and chain 298 is discarded.
  • Code block 204 is not a successor to the last block of any chain (only chain 292 at this point) or the predecessor to the first block of any chain, so it is added to a new chain 293.
  • Next code block 208 is appended to chain 293 because it is the successor to the last code block of that chain, code block 204.
  • code block 206 becomes part of a new chain 294, then code block 209 is added to chain 294.
  • Code blocks 207 and 210 cannot be added to any existing chains and so they form new chains 295 and 296 respectively.
  • the last code block to process is code block 211. It is the successor to the last block of chain 293, code block 208, but that chain contains non-zero count blocks and code block 211 has a count of zero, so it can only be added to chains with zero count blocks. Since none exist, it is added to a new chain 297.
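  • The walkthrough above can be sketched in code. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `build_chains` is hypothetical; the arcs are only those the text states (with the arc from 206 to 209 assumed, since the text says only that 209 is added to chain 294); and the counts are invented so that blocks are processed in the order the text gives, with ties broken by block number as a stand-in for original code order.

```python
def build_chains(counts, successors):
    """Greedy chain creation: visit blocks from highest to lowest count,
    appending, prepending, merging, or starting a new chain."""
    chains = []
    for block in sorted(counts, key=lambda b: (-counts[b], b)):
        is_zero = counts[block] == 0
        # Zero count blocks may only join zero count chains, and vice versa.
        candidates = [c for c in chains if (counts[c[0]] == 0) == is_zero]
        tail = next((c for c in candidates
                     if block in successors.get(c[-1], ())), None)
        head = next((c for c in candidates
                     if c is not tail and c[0] in successors.get(block, ())),
                    None)
        if tail is not None:
            tail.append(block)
            if head is not None:
                # Block links two chains: merge and discard the second.
                tail.extend(head)
                chains = [c for c in chains if c is not head]
        elif head is not None:
            head.insert(0, block)
        else:
            chains.append([block])
    return chains


# Hypothetical counts, chosen only to reproduce the processing order.
counts = {201: 100, 202: 100, 213: 100, 203: 90, 212: 90, 205: 80,
          204: 60, 208: 50, 206: 40, 209: 30, 207: 10, 210: 10, 211: 0}
# Arcs stated in the walkthrough (206 -> 209 is an assumption).
successors = {201: {202}, 202: {203}, 203: {205}, 205: {212},
              212: {213}, 204: {208}, 208: {211}, 206: {209}}

chains = build_chains(counts, successors)
```

  With these inputs the sketch yields six chains that correspond to chains 292 through 297 in the text: the merged chain 201, 202, 203, 205, 212, 213; then 204, 208; then 206, 209; then 207; then 210; and finally the zero count block 211 on its own.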
  • Set collection module 103 is configured to receive code block chains, such as, for example, code block chains 121. From the code block chains, set collection module 103 can create after sets and before sets for each chain in the code block chains, such as, for example, after sets 115 and before sets 116.
  • An after set for a specified code block chain indicates a preference that the specified code block chain comes after the code block chains in the after set in a total ordering of chains.
  • a before set for a specified code block chain indicates a preference that the specified code block chain comes before the code block chains in the before set in a total ordering of chains.
  • Figure 2C depicts example after sets 115 and before sets 116 for code block chains 121 without transitive closure.
  • Chain ordering module 104 is configured to receive code block chains, any after sets and before sets, and precedence relationships, such as, for example, code block chains 121, after sets 115 and before sets 116, and precedence relationships
  • chain ordering module 104 can create a total ordering of code block chains, such as, for example, total chain ordering 118, for an executable procedure.
  • the total chain ordering can be optimized for placement in system memory.
  • Figure 2E depicts a more detailed representation of total chain ordering
  • Figure 3 illustrates a flow chart of an example method 300 for block count based procedure layout and splitting.
  • Method 300 will be described with respect to the components and data in computer architecture 100, block graph 200, code block chains 121 and total chain ordering 118.
  • Method 300 includes an act of accessing an executable procedure that includes a plurality of different code blocks, the executable procedure having a starting code block and one or more conditional statements that separate execution of the procedure into other code blocks (act 301).
  • chain creation module 102 can access method 111.
  • Method 300 includes an act of accessing block count data for the executable procedure, the block count data including a block count for each code block, each block count indicating how many times a code block was executed during one or more prior executions of the executable procedure (act 302).
  • chain creation module 102 can access block count data 112.
  • Method 300 includes an act of creating one or more chains of one or more code blocks for the executable procedure based on the accessed block count data (act 303).
  • chain creation module 102 can create code block chains 121 based on block count data 112. In some embodiments, code block chains are created simultaneously.
  • Chain creation for the one or more code block chains includes an act of accessing a code block with the highest block count that is not already included in a code block chain (act 304).
  • chain creation module 102 can access code block 201. If code block 201 is already included in a chain, chain creation module 102 can access block 202. Next, chain creation module 102 can access blocks 213, 203, 212, 205, 204, 208, 206, 209, 207, 210, and 211 in that order.
  • the ordering of blocks with the same block count (e.g., 201, 202, and 213; 203 and 212; and 207 and 210) can be varied.
  • the location of a code block within block graph 200 may or may not be taken into consideration. For example, when code blocks have identical block counts, the code block closer to the top of block graph 200 can be processed first. However, other than for ordering the processing of code blocks with identical block counts, location of a code block within block graph 200 has limited, if any, impact on the order code blocks are processed for inclusion in code block chains. For example, code block 213 can be processed before code block 203, even though code block 213 is below code block 203 in block graph 200.
  • For each accessed block, method 300 includes an act of determining if the accessed block is a successor to the last block in an existing code block chain (act 305). For example, upon accessing code block 201, chain creation module 102 can determine that block 201 is not a successor to the last block in an existing chain (since there are not yet any chains).
  • method 300 includes an act of determining if the accessed block is a predecessor to the first block in an existing code block chain (act 306). For example, upon accessing block 201, chain creation module 102 can determine that block 201 is not a predecessor to the first block in an existing chain (since there are not yet any chains).
  • method 300 includes an act of assigning the accessed code block to the one or more code block chains based at least on the results of determining if the accessed code block is a successor and/or predecessor (act 307).
  • Assigning a code block to the one or more code block chains can include creating a new chain for a code block and/or assigning (e.g., appending or prepending) a code block to an existing chain. For example, since code block 201 is neither a successor nor a predecessor of an existing chain, a new chain 292 is created. Code block 201 is assigned to chain 292.
  • Acts 305, 306, and 307 can be repeated for each accessed code block.
  • chain creation module 102 can process code block 202.
  • Chain creation module 102 can determine that code block 202 is a successor to code block 201 (the last code block in chain 292) and is not a predecessor to the first block in an existing chain. Thus, code block 202 is appended to chain 292.
  • Assigning code blocks to code block chains can also include merging code block chains.
  • chain creation module 102 can process code block 213.
  • Code block 213 is neither a successor nor a predecessor to an existing chain, so a new chain 298 is created. Code block 213 is assigned to chain 298.
  • chain creation module 102 can process code block 203 and append code block 203 to chain 292. Subsequent to processing code block 203, chain creation module 102 can process code block 212 and prepend code block 212 to chain 298.
  • Subsequent to processing code block 212, chain creation module 102 can process code block 205. Chain creation module 102 can determine that code block 205 is both a successor to the last code block (code block 203) in chain 292 and a predecessor to the first code block (code block 212) in chain 298. As such, code block 205 is appended to chain 292 or prepended to chain 298. Subsequently, chains 292 and 298 are merged, resulting in a single chain (as depicted in Figure 2B, chain 292 is retained and chain 298 is discarded; however, the inverse can also occur).
  • Code block chain creation can include separating chains of non-zero count code blocks from chains of zero count code blocks. Accordingly, method 300 can also include an act of, prior to assigning a code block to a code block chain, determining if a code block is a zero or non-zero count code block. As such, assigning an accessed code block to the one or more code block chains can include determining if a zero count block is a predecessor or successor to a non-zero block in an existing code block chain. Thus, even when a zero count block is a predecessor or successor of a non-zero count block, a new chain is created and the zero count block is assigned to the new chain.
  • code block 211 is a successor to the last code block (code block 208) in code block chain 293. However, since code block 211 is a zero count code block, a new code block chain 297 is created. Code block 211 is assigned to code block chain 297 instead of code block chain 293.
  • Chain creation module 102 can send code block chains 121, including non-zero count chains 113 (e.g., non-zero chains 292, 293, 294, 295, and 296) and zero count chains 114 (e.g., zero chain 297), to set collection module 103 and/or to chain ordering module 104.
  • Method 300 includes an act of ordering the one or more code block chains based on precedence relationships for placement in system memory to optimize subsequent execution of the executable procedure (act 309).
  • chain ordering module 104 can order code block chains 121 based on precedence relationships 117 for placement into system memory to optimize execution of method 111. Precedence relationships 117 can indicate that non-zero chains are to be ordered before zero chains.
  • ordering the one or more code block chains can include an act of creating an ordering for any non-zero count code block chains including connecting any non-zero count code block chains to one another in accordance with precedence relationships (act 310).
  • chain ordering module 104 can create non-zero ordering 119 for non-zero chains 292 through 296, including connecting non-zero chains 292 through 296 to one another in accordance with precedence relationships 117.
  • Ordering the one or more code block chains can also include an act of creating an ordering for any zero count code block chains including connecting any zero count code block chains to one another in accordance with precedence relationships (act 311).
  • Ordering the one or more code block chains can also include an act of appending any zero count code block chains to the end of the ordering of non-zero count code block chains to create a total ordering for the one or more code block chains such that any non-zero count blocks are placed before all zero count blocks within system memory when the total ordering is placed into system memory (act 311).
  • For example, chain ordering module 104 can append zero chain 297 to the end of non-zero ordering 119 to create total chain ordering 118.
  • Thus, any non-zero count blocks (i.e., the blocks in chains 292-296) are placed before all zero count blocks (i.e., the blocks in chain 297). Placing non-zero count blocks before zero count blocks in system memory optimizes execution of method 111. During execution, non-zero blocks are grouped together, significantly reducing the likelihood of a transition between non-zero blocks having to skip over any zero blocks in memory. Further, in some embodiments, zero count blocks are not even loaded into memory.
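  • The append step can be illustrated with a short Python sketch. The helper names (`total_chain_ordering`, `block_layout`), the block membership of each chain, and the counts are all assumptions made for illustration; only the chain numbers and the rule that zero count chains go last come from the text.

```python
def total_chain_ordering(non_zero_ordering, zero_ordering):
    """Append the zero count chains after the ordered non-zero count chains."""
    return list(non_zero_ordering) + list(zero_ordering)


def block_layout(chain_ordering, chains):
    """Flatten an ordering of chain ids into the order blocks occupy in memory."""
    return [block for chain_id in chain_ordering for block in chains[chain_id]]


# Chain membership and counts below are invented for illustration.
chains = {292: [201, 202], 293: [203, 208], 297: [211]}
counts = {201: 9, 202: 9, 203: 4, 208: 4, 211: 0}

ordering = total_chain_ordering([292, 293], [297])
layout = block_layout(ordering, chains)

# Index of the first zero count block: everything before it was executed
# during profiling, and everything from it onward could be split off.
split = next((i for i, b in enumerate(layout) if counts[b] == 0), len(layout))
```

  • Under these assumptions, `layout[:split]` holds only executed blocks and `layout[split:]` holds only unexecuted ones, which is the property that lets an embodiment avoid loading the tail at all.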
  • Figure 4 illustrates a flow chart of an example method 400 for block count based procedure layout and splitting.
  • Method 400 will be described with respect to the components and data in computer architecture 100, block graph 200, code block chains 121, after sets 115, before sets 116, and total chain ordering 118.
  • Method 400 includes an act of accessing an executable procedure that includes a plurality of different code blocks, the executable procedure having a starting code block and one or more conditional statements that separate execution of the procedure into other code blocks (act 401).
  • Method 400 includes an act of accessing block count data for the executable procedure, the block count data including a block count for each code block, each block count indicating how many times a code block was executed during one or more prior executions of the executable procedure (act 402).
  • For example, chain creation module 102 can access method 111 and block count data 112.
  • Method 400 includes an act of creating one or more chains of one or more code blocks for the executable procedure based on the accessed block count data (act 403).
  • For example, chain creation module 102 can create code block chains 121 based on block count data 112.
  • For each code block chain, method 400 includes an act of collecting at least an after set of chains for the code block chain, the after set of chains indicating a preference that the chain is to succeed a set of other chains within an ordering of the one or more code block chains (act 404).
  • For example, set collection module 143 can collect after sets 115 for code block chains 121.
  • An after set for a specified chain indicates a preference that the chain succeeds a set of other chains.
  • A plurality of after sets expresses preferences for a total ordering of chains.
  • Even when not every preference can be satisfied, a total ordering based on one or more expressed preferences can still be generated.
  • For example, it may be that after sets 115 indicate that it is preferred in a total ordering of chains that chain 293 come after chain 292 and also that chain 292 come after chain 293.
  • A total ordering satisfying either preference likely reduces the overall number of unconditional branches, assists in prediction of remaining conditional branches, and places frequently executed code blocks closer together.
  • Thus, although the preferences are in opposition, satisfying either preference provides more efficient execution of method 111.
  • Set collection module 143 can also collect before sets 116.
  • A before set for a specified chain indicates a preference that the chain precede a set of other chains.
  • A plurality of before sets also expresses preferences for a total ordering of chains.
  • As with after sets, a total ordering based on one or more expressed preferences can still be generated even when not every preference can be satisfied.
  • Collection of before and after sets can be based on the position of code blocks within one chain relative to the position of code blocks within another chain in block graph 200.
  • For example, non-zero chain 294 preferably comes before non-zero chain 296 in a total ordering, resulting in a forward branch from non-zero chain 294 to non-zero chain 296.
  • Arranging non-zero chain 294 before non-zero chain 296 can optimize execution. That is, for example, since a processor may assume/predict that forward branches are not to be taken, the least travelled branch out of code block 206 is indicated as a forward branch.
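  • One way to picture the collection of before and after sets is the following Python sketch. It assumes a deliberately simple rule that is not taken from the text: every block graph edge that crosses from one chain to another expresses a preference that the source chain precede the target chain. The function name `collect_sets`, the block membership of chains 294 and 296, and block 210 are hypothetical.

```python
from collections import defaultdict


def collect_sets(chains, edges):
    """Derive before and after sets from edges that cross chain boundaries.

    chains: dict mapping chain id -> list of block ids.
    edges: iterable of (source_block, target_block) pairs from the block graph.
    Assumed rule: a cross-chain edge prefers the source chain to come first.
    """
    chain_of = {b: cid for cid, blocks in chains.items() for b in blocks}
    before_sets = defaultdict(set)  # chain -> chains it should precede
    after_sets = defaultdict(set)   # chain -> chains it should succeed
    for src, dst in edges:
        src_chain, dst_chain = chain_of[src], chain_of[dst]
        if src_chain == dst_chain:
            continue  # edges inside one chain carry no ordering preference
        before_sets[src_chain].add(dst_chain)
        after_sets[dst_chain].add(src_chain)
    return dict(before_sets), dict(after_sets)


# Hypothetical membership: block 206 in chain 294, block 210 in chain 296.
chains = {294: [206], 296: [210]}
before_sets, after_sets = collect_sets(chains, [(206, 210)])
```

  • With this rule, a loop between two chains yields opposing preferences (each chain appears in the other's after set), which matches the kind of conflict described for chains 292 and 293.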
  • Various precedence relationships can use before sets and after sets as input to determine how to order code block chains relative to one another.
  • The code block chain that contains the entry block is given the highest priority and is thus added as the first sorted chain.
  • Subsequently, the unsorted chain with the highest priority is added to the sorted list of chains, and this is repeated until all chains are sorted. Priorities of unsorted chains can change as other chains are added to the list of sorted chains.
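  • The priority-driven selection can be sketched as a simple greedy loop. In the Python below, `order_chains` and the pluggable `priority` callback are hypothetical names, and the example priority (fewest after set members still unsorted) is only one assumed way to score a chain. Treating chain 292 as the entry chain and placing 292 in the after set of 293 are also assumptions; the preference that chain 295 follow chain 293 comes from the text.

```python
def order_chains(chain_ids, entry_chain, priority):
    """Greedy ordering: entry chain first, then repeatedly the best remaining chain.

    priority(chain, sorted_chains) returns a comparable score; higher wins.
    Scores are recomputed every round because they can change as chains
    are added to the sorted list.
    """
    sorted_chains = [entry_chain]
    unsorted = [c for c in chain_ids if c != entry_chain]
    while unsorted:
        best = max(unsorted, key=lambda c: priority(c, sorted_chains))
        sorted_chains.append(best)
        unsorted.remove(best)
    return sorted_chains


# Illustrative after sets: chain -> chains it prefers to come after.
after_sets = {293: {292}, 295: {293}}


def fewest_unmet_after_preferences(chain, sorted_chains):
    """Assumed priority: penalize chains whose after set is not yet placed."""
    unmet = after_sets.get(chain, set()) - set(sorted_chains)
    return -len(unmet)


result = order_chains([292, 295, 293], 292, fewest_unmet_after_preferences)
```

  • In the first round chain 295 scores lower because chain 293 is not yet sorted; once chain 293 is added, the score of chain 295 rises, illustrating how priorities change as the sorted list grows.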
  • Method 400 includes an act of ordering the one or more code block chains based on precedence relationships for placement in system memory to optimize subsequent execution of the executable procedure, the precedence relationships including giving priority to a total ordering that minimizes after set violations for the one or more code block chains (act 405).
  • For example, chain ordering module 104 can order code block chains 121, based on precedence relationships 117, for placement in system memory to optimize subsequent execution of method 111.
  • Precedence relationships 117 can indicate one or more prioritized precedence relationships.
  • A higher priority precedence relationship in precedence relationships 117 can be minimizing after set violations. For example, placing non-zero chain 295 before non-zero chain 293 violates after sets 115.
  • Thus, chain ordering module 104 attempts to place non-zero chain 295 after non-zero chain 293 in total chain ordering 118.
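  • To make the notion of an after set violation concrete, the following Python sketch counts how many after set preferences a candidate ordering breaks. The function name `after_set_violations` is hypothetical; the example uses the preference from the text that chain 295 come after chain 293.

```python
def after_set_violations(ordering, after_sets):
    """Count preferences 'chain comes after predecessor' that the ordering breaks."""
    position = {chain: index for index, chain in enumerate(ordering)}
    violations = 0
    for chain, predecessors in after_sets.items():
        for predecessor in predecessors:
            # Only preferences between chains present in the ordering count.
            if chain in position and predecessor in position:
                if position[chain] < position[predecessor]:
                    violations += 1
    return violations


after_sets = {295: {293}}
bad = after_set_violations([292, 295, 293], after_sets)   # 295 before 293
good = after_set_violations([292, 293, 295], after_sets)  # 295 after 293
```

  • An ordering module that minimizes this count would prefer the second ordering. When two chains appear in each other's after sets, every ordering incurs at least one violation, so the count can be minimized but not always driven to zero.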
  • Chain 501 includes code blocks 511, 512, 513, 514, and 515.
  • Chain 502 includes code block 516 and chain 503 includes code blocks 517 and 518.
  • Connections 521 represent that there are three connections between chains 501 and 503.
  • Connections 522 represent that there are two connections between chains 501 and 502.
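  • Connection counts of this kind can be tallied directly from block graph edges. In the Python sketch below, the chain membership follows the text, but the individual edges are invented so that the totals come out to three and two; `count_connections` is a hypothetical helper name.

```python
from collections import Counter


def count_connections(chains, edges):
    """Count block graph edges between each unordered pair of distinct chains."""
    chain_of = {b: cid for cid, blocks in chains.items() for b in blocks}
    connections = Counter()
    for src, dst in edges:
        a, b = chain_of[src], chain_of[dst]
        if a != b:
            connections[frozenset((a, b))] += 1
    return connections


chains = {
    501: [511, 512, 513, 514, 515],
    502: [516],
    503: [517, 518],
}
# Hypothetical edges; the last one stays inside chain 501 and is ignored.
edges = [(511, 517), (513, 518), (518, 515), (512, 516), (516, 514), (511, 512)]
connections = count_connections(chains, edges)
```

  • A count like this could serve as one input when deciding which unsorted chain is most strongly tied to the chains already sorted.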
  • Each of non-zero chains 295 and 294 has two connections to the previously sorted non-zero chains 292 and 293.
  • Non-zero chain 295 has an empty before set, meaning that there is no preference for any other code block chain to be after it, or in other words, no preference for code block chain 295 to be before any other code block chain.
  • Non-zero chain 294 has non-zero chain 296 in its before set, meaning that it is preferred for non-zero chain 296 to come after non-zero chain 294, or in other words, it is preferred that non-zero chain 294 come before non-zero chain 296.
  • Thus, adding non-zero chain 294 increases the set of code block chains that should be after the sorted code block chains, whereas adding chain 295 does not increase the set.
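  • The comparison between chains 294 and 295 can be expressed as a small measurement. The Python sketch below computes how many new chains would join the set of chains that should come after the sorted chains if a candidate were added. The helper name `pending_after_growth` is hypothetical, and the before sets of chains 292 and 293 are simply left empty as an assumption. The sketch only measures the growth; how that measure feeds into the final priority is left open here.

```python
def pending_after_growth(candidate, sorted_chains, before_sets):
    """Number of new chains that must follow the sorted chains if candidate is added.

    before_sets: dict mapping chain -> chains it prefers to precede.
    """
    placed = set(sorted_chains) | {candidate}
    # Chains already expected to come after the currently sorted chains.
    pending = set()
    for chain in sorted_chains:
        pending |= before_sets.get(chain, set())
    pending -= placed
    # Chains the candidate would add to that expectation.
    new_pending = before_sets.get(candidate, set()) - placed - pending
    return len(new_pending)


before_sets = {294: {296}, 295: set()}
growth_294 = pending_after_growth(294, [292, 293], before_sets)
growth_295 = pending_after_growth(295, [292, 293], before_sets)
```

  • Here chain 294 grows the set by one (chain 296) while chain 295 grows it by none, matching the distinction drawn between the two candidates.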
  • Total ordering 118 can be (potentially significantly) different than the ordering of code blocks of method 111 used during profiling. Total ordering 118 is optimized to place more frequently executed code blocks near the top of memory. Total ordering 118 is also optimized to group more frequently executed code blocks near each other in memory.
  • Total ordering 118 is also optimized to place zero count (untouched) code blocks after any non-zero (touched) code blocks. These optimizations facilitate memory utilization in a top down manner and reduce the number of jumps between non-contiguous portions of memory during execution, resulting in more efficient execution of method 111.
  • Accordingly, embodiments of the invention utilize code block counts to provide code block layouts that improve execution time of generated procedure code by minimizing branches along more frequently executed paths.
  • Embodiments can also take advantage of hardware branch prediction on remaining branches, which predicts that backwards branches are taken and forward branches are not taken.
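  • The stated prediction rule (backward branches taken, forward branches not taken) depends only on where the two blocks sit in the final layout. A minimal Python sketch follows, assuming a hypothetical layout of three blocks; the function names and block ids other than 206 are invented for illustration.

```python
def branch_direction(layout, src, dst):
    """Classify a branch as 'forward' or 'backward' from block positions."""
    position = {block: index for index, block in enumerate(layout)}
    return "backward" if position[dst] <= position[src] else "forward"


def predicted_taken(layout, src, dst):
    """Static heuristic: backward branches are predicted taken, forward not."""
    return branch_direction(layout, src, dst) == "backward"


layout = [206, 207, 210]  # hypothetical order of blocks in memory
rare_exit = branch_direction(layout, 206, 210)  # seldom-taken path placed later
loop_back = predicted_taken(layout, 207, 206)   # branch back to an earlier block
```

  • Laying out a seldom-taken target after its source makes the branch a forward branch, so a predictor following this heuristic guesses "not taken" and is usually right.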
  • Embodiments also improve startup times for programs having increased disk input/output by decreasing the amount of unexecuted code that is read from disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)
PCT/US2008/088483 2008-01-17 2008-12-29 Block count based procedure layout and splitting WO2009091493A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
BRPI0821772-6A BRPI0821772A2 (pt) 2008-01-17 2008-12-29 Esboço e divisão de procedimento baseado em contagem de blocos
CN200880125324.0A CN101918917B (zh) 2008-01-17 2008-12-29 基于块计数的过程布局和拆分
EP08870630A EP2250551A4 (de) 2008-01-17 2008-12-29 Prozedurlayout und splitting auf blockzählungsbasis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/016,099 US8677336B2 (en) 2008-01-17 2008-01-17 Block count based procedure layout and splitting
US12/016,099 2008-01-17

Publications (2)

Publication Number Publication Date
WO2009091493A2 (en) 2009-07-23
WO2009091493A3 (en) 2009-10-08

Family

ID=40877462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/088483 WO2009091493A2 (en) 2008-01-17 2008-12-29 Block count based procedure layout and splitting

Country Status (5)

Country Link
US (1) US8677336B2 (de)
EP (1) EP2250551A4 (de)
CN (1) CN101918917B (de)
BR (1) BRPI0821772A2 (de)
WO (1) WO2009091493A2 (de)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8793602B2 (en) 2004-01-15 2014-07-29 The Mathworks, Inc. System and method for scheduling the execution of model components using model events
US11681531B2 (en) 2015-09-19 2023-06-20 Microsoft Technology Licensing, Llc Generation and use of memory access instruction order encodings
US11977891B2 (en) 2015-09-19 2024-05-07 Microsoft Technology Licensing, Llc Implicit program order
CN108027735B (zh) * 2015-09-19 2021-08-27 微软技术许可有限责任公司 用于操作处理器的装置、方法和计算机可读存储介质
US10652319B2 (en) * 2015-12-16 2020-05-12 Dell Products L.P. Method and system for forming compute clusters using block chains
US10452428B2 (en) * 2016-03-14 2019-10-22 International Business Machines Corporation Application execution with optimized code for use profiles
US10585648B2 (en) * 2016-06-01 2020-03-10 The Mathworks, Inc. Systems and methods for aggregating implicit and explicit event code of executable models
CN106878528A (zh) * 2017-01-23 2017-06-20 北京思特奇信息技术股份有限公司 一种基于区块链技术的骚扰来电短信拦截方法及系统
US10581621B2 (en) * 2017-05-18 2020-03-03 International Business Machines Corporation Enhanced chaincode analytics provenance in a blockchain
US10719970B2 (en) * 2018-01-08 2020-07-21 Apple Inc. Low latency firmware command selection using a directed acyclic graph
US11336455B2 (en) 2019-09-25 2022-05-17 International Business Machines Corporation Consensus protocol for blockchain DAG structure
US11221835B2 (en) 2020-02-10 2022-01-11 International Business Machines Corporation Determining when to perform and performing runtime binary slimming
US20230315453A1 (en) * 2022-04-01 2023-10-05 Intel Corporation Forward conditional branch event for profile-guided-optimization (pgo)

Family Cites Families (18)

Publication number Priority date Publication date Assignee Title
US5212794A (en) * 1990-06-01 1993-05-18 Hewlett-Packard Company Method for optimizing computer code to provide more efficient execution on computers having cache memories
US5732273A (en) 1995-08-11 1998-03-24 Digital Equipment Corporation System for monitoring compute system performance
US6006033A (en) 1994-08-15 1999-12-21 International Business Machines Corporation Method and system for reordering the instructions of a computer program to optimize its execution
US5768592A (en) 1994-09-27 1998-06-16 Intel Corporation Method and apparatus for managing profile data
US5778232A (en) 1996-07-03 1998-07-07 Hewlett-Packard Company Automatic compiler restructuring of COBOL programs into a proc per paragraph model
US5999738A (en) * 1996-11-27 1999-12-07 Hewlett-Packard Company Flexible scheduling of non-speculative instructions
US5950009A (en) 1997-03-10 1999-09-07 International Business Machines Coporation Method and apparatus for profile-based reordering of program portions in a computer program
US6073230A (en) 1997-06-11 2000-06-06 Advanced Micro Devices, Inc. Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches
US5966541A (en) * 1997-12-04 1999-10-12 Incert Software Corporation Test protection, and repair through binary-code augmentation
US6481008B1 (en) 1999-06-30 2002-11-12 Microsoft Corporation Instrumentation and optimization tools for heterogeneous programs
US6976260B1 (en) * 1999-09-24 2005-12-13 International Business Machines Corporation Method and apparatus for serializing a message queue in a multiprocessing environment
US7100157B2 (en) * 2002-09-24 2006-08-29 Intel Corporation Methods and apparatus to avoid dynamic micro-architectural penalties in an in-order processor
US7155708B2 (en) * 2002-10-31 2006-12-26 Src Computers, Inc. Debugging and performance profiling using control-dataflow graph representations with reconfigurable hardware emulation
US7386838B2 (en) 2003-04-03 2008-06-10 International Business Machines Corporation Method and apparatus for obtaining profile data for use in optimizing computer programming code
US20050044538A1 (en) * 2003-08-18 2005-02-24 Srinivas Mantripragada Interprocedural computing code optimization method and system
US7210135B2 (en) * 2003-08-26 2007-04-24 Microsoft Corporation Data flow analysis of transactional processes
US7661095B2 (en) 2005-04-14 2010-02-09 Hewlett-Packard Development Company, L.P. System and method to build a callgraph for functions with multiple entry points
US7792666B2 (en) * 2006-05-03 2010-09-07 Sony Computer Entertainment Inc. Translation block invalidation prehints in emulation of a target system on a host system

Non-Patent Citations (1)

Title
PETTIS K ET AL.: "Profile guided code positioning", ACM SIGPLAN NOTICES, ACM, ASSOCIATION FOR COMPUTING MACHINERY, vol. 25, no. 6, 20 June 1990 (1990-06-20), pages 16 - 27

Also Published As

Publication number Publication date
US20090187887A1 (en) 2009-07-23
US8677336B2 (en) 2014-03-18
WO2009091493A3 (en) 2009-10-08
CN101918917A (zh) 2010-12-15
EP2250551A2 (de) 2010-11-17
CN101918917B (zh) 2017-08-08
BRPI0821772A2 (pt) 2015-06-16
EP2250551A4 (de) 2012-06-13

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200880125324.0

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08870630

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 4093/CHENP/2010

Country of ref document: IN

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2008870630

Country of ref document: EP

ENP Entry into the national phase

Ref document number: PI0821772

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20100623