WO2021107765A1 - System and method for executing heterogeneous compilation - Google Patents

Info

Publication number
WO2021107765A1
WO2021107765A1 PCT/MY2020/050124 MY2020050124W WO2021107765A1 WO 2021107765 A1 WO2021107765 A1 WO 2021107765A1 MY 2020050124 W MY2020050124 W MY 2020050124W WO 2021107765 A1 WO2021107765 A1 WO 2021107765A1
Authority
WO
WIPO (PCT)
Prior art keywords
passes
code
generating
self
processing
Application number
PCT/MY2020/050124
Other languages
French (fr)
Inventor
Keh Kok YONG
Pek Yin SIAN
Original Assignee
Mimos Berhad
Application filed by Mimos Berhad
Publication of WO2021107765A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/447 Target code generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/47 Retargetable compilers

Definitions

  • the disclosures made herein relate generally to the field of software code compilation and, more particularly, to a system and method for executing heterogeneous compilation.
  • any computing system includes a compiler program for translating a high-level programming language, e.g. C++, C, clang-mcpu, NVCC, Python, PyCUDA and the like, to a machine language, wherein a programmer or developer uses the high-level programming language to create a source code.
  • Basic functions of a typical compiler program include recognizing legal and illegal programs, generating correct code, managing storage of all variables and code, and agreeing on a format for object or assembly code.
  • Traditional compilers include C compiler (gcc), Java JIT compiler (hotspot), system assembler (as), system linker (ld), IDEs (Xcode), refactoring tools and the like.
  • LLVM low level virtual machine
  • IR intermediate representation
  • LLVM provides modular and reusable components for building compilers at reduced cost and time consumption. Also, LLVM allows choice of right components for a task and sharing of the components across different compilers. It is not capable of parsing a language’s grammar, directly addressing a larger culture and providing a garbage-collector mechanism.
  • Traditional LLVM architecture consists of a language frontend, core engine and code generator. The frontend is responsible for tokenizing words and parsing the sources. The backend takes preprocessed object code, and then performs transformation and optimization. However, there are always challenges in dealing with optimization to consider orders in a heterogeneous way.
  • United States Patent Publication No.: US 2014/0068581 A1 discloses a system for optimized division of work in a heterogeneous processing system.
  • the system divides a source code into multiple sections for parallel execution on corresponding allocated processor, analysing each section and transforming the sections based on analysis. Execution of each section is monitored and next code is allocated based on an execution parameter such as processor availability and time consumption. Since parallel execution is performed based on availability of resources, issues in dealing with optimization to consider orders in a heterogeneous way are still not solved.
  • the present invention proposes a system for executing heterogeneous compilation.
  • the system comprises an input unit for inputting a source code and a code parser for parsing and tokenizing the source code into multiple device codes in the form of multiple tokenized words and tags.
  • a dynamic re-planning engine processes each device code to generate an intermediate representation, IR, for transformation and optimization processes.
  • a code generator (19) generates a program binary code for runtime.
  • the dynamic re-planning engine includes a blackboard multi-agent optimization composer (BlackMAOC), multiple self-generating pass agents and a flight manual, wherein the BlackMAOC segments and transforms IR passes and the self-generating pass agents process a re-plan route for the IR passes and feed back to the flight manual.
  • BlackMAOC blackboard multi-agent optimization composer
  • the flight manual stores associate information of a platform and program execution.
  • the BlackMAOC includes a springer memory to represent a heterogeneous data structure and a V-ring hooker for IR processing.
  • the present invention extends IR optimization in the dynamic re-planning engine for scrutinizing pipelining phase issues and unlocks aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline of multi-agent passes.
  • the present invention is capable of efficiently complying with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.
  • CPU central processing unit
  • GPU general-purpose graphics processing unit
  • the present invention also includes a method for executing heterogeneous compilation, wherein the method comprises the steps of: inputting a source code to an input unit, parsing and tokenizing the source code, using at least one code parser, into multiple device codes, processing each device code to generate an intermediate representation and generating a program binary code at a code generator.
  • FIGURE 1 illustrates a block representation of the system for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention.
  • FIGURE 2 illustrates a block representation of a dynamic re-planning engine, in accordance with an exemplary embodiment of the present invention.
  • FIGURE 3 illustrates a flow diagram of the method for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention.
  • the description hereinafter of the specific embodiment will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt such specific embodiment for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments.
  • the phraseology or terminology employed herein is for the purpose of description and not of limitation.
  • the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware or programmable instructions) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “unit,” “module,” or “system.”
  • Compiling A process of converting a code written in a compiled programming language into a machine executable program, wherein the machine includes but is not limited to a desktop computer, mobile computer, smartphone, personal digital assistant, server, microprocessor and the like. A few examples of machine platforms include Intel, NVIDIA, AMD, ARM and PowerPC.
  • the present invention provides a system and a method for executing heterogeneous compilation.
  • the system comprises a dynamic re-planning engine including a blackboard multi-agent optimization composer (BlackMAOC) and multiple self-generating pass agents, wherein a springer memory in the BlackMAOC represents a heterogeneous data structure and a V-ring hooker in the BlackMAOC uses the data structure and interfaces with the self-generating pass agents for intermediate representation (IR) processing.
  • the system (10) comprises an input unit (11) for inputting a source code and a code parser (12) for parsing and tokenizing the source code into multiple device codes in a form of multiple tokenized words and tags.
  • the device codes include transformation host code and transformation device code.
  • the input unit (11) is a storage drive capable of reading the source code stored in a storage medium such as magnetic disk, optical disk, solid state storage device and the like.
  • the input unit (11) may also be an input device such as keyboard, voice recorder, scanner and the like, for receiving direct input of the source code.
  • a dynamic re-planning engine (13) processes each device code to generate an intermediate representation, IR, for transformation and optimization processes.
  • the dynamic re-planning engine (13) includes a blackboard multi-agent optimization composer (BlackMAOC) (14), multiple self-generating pass agents (15) and a flight manual (16), as shown in FIGURE 2.
  • BlackMAOC (14) segments and transforms IR passes and the self-generating pass agents (15) process a re-plan route for the IR passes and feed back to the flight manual (16).
  • the flight manual (16) functions as a knowledgebase to store associate information of a platform and program execution.
  • the BlackMAOC (14) includes a springer memory (17) to represent a heterogeneous data structure, i.e. a dynamic combination refactor (DCR) structure, wherein data is stored in the springer memory (17) in the DCR structure.
  • DCR dynamic combination refactor
  • the BlackMAOC (14) also includes a V-ring hooker (18) to process said IR, wherein V-ring hooker (18) analyses the device codes based on the information stored in the flight manual (16).
  • the V-ring hooker (18) formulates the DCR, executes invariant passes in pipeline, checks if there is any change in the IR and processes the IR accordingly.
  • the V-ring hooker (18) sends the IR to the self-generating pass agents (15) for trampoline execution of the IR passes and stores associate information of the IR passes to the flight manual (16).
  • the V-ring hooker (18) restarts IR processing. Prior to restarting IR processing, the V-ring hooker (18) shares potential results on a blackboard to activate the self-generating pass, resolves phase ordering, removes obsolete transformation paths from the blackboard and writes the potential results delegated to runtime passes on the blackboard.
  • the V-ring hooker (18) sends the IR passes to the self-generating pass agents (15) for processing a re-plan route.
  • the self-generating pass agents (15) receive and execute the IR passes from the BlackMAOC and send results to the BlackMAOC (14) as changes in the IR.
  • the self-generating pass agents (15) ingest the IR passes to a trampoline jump and process the IR passes for multi agent pipeline execution.
  • the self-generating pass agents classify failed potential optimizations based on corresponding reasons for failure such as aliasing between two variables and the like. Further, the self-generating pass agents (15) store the failed potential optimizations.
  • the self-generating pass agents check if the processed IR passes are ready for generating the program binary code for runtime. If not, the self-generating pass agents (15) return to ingestion of the IR passes. Otherwise, the self-generating pass agents (15) insert shared and specialized micro heterogeneous runtime code in the IR passes for creating one or more template codes and the associate information.
  • the self-generating pass agents (15) store the template codes and the associate information to the flight manual (16).
  • a code generator (19, shown in FIGURE 1) generates a program binary code for runtime from the IR passes, which is executable at a machine for performing a function for which the source code is created by a programmer or developer.
  • the present invention extends IR optimization in the dynamic re-planning engine (13) for scrutinizing pipelining phase issues and unlocks aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline of multi-agent passes.
  • the present invention is capable of efficiently complying with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.
  • FIGURE 3 shows a flow diagram of the method (20) for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention.
  • the method (20) comprises the steps of inputting a source code to an input unit (21), parsing and tokenizing the source code at a code parser into multiple device codes in the form of multiple tokenized words and tags (22), processing each device code at a dynamic re-planning engine to generate an intermediate representation (IR) for transformation and optimization processes (23) and generating a program binary code at a code generator (24).
  • Each device code is processed by segmenting and transforming IR passes at a blackboard multi-agent optimization composer (BlackMAOC) of the dynamic re-planning engine and by processing a re-plan route for the IR passes and feeding back to a flight manual using multiple self-generating pass agents in the dynamic re-planning engine. Further, associate information of a platform and program execution is stored in a flight manual in the dynamic re-planning engine, which functions as a knowledgebase.
  • a springer memory in the BlackMAOC represents a heterogeneous data structure, i.e. a dynamic combination refactor (DCR) structure, wherein data is stored in the springer memory in the DCR structure.
  • a V-ring hooker in the BlackMAOC processes the IR, wherein the device codes are analysed based on the information stored in the flight manual.
  • the V-ring hooker formulates the DCR, executes invariant passes in pipeline, checks if there is any change in the IR and processes the IR accordingly.
  • the IR is sent to the self-generating pass agents for trampoline execution of the IR passes and the associate information of the IR passes is stored in the flight manual.
  • IR processing is restarted. Prior to restarting IR processing, potential results are shared on a blackboard to activate the self-generating pass, phase ordering is resolved, obsolete transformation paths are removed from the blackboard and the potential results delegated to runtime passes are entered in the blackboard.
  • the IR passes are sent to the self-generating pass agents for processing a re-plan route, wherein the IR passes from the BlackMAOC are executed and results are sent to the BlackMAOC as changes in the IR.
  • the IR passes are ingested to a trampoline jump at the self-generating pass agents and are processed for multi agent pipeline execution. During executing the IR passes, failed potential optimizations are classified based on corresponding reasons for failure such as aliasing between two variables and the like and are stored. Further, the self-generating pass agents check if the processed IR passes are ready for generating the program binary code for runtime. If no, the process is returned to ingestion of the IR passes. Otherwise, shared and specialized micro heterogeneous runtime code is inserted in the IR passes for creating one or more template codes and the associate information.
  • the present invention is capable of extending IR optimization in the dynamic re-planning engine for scrutinizing pipelining phase issues and unlocking aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline. Therefore, the present invention efficiently complies with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.
  • the template codes and the associate information are stored to the flight manual.
  • the program binary code for runtime is generated from the IR passes, which is executable at a machine for performing a function for which the source code is created by a programmer or developer.
  • An apparatus for practicing various embodiments of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the disclosure could be accomplished by modules, routines, subroutines, or subparts of a computer program product.

Abstract

The present invention relates to a system and method for executing heterogeneous compilation. The system (10) comprises an input unit (11) for inputting a source code and a code parser (12) for parsing and tokenizing the source code into multiple device codes in the form of multiple tokenized words and tags. A dynamic re-planning engine (13) processes each device code to generate an intermediate representation (IR) for transformation and optimization processes. A code generator (19) generates a program binary code for runtime.

Description

SYSTEM AND METHOD FOR EXECUTING HETEROGENEOUS COMPILATION
FIELD OF THE DISCLOSURE
The disclosures made herein relate generally to the field of software code compilation and, more particularly, to a system and method for executing heterogeneous compilation.
BACKGROUND
Generally, any computing system includes a compiler program for translating a high-level programming language, e.g. C++, C, clang-mcpu, NVCC, Python, PyCUDA and the like, to a machine language, wherein a programmer or developer uses the high-level programming language to create a source code. Basic functions of a typical compiler program include recognizing legal and illegal programs, generating correct code, managing storage of all variables and code, and agreeing on a format for object or assembly code. Traditional compilers include C compiler (gcc), Java JIT compiler (hotspot), system assembler (as), system linker (ld), IDEs (Xcode), refactoring tools and the like.
Modern computing systems increasingly rely on heterogeneous processing to achieve greater performance, scalability and energy efficiency. Thus, compilers need massive tool-chains with multiple middle layer components to be a complete heterogeneous compiler system. A compiler infrastructure, also called low level virtual machine (LLVM), is used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is developed based on an intermediate representation (IR) independent of source code language.
LLVM provides modular and reusable components for building compilers at reduced cost and time consumption. Also, LLVM allows choice of right components for a task and sharing of the components across different compilers. It is not capable of parsing a language’s grammar, directly addressing a larger culture and providing a garbage-collector mechanism. Traditional LLVM architecture consists of a language frontend, core engine and code generator. The frontend is responsible for tokenizing words and parsing the sources. The backend takes preprocessed object code, and then performs transformation and optimization. However, there are always challenges in dealing with optimization to consider orders in a heterogeneous way.
United States Patent Publication No.: US 2014/0068581 A1 discloses a system for optimized division of work in a heterogeneous processing system. The system divides a source code into multiple sections for parallel execution on corresponding allocated processor, analysing each section and transforming the sections based on analysis. Execution of each section is monitored and next code is allocated based on an execution parameter such as processor availability and time consumption. Since parallel execution is performed based on availability of resources, issues in dealing with optimization to consider orders in a heterogeneous way are still not solved.
Hence, there is a need for a system and method for executing heterogeneous compilation, which enables scrutinizing pipelining phase issues and unlocking aggressive heterogeneous optimization without slowing down compiling performance.
SUMMARY
The present invention proposes a system for executing heterogeneous compilation. The system comprises an input unit for inputting a source code and a code parser for parsing and tokenizing the source code into multiple device codes in the form of multiple tokenized words and tags. A dynamic re-planning engine processes each device code to generate an intermediate representation, IR, for transformation and optimization processes. A code generator (19) generates a program binary code for runtime.
In a preferred embodiment, the dynamic re-planning engine includes a blackboard multi-agent optimization composer (BlackMAOC), multiple self-generating pass agents and a flight manual, wherein the BlackMAOC segments and transforms IR passes and the self-generating pass agents process a re-plan route for the IR passes and feed back to the flight manual. The flight manual stores associate information of a platform and program execution.
In one aspect of the present invention, the BlackMAOC includes a springer memory to represent a heterogeneous data structure and a V-ring hooker for IR processing. The present invention extends IR optimization in the dynamic re-planning engine for scrutinizing pipelining phase issues and unlocks aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline of multi-agent passes. In this way, the present invention is capable of efficiently complying with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.
The present invention also includes a method for executing heterogeneous compilation, wherein the method comprises the steps of: inputting a source code to an input unit, parsing and tokenizing the source code, using at least one code parser, into multiple device codes, processing each device code to generate an intermediate representation and generating a program binary code at a code generator.
Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
In the figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
FIGURE 1 illustrates a block representation of the system for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention.
FIGURE 2 illustrates a block representation of a dynamic re-planning engine, in accordance with an exemplary embodiment of the present invention.
FIGURE 3 illustrates a flow diagram of the method for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention.
DETAILED DESCRIPTION
In accordance with the present invention, there is provided a system and a method for executing heterogeneous compilation, which will now be described with reference to the embodiment shown in the accompanying drawings. The embodiment does not limit the scope and ambit of the disclosure. The description relates purely to the exemplary embodiment and its suggested applications.
The embodiment herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiment in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiment herein may be practiced and to further enable those of skill in the art to practice the embodiment herein. Accordingly, the description should not be construed as limiting the scope of the embodiment herein. The description hereinafter of the specific embodiment will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt such specific embodiment for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware or programmable instructions) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “unit,” “module,” or “system.”
Various terms as used herein are defined below. To the extent a term used in a claim is not defined below, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.
Definitions:
Compiling: A process of converting a code written in a compiled programming language into a machine executable program, wherein the machine includes but is not limited to a desktop computer, mobile computer, smartphone, personal digital assistant, server, microprocessor and the like. A few examples of machine platforms include Intel, NVIDIA, AMD, ARM and PowerPC.
The present invention provides a system and a method for executing heterogeneous compilation. The system comprises a dynamic re-planning engine including a blackboard multi-agent optimization composer (BlackMAOC) and multiple self-generating pass agents, wherein a springer memory in the BlackMAOC represents a heterogeneous data structure and a V-ring hooker in the BlackMAOC uses the data structure and interfaces with the self-generating pass agents for intermediate representation (IR) processing. In this way, the present invention is capable of extending IR optimization in the dynamic re-planning engine for scrutinizing pipelining phase issues and unlocking aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline. Therefore, the present invention efficiently complies with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.
Referring to the accompanying drawings, FIGURE 1 illustrates a block representation of the system for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention. The system (10) comprises an input unit (11) for inputting a source code and a code parser (12) for parsing and tokenizing the source code into multiple device codes in a form of multiple tokenized words and tags. Preferably, the device codes include transformation host code and transformation device code. Further, the input unit (11) is a storage drive capable of reading the source code stored in a storage medium such as magnetic disk, optical disk, solid state storage device and the like. Alternatively, the input unit (11) may also be an input device such as keyboard, voice recorder, scanner and the like, for receiving direct input of the source code.
A dynamic re-planning engine (13) processes each device code to generate an intermediate representation, IR, for transformation and optimization processes. In an exemplary embodiment, the dynamic re-planning engine (13) includes a blackboard multi-agent optimization composer (BlackMAOC) (14), multiple self-generating pass agents (15) and a flight manual (16), as shown in FIGURE 2. The BlackMAOC (14) segments and transforms IR passes and the self-generating pass agents (15) process a re-plan route for the IR passes and feed back to the flight manual (16). The flight manual (16) functions as a knowledgebase to store associate information of a platform and program execution.
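As a rough illustration of the parsing step, the following Python sketch splits a source text into host code and device code and emits tokenized words with tags. The `__global__` marker, the tag names and all function names are assumptions made for this example; the specification does not define how the code parser (12) separates or tags the code.

```python
import re
from dataclasses import dataclass

# Assumption: a line containing this keyword is treated as device code.
DEVICE_MARKER = "__global__"
KEYWORDS = {"int", "float", "void", "return", DEVICE_MARKER}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


@dataclass
class Token:
    word: str  # the tokenized word
    tag: str   # coarse tag: "keyword", "number", "ident" or "symbol"


def tag_word(word: str) -> str:
    """Assign a coarse, illustrative tag to a single word."""
    if word in KEYWORDS:
        return "keyword"
    if word.isdigit():
        return "number"
    if re.fullmatch(r"[A-Za-z_]\w*", word):
        return "ident"
    return "symbol"


def parse(source: str) -> dict:
    """Split source lines into host and device code, then tokenize and tag."""
    codes = {"host": [], "device": []}
    for line in source.splitlines():
        target = "device" if DEVICE_MARKER in line else "host"
        codes[target].extend(Token(w, tag_word(w)) for w in TOKEN_RE.findall(line))
    return codes
```

Calling `parse("int x = 1;\n__global__ void k();")` returns a dictionary with one token list for the host line and one for the device line.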
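The relationship between the engine, the pass agents and the flight manual can be pictured with a small Python sketch. It assumes the IR is a list of operation names, that each agent is a callable returning the new IR plus a feedback record, and that the flight manual is a keyed store; the class and method names are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class FlightManual:
    """Knowledge base for platform and program-execution information."""
    records: dict = field(default_factory=dict)

    def store(self, key: str, info: dict) -> None:
        self.records.setdefault(key, []).append(info)

    def lookup(self, key: str) -> list:
        return self.records.get(key, [])


@dataclass
class DynamicReplanningEngine:
    flight_manual: FlightManual = field(default_factory=FlightManual)
    agents: list = field(default_factory=list)  # pass agents (callables)

    def process(self, ir: list) -> list:
        """Run each agent on the IR and feed its report back to the manual."""
        for agent in self.agents:
            ir, feedback = agent(ir)
            self.flight_manual.store(agent.__name__, feedback)
        return ir


def drop_nops(ir):
    """Illustrative agent: remove no-op instructions and report the count."""
    new_ir = [op for op in ir if op != "nop"]
    return new_ir, {"removed": len(ir) - len(new_ir)}
```

For example, `DynamicReplanningEngine(agents=[drop_nops]).process(["add", "nop", "mul"])` yields the IR without the no-op and leaves one feedback record under `"drop_nops"` in the flight manual.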
The BlackMAOC (14) includes a springer memory (17) to represent a heterogeneous data structure, i.e. a dynamic combination refactor (DCR) structure, wherein data is stored in the springer memory (17) in the DCR structure. A typical DCR structure is shown in Table 1.
Table 1. DCR structure
[Table 1 appears as an image (imgf000008_0001) in the original publication.]
The BlackMAOC (14) also includes a V-ring hooker (18) to process said IR, wherein V-ring hooker (18) analyses the device codes based on the information stored in the flight manual (16). The V-ring hooker (18) formulates the DCR, executes invariant passes in pipeline, checks if there is any change in the IR and processes the IR accordingly.
If there is no change in the IR, the V-ring hooker (18) sends the IR to the self-generating pass agents (15) for trampoline execution of the IR passes and stores associate information of the IR passes to the flight manual (16). On the contrary, if there is a change in the IR, the V-ring hooker (18) restarts IR processing. Prior to restarting IR processing, the V-ring hooker (18) shares potential results on a blackboard to activate the self-generating pass, resolves phase ordering, removes obsolete transformation paths from the blackboard and writes the potential results delegated to runtime passes on the blackboard.
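The control flow of the V-ring hooker (18) described above can be sketched as a loop that reruns the invariant passes until the IR stops changing. This is a minimal sketch under stated assumptions: the IR is modelled as a list of operation names, the blackboard and the flight manual as plain lists, and all function names are hypothetical. Phase-order resolution and removal of obsolete transformation paths are not modelled.

```python
def remove_nops(ir):
    """Invariant pass (illustrative): drop no-op instructions."""
    return [op for op in ir if op != "nop"]


def fold_double_neg(ir):
    """Invariant pass (illustrative): cancel adjacent pairs of 'neg'."""
    out = []
    for op in ir:
        if op == "neg" and out and out[-1] == "neg":
            out.pop()
        else:
            out.append(op)
    return out


def run_hooker(ir, invariant_passes, send_to_agents, flight_manual, max_rounds=10):
    """Rerun the invariant passes until the IR is unchanged, then hand over."""
    blackboard = []
    for round_no in range(1, max_rounds + 1):
        new_ir = ir
        for run_pass in invariant_passes:
            new_ir = run_pass(new_ir)
        if new_ir == ir:
            # No change: record information and send the IR to the agents.
            flight_manual.append({"rounds": round_no, "ops": len(ir)})
            return send_to_agents(ir), blackboard
        # Change: share the result on the blackboard, then restart.
        blackboard.append({"round": round_no, "before": ir, "after": new_ir})
        ir = new_ir
    raise RuntimeError("IR did not stabilize within max_rounds")
```

The `max_rounds` bound is a design choice of this sketch, added so that a pair of passes that keep undoing each other cannot loop forever.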
Further, the V-ring hooker (18) sends the IR passes to the self-generating pass agents (15) for processing a re-plan route. The self-generating pass agents (15) receive and execute the IR passes from the BlackMAOC and send results to the BlackMAOC (14) as changes in the IR. The self-generating pass agents (15) ingest the IR passes to a trampoline jump and process the IR passes for multi agent pipeline execution.
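The specification does not define the "trampoline jump" in detail. In general programming usage, a trampoline is a driver loop that repeatedly calls whatever step the previous step handed back, so a chain of passes runs iteratively instead of through nested calls. The Python sketch below shows that general technique only; the pass names and the list-of-names IR are assumptions for illustration.

```python
def trampoline(first_pass, ir):
    """Drive a chain of passes: each pass returns (next_pass, new_ir)."""
    next_pass = first_pass
    while next_pass is not None:
        next_pass, ir = next_pass(ir)
    return ir


def strip_nops(ir):
    """Remove no-ops, then hand control to dedupe_adjacent."""
    return dedupe_adjacent, [op for op in ir if op != "nop"]


def dedupe_adjacent(ir):
    """Collapse runs of identical adjacent operations, then stop the chain."""
    out = []
    for op in ir:
        if not out or out[-1] != op:
            out.append(op)
    return None, out
```

Here `trampoline(strip_nops, ["load", "nop", "load", "add"])` runs both passes in turn and returns the final IR once a pass hands back `None` as the next step.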
During execution of the IR passes, the self-generating pass agents (15) classify failed potential optimizations based on corresponding reasons for failure such as aliasing between two variables and the like. Further, the self-generating pass agents (15) store the failed potential optimizations.
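One way to picture the classification and storage of failed potential optimizations is a store keyed by failure reason. In the sketch below, only the aliasing reason comes from the text; the catch-all reason, the class names and the method names are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class FailureReason(Enum):
    ALIASING = "aliasing between two variables"  # example named in the text
    UNKNOWN = "unclassified"                     # hypothetical catch-all


class FailedOptimizationStore:
    """Keeps failed potential optimizations grouped by their failure reason."""

    def __init__(self):
        self._by_reason = defaultdict(list)

    def record(self, optimization: str, reason: FailureReason) -> None:
        self._by_reason[reason].append(optimization)

    def failures(self, reason: FailureReason) -> list:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._by_reason.get(reason, []))
```

Grouping by reason would let a later pass retry only the optimizations blocked by a condition that has since been resolved, although the specification does not state how the stored failures are used.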
Further, the self-generating pass agents (15) check if the processed IR passes are ready for generating the program binary code for runtime. If not, the self-generating pass agents (15) return to ingestion of the IR passes. Otherwise, the self-generating pass agents (15) insert shared and specialized micro heterogeneous runtime code in the IR passes for creating one or more template codes and the associate information.
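The readiness check can be read as a loop: ingest and process the IR passes, test for readiness, and only then insert the runtime code to form a template. The sketch below is an assumption-laden illustration; the readiness predicate, the `runtime_init` stub and the shape of the associate information are invented for the example.

```python
def drop_one_nop(ir):
    """Illustrative ingestion step: remove a single no-op per call."""
    ir = list(ir)
    if "nop" in ir:
        ir.remove("nop")
    return ir


def build_template(ir, ingest, is_ready, runtime_stub, max_iters=10):
    """Repeat ingestion until the IR is ready, then prepend the runtime stub."""
    for iteration in range(1, max_iters + 1):
        ir = ingest(ir)
        if is_ready(ir):
            template = [runtime_stub] + ir
            info = {"iterations": iteration, "ops": len(ir)}
            return template, info
    raise RuntimeError("IR passes were not ready within max_iters")
```

For instance, `build_template(["nop", "add", "nop"], drop_one_nop, lambda ir: "nop" not in ir, "runtime_init")` needs two ingestion rounds before the template is produced.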
The self-generating pass agents (15) store the template codes and the associate information to the flight manual (16). Finally, a code generator (19, shown in FIGURE 1) generates a program binary code for runtime from the IR passes, which is executable at a machine for performing a function for which the source code is created by a programmer or developer.
The present invention extends IR optimization in the dynamic re-planning engine (13) for scrutinizing pipelining phase issues and unlocks aggressive heterogeneous optimization without slowing down compiling performance by composing an unconditional multi-pipeline of multi-agent passes. In this way, the present invention is capable of efficiently complying with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.
FIGURE 3 shows a flow diagram of the method (20) for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention. The method (20) comprises the steps of inputting a source code to an input unit (21), parsing and tokenizing the source code at a code parser into multiple device codes in the form of multiple tokenized words and tags (22), processing each device code at a dynamic re-planning engine to generate an intermediate representation (IR) for transformation and optimization processes (23) and generating a program binary code at a code generator (24).
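The parsing and tokenizing step (22) may be illustrated by a minimal tokenizer that turns source text into pairs of a tokenized word and a tag. The tag set, the regular expressions and the `tokenize` function are assumptions chosen for this sketch; the sketch does not show how tokens are divided into device codes, which the description does not detail.

```python
import re
from typing import List, Tuple

# Hypothetical tag set; the disclosure does not enumerate the tags it uses.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{tag}>{pattern})" for tag, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split source text into (word, tag) pairs, skipping whitespace."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.group(), match.lastgroup))
        pos = match.end()
    return tokens


tokens = tokenize("x = y + 42")
```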
Each device code is processed by segmenting and transforming IR passes at a blackboard multi-agent optimization composer (BlackMAOC) of the dynamic re-planning engine and by processing a re-plan route for the IR passes and feeding back to a flight manual using multiple self-generating pass agents in the dynamic re-planning engine. Further, associate information of a platform and program execution is stored in the flight manual in the dynamic re-planning engine, which functions as a knowledgebase.
A springer memory in the BlackMAOC represents a heterogeneous data structure, i.e. a dynamic combination refactor (DCR) structure, wherein data is stored in the springer memory in the DCR structure. Further, a V-ring hooker in the BlackMAOC processes the IR, wherein the device codes are analysed based on the information stored in the flight manual. The V-ring hooker formulates the DCR, executes invariant passes in pipeline, checks whether there is any change in the IR and processes the IR accordingly.
If there is no change in the IR, the IR is sent to the self-generating pass agents for trampoline execution of the IR passes and the associate information of the IR passes is stored in the flight manual. Conversely, if there is a change in the IR, IR processing is restarted. Prior to restarting IR processing, potential results are shared on a blackboard to activate the self-generating pass agents, phase ordering is resolved, obsolete transformation paths are removed from the blackboard and the potential results delegated to runtime passes are entered on the blackboard.
The IR passes are sent to the self-generating pass agents for processing a re-plan route, wherein the IR passes from the BlackMAOC are executed and results are sent to the BlackMAOC as changes in the IR. The IR passes are ingested to a trampoline jump at the self-generating pass agents and are processed for multi-agent pipeline execution. During execution of the IR passes, failed potential optimizations are classified based on corresponding reasons for failure, such as aliasing between two variables and the like, and are stored.
Further, the self-generating pass agents check whether the processed IR passes are ready for generating the program binary code for runtime. If not, the process returns to ingestion of the IR passes. Otherwise, shared and specialized micro heterogeneous runtime code is inserted in the IR passes for creating one or more template codes and the associate information.
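The blackboard operations recited above (sharing potential results to activate the agents, and removing obsolete transformation paths) may be sketched as a shared store with subscriber callbacks. The `Blackboard` class, its method names and the string labels used for transformation paths are hypothetical; phase-ordering resolution is not modelled.

```python
from typing import Callable, Dict, List, Set

Agent = Callable[[str, str], None]   # called with (transformation path, potential result)


class Blackboard:
    """Shared store on which potential results are posted for the agents."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}   # transformation path -> potential result
        self._agents: List[Agent] = []

    def subscribe(self, agent: Agent) -> None:
        self._agents.append(agent)

    def post(self, path: str, result: str) -> None:
        """Write a potential result and activate every subscribed agent."""
        self.entries[path] = result
        for agent in self._agents:
            agent(path, result)

    def remove_obsolete(self, live_paths: Set[str]) -> None:
        """Drop every transformation path that is no longer live."""
        for path in list(self.entries):
            if path not in live_paths:
                del self.entries[path]


seen: List[str] = []
board = Blackboard()
board.subscribe(lambda path, result: seen.append(path))
board.post("inline->unroll", "candidate A")
board.post("unroll->vectorize", "candidate B")
board.remove_obsolete({"unroll->vectorize"})
```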
In this way, the present invention is capable of extending IR optimization in the dynamic re-planning engine for scrutinizing pipelining phase issues and unlocking aggressive heterogeneous optimization without slowing down compiling performance by composing an unconditional multi-pipeline. Therefore, the present invention efficiently complies with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.
The template codes and the associate information are stored to the flight manual. Finally, the program binary code for runtime is generated from the IR passes, which is executable at a machine for performing a function for which the source code is created by a programmer or developer.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises," "comprising," “including,” and “having,” are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It is also to be understood that additional or alternative steps may be employed. The use of the expression “at least” or “at least one” suggests the use of one or more elements, as the use may be in one of the embodiments to achieve one or more of the desired objects or results.
Various methods described herein may be practiced by combining one or more machine-readable storage media containing code that performs the steps according to the present invention with appropriate standard computer hardware to execute the code contained therein. An apparatus for practicing various embodiments of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the disclosure could be accomplished by modules, routines, subroutines, or subparts of a computer program product.
While the foregoing describes various embodiments of the disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof. The scope of the disclosure is determined by the claims that follow. The disclosure is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the disclosure when combined with information and knowledge available to the person having ordinary skill in the art.

Claims

1. A system (10) for executing heterogeneous compilation, comprising:
- at least one input unit (11) for inputting a source code;
- at least one code parser (12) for parsing and tokenizing said source code into multiple device codes in a form of multiple tokenized words and tags;
- a dynamic re-planning engine (13) for processing each device code to generate an intermediate representation, IR, for transformation and optimization processes; and
- a code generator (19) for generating a program binary code for runtime,
characterized in that said dynamic re-planning engine (13) includes:
i. a blackboard multi-agent optimization composer, BlackMAOC, (14) for segmenting and transforming IR passes;
ii. a flight manual (16) functioning as a knowledgebase to store an associate information of a platform and program execution; and
iii. multiple self-generating pass agents (15) for processing a re-plan route for said IR passes and feeding back to said flight manual.
2. The system (10) as claimed in claim 1, wherein said BlackMAOC (14) includes a springer memory (17) to represent a heterogeneous data structure and a V-ring hooker (18) for IR processing.
3. The system (10) as claimed in claim 2, wherein said heterogeneous data structure is a dynamic combination refactor, DCR, structure.
4. The system (10) as claimed in claim 2, wherein said V-ring hooker (18) processes said IR by:
- analysing said device codes based on said information stored in said flight manual (16);
- formulating DCR and executing invariant IR passes in pipeline;
- checking if there is any change in said IR; and
- processing said IR based on said check result.
5. The system (10) as claimed in claim 4, wherein, if there is no change in said IR, said V-ring hooker (18) sends said IR to said self-generating pass agents (15) for trampoline execution of said IR passes in pipeline and stores associate information of said IR passes to said flight manual (16).
6. The system (10) as claimed in claim 4, wherein said V-ring hooker (18) restarts IR processing if there is a change in said IR, wherein said V-ring hooker (18):
- shares potential results on a blackboard to activate said self-generating pass agents (15);
- resolves phase ordering;
- removes obsolete transformation paths from said blackboard;
- writes said potential results delegated to runtime passes on said blackboard; and
- re-runs said analysis of said device codes based on said information stored in said flight manual (16).
7. The system (10) as claimed in claim 1, wherein said self-generating pass agents (15) process said re-plan route by:
- receiving and executing said IR passes from said BlackMAOC (14);
- sending results to said BlackMAOC (14) as changes in said IR;
- storing failed potential optimizations, wherein each of said failed potential optimizations is classified by at least one reason for failure;
- ingesting said IR passes to a trampoline jump; and
- processing said IR passes for multi agent pipeline execution.
8. The system (10) as claimed in claim 7, wherein said self-generating pass agents (15) check if said processed IR passes are ready for generating said program binary code for runtime and insert shared and specialized micro heterogeneous runtime code for creating at least one template code and said associate information if said IR passes are ready.
9. The system (10) as claimed in claim 8, wherein if said IR passes are not ready for generating said program binary code, said self-generating pass agents repeat ingestion of said IR passes, processing of said ingested IR passes and checking of readiness of said IR passes.
10. The system (10) as claimed in claim 8, wherein said self-generating pass agents store said template code and said associate information to said flight manual.
11. A method (20) for executing heterogeneous compilation, comprising the steps of:
- inputting a source code to at least one input unit (21);
- parsing and tokenizing said source code at at least one code parser into multiple device codes in the form of multiple tokenized words and tags (22);
- processing each device code at a dynamic re-planning engine to generate an intermediate representation, IR, for transformation and optimization processes (23); and
- generating a program binary code at a code generator (24),
characterized in that said step of processing each device code includes:
i. segmenting and transforming IR passes at a blackboard multi-agent optimization composer, BlackMAOC;
ii. processing a re-plan route for said IR passes and feeding back to a flight manual using multiple self-generating pass agents; and
iii. storing an associate information of a platform and program execution in a flight manual that functions as a knowledgebase.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2019007061 2019-11-29

Publications (1)

Publication Number Publication Date
WO2021107765A1 (en) 2021-06-03

Family

ID=76128872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2020/050124 WO2021107765A1 (en) 2019-11-29 2020-10-28 System and method for executing heterogeneous compilation

Country Status (1)

Country Link
WO (1) WO2021107765A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070283337A1 (en) * 2006-06-06 2007-12-06 Waseda University Global compiler for controlling heterogeneous multiprocessor
US20110214016A1 (en) * 2010-03-01 2011-09-01 International Business Machines Corporation Performing Aggressive Code Optimization with an Ability to Rollback Changes Made by the Aggressive Optimizations
WO2015143641A1 (en) * 2014-03-26 2015-10-01 Empire Technology Development Llc Compilation of application into multiple instruction sets for a heterogeneous processor

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023116272A1 (en) * 2021-12-20 2023-06-29 International Business Machines Corporation Feedback-directed pass pipeline optimization in static compilation
US11847435B2 (en) 2021-12-20 2023-12-19 International Business Machines Corporation Feedback-directed pass pipeline optimization in static compilation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20892039; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20892039; Country of ref document: EP; Kind code of ref document: A1)