WO2021107765A1

WO2021107765A1 - System and method for executing heterogeneous compilation

Info

Publication number: WO2021107765A1
Application number: PCT/MY2020/050124
Authority: WO
Inventors: Keh Kok YONG; Pek Yin SIAN
Original assignee: Mimos Berhad
Priority date: 2019-11-29
Filing date: 2020-10-28
Publication date: 2021-06-03

Abstract

The present invention relates to a system and method for executing heterogeneous compilation. The system (10) comprises an input unit (11) for inputting a source code and a code parser (12) for parsing and tokenizing the source code into multiple device codes in the form of multiple tokenized words and tags. A dynamic re-planning engine (13) processes each device code to generate an intermediate representation (IR) for transformation and optimization processes. A code generator (19) generates a program binary code for runtime.

Description

SYSTEM AND METHOD FOR EXECUTING HETEROGENEOUS COMPILATION

FIELD OF THE DISCLOSURE

The disclosures made herein relate generally to the field of software code compilation and, more particularly, to a system and method for executing heterogeneous compilation.

BACKGROUND

Generally, any computing system includes a compiler program for translating a high-level programming language, e.g. C++, C, clang-mcpu, NVCC, Python, PyCUDA and the like, to a machine language, wherein a programmer or developer uses the high-level programming language to create a source code. Basic functions of a typical compiler program include recognizing legal and illegal programs, generating correct code, managing storage of all variables and code, and agreeing on a format for object or assembly code. Traditional compilers include C compiler (gcc), Java JIT compiler (hotspot), system assembler (as), system linker (ld), IDEs (Xcode), refactoring tools and the like.

Modern computing systems increasingly rely on heterogeneous processing to achieve greater performance, scalability and energy efficiency. Thus, compilers need massive tool-chains with multiple middle layer components to be a complete heterogeneous compiler system. A compiler infrastructure, also called low level virtual machine (LLVM), is used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is developed based on an intermediate representation (IR) independent of source code language.

LLVM provides modular and reusable components for building compilers at reduced cost and time consumption. Also, LLVM allows choice of right components for a task and sharing of the components across different compilers. It is not capable of parsing a language’s grammar, directly addressing a larger culture and providing a garbage-collector mechanism. Traditional LLVM architecture consists of a language frontend, core engine and code generator. The frontend is responsible for tokenizing words and parsing the sources. The backend takes preprocessed object code, and then performs transformation and optimization. However, there are always challenges in dealing with optimization to consider orders in a heterogeneous way.

United States Patent Publication No.: US 2014/0068581 A1 discloses a system for optimized division of work in a heterogeneous processing system. The system divides a source code into multiple sections for parallel execution on corresponding allocated processors, analyses each section and transforms the sections based on the analysis. Execution of each section is monitored and the next code is allocated based on an execution parameter such as processor availability and time consumption. Since parallel execution is performed based on availability of resources, issues in dealing with optimization to consider orders in a heterogeneous way are still not solved.

Hence, there is a need for a system and method for executing heterogeneous compilation, which enables scrutinizing pipelining phase issues and unlocking aggressive heterogeneous optimization without slowing down compiling performance.

SUMMARY

The present invention proposes a system for executing heterogeneous compilation. The system comprises an input unit for inputting a source code and a code parser for parsing and tokenizing the source code into multiple device codes in the form of multiple tokenized words and tags. A dynamic re-planning engine processes each device code to generate an intermediate representation, IR, for transformation and optimization processes. A code generator generates a program binary code for runtime.

In a preferred embodiment, the dynamic re-planning engine includes a blackboard multi-agent optimization composer (BlackMAOC), multiple self-generating pass agents and a flight manual, wherein the BlackMAOC segments and transforms IR passes and the self-generating pass agents process a re-plan route for the IR passes and feed back to the flight manual. The flight manual stores associate information of a platform and program execution.

In one aspect of the present invention, the BlackMAOC includes a springer memory to represent a heterogeneous data structure and a V-ring hooker for IR processing. The present invention extends IR optimization in the dynamic re-planning engine for scrutinizing pipelining phase issues and unlocks aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline of multi-agent passes. By this way, the present invention is capable of efficiently complying with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.

The present invention also includes a method for executing heterogeneous compilation, wherein the method comprises the steps of: inputting a source code to an input unit, parsing and tokenizing the source code at at least one code parser into multiple device codes, processing each device code to generate an intermediate representation and generating a program binary code at a code generator.

Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

In the figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.

FIGURE 1 illustrates a block representation of the system for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention.

FIGURE 2 illustrates a block representation of a dynamic re-planning engine, in accordance with an exemplary embodiment of the present invention.

FIGURE 3 illustrates a flow diagram of the method for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention.

DETAILED DESCRIPTION

In accordance with the present invention, there is provided a system and a method for executing heterogeneous compilation, which will now be described with reference to the embodiment shown in the accompanying drawings. The embodiment does not limit the scope and ambit of the disclosure. The description relates purely to the exemplary embodiment and its suggested applications.

The embodiment herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiment in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiment herein may be practiced and to further enable those of skill in the art to practice the embodiment herein. Accordingly, the description should not be construed as limiting the scope of the embodiment herein. The description hereinafter of the specific embodiment will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt such specific embodiment for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware or programmable instructions) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “unit,” “module,” or “system.”

Various terms as used herein are defined below. To the extent a term used in a claim is not defined below, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.

Definitions:

Compiling: A process of converting a code written in a compiled programming language into a machine executable program, wherein the machine includes but is not limited to a desktop computer, mobile computer, smartphone, personal digital assistant, server, microprocessor and the like. A few examples of the machine include Intel, NVIDIA, AMD, ARM and PowerPC.

The present invention provides a system and a method for executing heterogeneous compilation. The system comprises a dynamic re-planning engine including a blackboard multi-agent optimization composer (BlackMAOC) and multiple self-generating pass agents, wherein a springer memory in the BlackMAOC represents a heterogeneous data structure and a V-ring hooker in the BlackMAOC uses the data structure and interfaces with the self-generating pass agents for intermediate representation (IR) processing. By this way, the present invention is capable of extending IR optimization in the dynamic re-planning engine for scrutinizing pipelining phase issues and unlocking aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline. Therefore, the present invention efficiently complies with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.

Referring to the accompanying drawings, FIGURE 1 illustrates a block representation of the system for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention. The system (10) comprises an input unit (11) for inputting a source code and a code parser (12) for parsing and tokenizing the source code into multiple device codes in a form of multiple tokenized words and tags. Preferably, the device codes include transformation host code and transformation device code. Further, the input unit (11) is a storage drive capable of reading the source code stored in a storage medium such as magnetic disk, optical disk, solid state storage device and the like. Alternatively, the input unit (11) may also be an input device such as keyboard, voice recorder, scanner and the like, for receiving direct input of the source code.
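As an illustration only, the parsing and tokenizing step performed by the code parser (12) might be sketched as follows. The tag names, the regular expressions and the use of a CUDA-style `__global__` qualifier to separate host code from device code are assumptions made for this sketch; the disclosure does not specify a tag set or a splitting rule.

```python
import re

# Hypothetical token tags; the disclosure does not define a tag set.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("SYMBOL", r"[^\sA-Za-z_0-9]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{tag}>{pattern})" for tag, pattern in TOKEN_SPEC))

# Assumption: a CUDA-style qualifier marks code destined for the device.
DEVICE_MARKER = "__global__"


def tokenize(line):
    """Return (word, tag) pairs for one line of source text."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(line)]


def parse(source):
    """Split source lines into host and device code, each as tagged tokens."""
    codes = {"host": [], "device": []}
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        target = "device" if DEVICE_MARKER in line else "host"
        codes[target].append(tokenize(line))
    return codes
```

Under these assumptions, `parse` returns a dictionary with one list of tagged lines for host code and one for device code, which a later stage could process independently.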

A dynamic re-planning engine (13) processes each device code to generate an intermediate representation, IR, for transformation and optimization processes. In an exemplary embodiment, the dynamic re-planning engine (13) includes a blackboard multi-agent optimization composer (BlackMAOC) (14), multiple self-generating pass agents (15) and a flight manual (16), as shown in FIGURE 2. The BlackMAOC (14) segments and transforms IR passes and the self-generating pass agents (15) process a re-plan route for the IR passes and feed back to the flight manual (16). The flight manual (16) functions as a knowledgebase to store associate information of a platform and program execution.
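The role of the flight manual (16) as a knowledgebase can be pictured with a minimal sketch. The class name, the method names and the record layout below are hypothetical; the disclosure states only that the flight manual stores associate information of a platform and program execution and receives feedback from the pass agents.

```python
from collections import defaultdict


class FlightManual:
    """Hypothetical knowledgebase keyed by platform name."""

    def __init__(self):
        self._records = defaultdict(list)

    def record(self, platform, pass_name, info):
        """Store feedback from a pass agent about one IR pass."""
        self._records[platform].append({"pass": pass_name, **info})

    def lookup(self, platform, pass_name=None):
        """Return stored records for a platform, optionally for one pass."""
        records = self._records.get(platform, [])
        if pass_name is None:
            return list(records)
        return [r for r in records if r["pass"] == pass_name]
```

A consumer such as the V-ring hooker (18) could call `lookup` before analysing a device code, while the pass agents would call `record` after each executed pass.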

The BlackMAOC (14) includes a springer memory (17) to represent a heterogeneous data structure, i.e. a dynamic combination refactor (DCR) structure, wherein data is stored in the springer memory (17) in the DCR structure. A typical DCR structure is shown in Table 1.

Table 1. DCR structure

The BlackMAOC (14) also includes a V-ring hooker (18) to process said IR, wherein V-ring hooker (18) analyses the device codes based on the information stored in the flight manual (16). The V-ring hooker (18) formulates the DCR, executes invariant passes in pipeline, checks if there is any change in the IR and processes the IR accordingly.

If there is no change in the IR, the V-ring hooker (18) sends the IR to the self-generating pass agents (15) for trampoline execution of the IR passes and stores associate information of the IR passes to the flight manual (16). On the contrary, if there is a change in the IR, the V-ring hooker (18) restarts IR processing. Prior to restarting IR processing, the V-ring hooker (18) shares potential results on a blackboard to activate the self-generating pass agents (15), resolves phase ordering, removes obsolete transformation paths from the blackboard and writes the potential results delegated to runtime passes on the blackboard.
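The check-and-restart behaviour described above resembles a fixed-point iteration over a pass pipeline. The following is a minimal sketch under stated assumptions: the tuple-based toy IR, the two example passes and the `max_rounds` guard are all hypothetical and are not taken from the disclosure.

```python
def fold_constants(ir):
    """Replace ("add", dst, a, b) with a constant when a and b are integers."""
    return [
        ("const", ins[1], ins[2] + ins[3])
        if ins[0] == "add" and isinstance(ins[2], int) and isinstance(ins[3], int)
        else ins
        for ins in ir
    ]


def drop_nops(ir):
    """Remove ("nop",) instructions."""
    return [ins for ins in ir if ins[0] != "nop"]


def run_until_stable(ir, passes, max_rounds=10):
    """Re-run the pass pipeline until one full round leaves the IR unchanged."""
    for rounds in range(1, max_rounds + 1):
        before = list(ir)
        for run_pass in passes:
            ir = run_pass(ir)
        if ir == before:
            # No change: the IR would now be handed to the pass agents.
            return ir, rounds
        # Change detected: restart processing with the updated IR.
    raise RuntimeError("IR did not stabilise within max_rounds")
```

For example, `run_until_stable([("add", "x", 1, 2), ("nop",), ("ret", "x")], [fold_constants, drop_nops])` changes the IR in the first round and confirms stability in the second. The guard on the number of rounds is a design choice of this sketch to avoid looping forever when passes keep undoing each other.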

Further, the V-ring hooker (18) sends the IR passes to the self-generating pass agents (15) for processing a re-plan route. The self-generating pass agents (15) receive and execute the IR passes from the BlackMAOC (14) and send results to the BlackMAOC (14) as changes in the IR. The self-generating pass agents (15) ingest the IR passes to a trampoline jump and process the IR passes for multi-agent pipeline execution.
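The disclosure does not define "trampoline jump" or "trampoline execution". Assuming the terms refer to the general programming technique of trampolining, in which each step returns the next step to run instead of calling it directly, a sketch could look like this. The function names are hypothetical.

```python
def trampoline(step):
    """Run chained steps iteratively: each step returns the next step or a result."""
    while callable(step):
        step = step()
    return step


def schedule(ir, passes, index=0):
    """Return a thunk that applies passes[index] and then schedules the next one."""
    if index == len(passes):
        return ir  # no more passes: return the final IR as a plain value
    return lambda: schedule(passes[index](ir), passes, index + 1)
```

Because the driver loop in `trampoline` calls one thunk at a time, the call stack stays shallow however many passes are chained, which is the usual reason for choosing this technique. The sketch assumes the IR value itself is not callable.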

During execution of the IR passes, the self-generating pass agents (15) classify failed potential optimizations based on corresponding reasons for failure such as aliasing between two variables and the like. Further, the self-generating pass agents (15) store the failed potential optimizations.
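One possible way to store failed potential optimizations classified by reason is sketched below. Only the aliasing reason comes from the description; the record fields, the catch-all reason and the function name are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class FailureReason(Enum):
    ALIASING = "aliasing between two variables"  # the one reason named in the text
    OTHER = "other"  # hypothetical catch-all


@dataclass(frozen=True)
class FailedOptimization:
    pass_name: str
    location: str
    reason: FailureReason


def classify(failures):
    """Group failed optimizations by their reason for failure."""
    by_reason = defaultdict(list)
    for failure in failures:
        by_reason[failure.reason].append(failure)
    return dict(by_reason)
```

Grouping by reason would let a later pass revisit, for instance, only the optimizations that failed because of aliasing once alias information improves.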

Further, the self-generating pass agents (15) check if the processed IR passes are ready for generating the program binary code for runtime. If not, the self-generating pass agents (15) return to ingestion of the IR passes. Otherwise, the self-generating pass agents (15) insert shared and specialized micro heterogeneous runtime code in the IR passes for creating one or more template codes and the associate information.

The self-generating pass agents (15) store the template codes and the associate information to the flight manual (16). Finally, a code generator (19, shown in FIGURE 1) generates a program binary code for runtime from the IR passes, which is executable at a machine for performing a function for which the source code is created by a programmer or developer.

The present invention extends IR optimization in the dynamic re-planning engine (13) for scrutinizing pipelining phase issues and unlocks aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline of multi-agent passes. By this way, the present invention is capable of efficiently complying with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.

FIGURE 3 shows a flow diagram of the method (20) for executing heterogeneous compilation, in accordance with an exemplary embodiment of the present invention. The method (20) comprises the steps of inputting a source code to an input unit (21), parsing and tokenizing the source code at a code parser into multiple device codes in the form of multiple tokenized words and tags (22), processing each device code at a dynamic re-planning engine to generate an intermediate representation (IR) for transformation and optimization processes (23) and generating a program binary code at a code generator (24).
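The four steps of the method (20) can be read as a simple chain of stages. The sketch below only shows that flow; the function names and the toy stand-in stages are hypothetical and do not implement real parsing, planning or code generation.

```python
def compile_source(source, parse, plan, generate):
    """Chain the method steps; each stage is supplied by the caller."""
    device_codes = parse(source)                 # step (22): parse and tokenize
    irs = [plan(code) for code in device_codes]  # step (23): one IR per device code
    return generate(irs)                         # step (24): emit program binary code


# Toy stand-ins for the three stages, for illustration only.
def toy_parse(source):
    """Split on ';' and whitespace to obtain token lists."""
    return [part.split() for part in source.split(";") if part.strip()]


def toy_plan(tokens):
    """Pretend IR: an immutable copy of the token list."""
    return tuple(tokens)


def toy_generate(irs):
    """Pretend binary: the IR list rendered as bytes."""
    return repr(irs).encode("utf-8")
```

Passing the stages in as arguments keeps the flow separate from any particular parser, engine or back end, which is a choice of this sketch rather than something stated in the description.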

Each device code is processed by segmenting and transforming IR passes at a blackboard multi-agent optimization composer (BlackMAOC) of the dynamic re-planning engine and by processing a re-plan route for the IR passes and feeding back to a flight manual using multiple self-generating pass agents in the dynamic re-planning engine. Further, associate information of a platform and program execution is stored in the flight manual in the dynamic re-planning engine, which functions as a knowledgebase.

A springer memory in the BlackMAOC represents a heterogeneous data structure, i.e. a dynamic combination refactor (DCR) structure, wherein data is stored in the springer memory in the DCR structure. Further, a V-ring hooker in the BlackMAOC processes the IR, wherein the device codes are analysed based on the information stored in the flight manual. The V-ring hooker formulates the DCR, executes invariant passes in pipeline, checks if there is any change in the IR and processes the IR accordingly.

If there is no change in the IR, the IR is sent to the self-generating pass agents for trampoline execution of the IR passes and the associate information of the IR passes is stored in the flight manual. On the contrary, if there is a change in the IR, IR processing is restarted. Prior to restarting IR processing, potential results are shared on a blackboard to activate the self-generating pass agents, phase ordering is resolved, obsolete transformation paths are removed from the blackboard and the potential results delegated to runtime passes are entered in the blackboard.

The IR passes are sent to the self-generating pass agents for processing a re-plan route, wherein the IR passes from the BlackMAOC are executed and results are sent to the BlackMAOC as changes in the IR. The IR passes are ingested to a trampoline jump at the self-generating pass agents and are processed for multi-agent pipeline execution.

During execution of the IR passes, failed potential optimizations are classified based on corresponding reasons for failure such as aliasing between two variables and the like and are stored.

Further, the self-generating pass agents check if the processed IR passes are ready for generating the program binary code for runtime. If not, the process is returned to ingestion of the IR passes. Otherwise, shared and specialized micro heterogeneous runtime code is inserted in the IR passes for creating one or more template codes and the associate information.
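The blackboard interactions mentioned above (sharing potential results to activate agents and removing obsolete transformation paths) follow the general shape of the blackboard pattern. A minimal sketch is given below; the class, its methods and the string encoding of a transformation path are hypothetical and not taken from the disclosure.

```python
class Blackboard:
    """Hypothetical shared store that notifies subscribed agents of new entries."""

    def __init__(self):
        self.entries = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register an agent callback taking (path, result)."""
        self._subscribers.append(callback)

    def post(self, path, result):
        """Share a potential result and activate every subscribed agent."""
        self.entries[path] = result
        for callback in self._subscribers:
            callback(path, result)

    def remove_obsolete(self, is_obsolete):
        """Drop transformation paths that the predicate marks as obsolete."""
        for path in [p for p in self.entries if is_obsolete(p)]:
            del self.entries[path]
```

In this reading, a pass agent subscribes once and is activated whenever a new potential result is posted, while the component that resolves phase ordering decides which paths count as obsolete.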

By this way, the present invention is capable of extending IR optimization in the dynamic re-planning engine for scrutinizing pipelining phase issues and unlocking aggressive heterogeneous optimization without slowing down compiling performance by composing unconditional multi pipeline. Therefore, the present invention efficiently complies with multi-core central processing unit (CPU) and multi-core general-purpose graphics processing unit (GPU) devices.

The template codes and the associate information are stored to the flight manual. Finally, the program binary code for runtime is generated from the IR passes, which is executable at a machine for performing a function for which the source code is created by a programmer or developer.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises," "comprising," “including,” and “having,” are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It is also to be understood that additional or alternative steps may be employed. The use of the expression “at least” or “at least one” suggests the use of one or more elements, as the use may be in one of the embodiments to achieve one or more of the desired objects or results.

Various methods described herein may be practiced by combining one or more machine-readable storage media containing code that perform the steps according to the present invention with appropriate standard computer hardware to execute the code contained therein. An apparatus for practicing various embodiments of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the disclosure could be accomplished by modules, routines, subroutines, or subparts of a computer program product.

While the foregoing describes various embodiments of the disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof. The scope of the disclosure is determined by the claims that follow. The disclosure is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the disclosure when combined with information and knowledge available to the person having ordinary skill in the art.

Claims

CLAIMS:

1. A system (10) for executing heterogeneous compilation, comprising:

- at least one input unit (11) for inputting a source code;

- at least one code parser (12) for parsing and tokenizing said source code into multiple device codes in a form of multiple tokenized words and tags;

- a dynamic re-planning engine (13) for processing each device code to generate an intermediate representation, IR, for transformation and optimization processes; and

- a code generator (19) for generating a program binary code for runtime,

characterized in that said dynamic re-planning engine (13) includes:

i. a blackboard multi-agent optimization composer, BlackMAOC, (14) for segmenting and transforming IR passes;

ii. a flight manual (16) functioning as a knowledgebase to store an associate information of a platform and program execution; and

iii. multiple self-generating pass agents (15) for processing a re-plan route for said IR passes and feeding back to said flight manual.

2. The system (10) as claimed in claim 1, wherein said BlackMAOC (14) includes a springer memory (17) to represent a heterogeneous data structure and a V-ring hooker (18) for IR processing.

3. The system (10) as claimed in claim 2, wherein said heterogeneous data structure is a dynamic combination refactor, DCR, structure.

4. The system (10) as claimed in claim 2, wherein said V-ring hooker (18) processes said IR by:

- analysing said device codes based on said information stored in said flight manual (16);

- formulating DCR and executing invariant IR passes in pipeline;

- checking if there is any change in said IR; and

- processing said IR based on said check result.

5. The system (10) as claimed in claim 4, wherein, if there is no change in said IR, said V-ring hooker (18) sends said IR to said self-generating pass agents (15) for trampoline execution of said IR passes in pipeline and stores associate information of said IR passes to said flight manual (16).

6. The system (10) as claimed in claim 4, wherein said V-ring hooker (18) restarts IR processing if there is a change in said IR, wherein said V-ring hooker (18):

- shares potential results on a blackboard to activate said self-generating pass agents (15);

- resolves phase ordering;

- removes obsolete transformation paths from said blackboard;

- writes said potential results delegated to runtime passes on said blackboard; and

- re-runs said analysis of said device codes based on said information stored in said flight manual (16).

7. The system (10) as claimed in claim 1, wherein said self-generating pass agents (15) process said re-plan route by:

- receiving and executing said IR passes from said BlackMAOC (14);

- sending results to said BlackMAOC (14) as changes in said IR;

- storing failed potential optimizations, each of said failed potential optimizations is classified by at least one reason for failure;

- ingesting said IR passes to a trampoline jump; and

- processing said IR passes for multi agent pipeline execution.

8. The system (10) as claimed in claim 7, wherein said self-generating pass agents (15) check if said processed IR passes are ready for generating said program binary code for runtime and insert shared and specialized micro heterogeneous runtime code for creating at least one template code and said associate information if said IR passes are ready.

9. The system (10) as claimed in claim 8, wherein if said IR passes are not ready for generating said program binary code, said self-generating pass agents repeat ingestion of said IR passes, processing of said ingested IR passes and checking of readiness of said IR passes.

10. The system (10) as claimed in claim 8, wherein said self-generating pass agents store said template code and said associate information to said flight manual.

11. A method (20) for executing heterogeneous compilation, comprising the steps of:

- inputting a source code to at least one input unit (21);

- parsing and tokenizing said source code at at least one code parser into multiple device codes in the form of multiple tokenized words and tags (22);

- processing each device code at a dynamic re-planning engine to generate an intermediate representation, IR, for transformation and optimization processes (23); and

- generating a program binary code at a code generator (24),

characterized in that said step of processing each device code includes:

i. segmenting and transforming IR passes at a blackboard multi-agent optimization composer, BlackMAOC;

ii. processing a re-plan route for said IR passes and feeding back to a flight manual using multiple self-generating pass agents; and

iii. storing an associate information of a platform and program execution in said flight manual that functions as a knowledgebase.