CN108604182B - Apparatus for generating code for execution on a distributed processing system - Google Patents

Info

Publication number
CN108604182B
CN108604182B · CN201580084720.3A
Authority
CN
China
Prior art keywords
level
program
representation
code
data type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201580084720.3A
Other languages
Chinese (zh)
Other versions
CN108604182A (en)
Inventor
Alexander Nikolaevich Filippov
Alexander Vladimirovich Slesarenko
Viktor Vladimirovich Smirnov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN108604182A publication Critical patent/CN108604182A/en
Application granted granted Critical
Publication of CN108604182B publication Critical patent/CN108604182B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/453Data distribution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456Parallelism detection

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Stored Programmes (AREA)

Abstract

There is provided an apparatus for generating code for execution on a distributed processing system, comprising: a data interface for receiving a representation of a program written in a high-level programming language containing operations and abstract data types; a processor; and a memory storing vectorization code. The vectorization code comprises: instructions for mapping an abstract data type to an index set data type representing a set of elements of a data type, wherein each element of the set of elements is accessible by an index; instructions for converting each operation into a vector-based form based on the index set data type to create a vector-based representation of the program; and instructions for providing the vector-based representation to an application programming interface of a low-level language for execution of the vector-based representation by compute nodes of a distributed processing execution environment.

Description

Apparatus for generating code for execution on a distributed processing system
Technical Field
The present invention, in some embodiments thereof, relates to systems and methods for generating program implementations for execution in distributed processing systems, and more particularly, but not exclusively, to systems and methods for optimizing implementations for programs executing within distributed processing systems.
Background
Some computing problems require a Distributed Processing Framework (DPF) to execute. Programs that perform such computations may not be able to run on a single compute node, for example, due to the limited memory and/or limited processing resources of the single node. In many cases, programs are executed within a distributed processing system using the processing and/or memory resources of multiple processing nodes.
DPFs typically provide the programmer with libraries for several specific domains (e.g., linear algebra, graph processing). The use of abstract domain-specific objects provided by such a library relieves the programmer of the need to handle low-level details, making the program easier to write. However, there is a tradeoff between programmer productivity and the performance of the program when it executes in a distributed processing system. Raising the programmer's abstraction level introduces additional low-level implementation overhead that may create performance penalties when executing the program.
Furthermore, the performance of an executed distributed application depends on the programmer's selection of appropriate data structures for implementing the high-level domain objects. In practice, however, it is difficult for the programmer to know which data structure to use for a particular use case to obtain the best performance; for example, for an algorithm processing matrices, when to select a sparse matrix representation and when to select a dense one.
In fact, DPFs impose significant limitations on how the programmer can access the distributed processing system. These limitations arise from the programming model supported by the DPF. The programmer needs to manually circumvent these limitations, which can be difficult and time consuming, and/or result in a program that cannot be efficiently executed in a distributed processing system.
Disclosure of Invention
It is an object of the present invention to provide an apparatus, system, computer program product and method for creating a low-level implementation of a program representation for execution within a distributed processing system.
The foregoing and other objects are achieved by the features of the independent claims. Other embodiments are apparent from the dependent claims, the description and the drawings.
According to a first aspect, an apparatus for generating code for execution on a distributed processing system comprises: a data interface for receiving a representation of a program written in a high level programming language, the program representation containing operations, the program representation containing abstract data types; and a memory storing vectorized code; and a processor coupled to the data interface and the memory to execute vectorized code, the vectorized code comprising: code instructions for mapping each of the abstract data types to at least one of an index set data type, wherein each of the index set data types represents a set of elements of a data type, wherein each element of the set of elements is accessible by an index; converting each operation into a vector-based form based on the index set data type to create a vector-based representation of the program; a vector-based representation of a program is provided to an Application Programming Interface (API) of a low-level language for execution by compute nodes distributed to a distributed processing execution environment.
Mapping the abstract data types to at least one of the index set data types, and converting each operation into a vector-based form based on the index set data types to create a vector-based representation, enables the programmer to write a distributed program that includes nested data parallelism (e.g., a distributed collection of distributed collections). Vector analogs allow the operations of a program to be performed in a distributed manner, rather than collecting data stored in different nodes to a master node (which may not have enough memory to store all of the collected data), performing the operations on the collected data at the master node, and redistributing the results to the respective nodes.
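The contrast can be sketched with a toy model; the partition lists, function names, and data below are illustrative assumptions (not from the patent), simulating per-node data as Python lists:

```python
# Two ways to add a scalar to a vector whose elements are partitioned
# across compute nodes; `partitions` simulates per-node data.

partitions = [[1, 2], [3, 4], [5, 6]]  # vector distributed over 3 nodes

def add_scalar_collected(partitions, s):
    """Collect-to-master approach: gather, compute centrally, redistribute."""
    collected = [x for part in partitions for x in part]   # gather at master
    result = [x + s for x in collected]                    # compute centrally
    out, i = [], 0
    for part in partitions:                                # redistribute in
        out.append(result[i:i + len(part)])                # original shape
        i += len(part)
    return out

def add_scalar_distributed(partitions, s):
    """Vector-analog approach: each node transforms only its own partition."""
    return [[x + s for x in part] for part in partitions]  # runs per node

# Both yield the same result, but the second needs no data movement.
assert add_scalar_collected(partitions, 10) == add_scalar_distributed(partitions, 10)
```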
In a first possible implementation of the apparatus according to the first aspect, the apparatus further comprises code instructions for converting, based on the index set data type, nested data-parallel operations of the program representation at an index set data type level with vector analogs.
The apparatus removes nested data-parallel code written by the programmer and replaces it with vector analogs that can be executed by the distributed system.
In a second possible implementation form of the apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the abstract data types of the program representation are based on a hierarchical data structure, wherein each data type of a relatively higher level of abstraction is based on at least one data type of a relatively lower level of abstraction, wherein the index set data type is at the bottom of the hierarchical data structure.
The hierarchical data structure improves the execution performance of a program by allowing optimizations to be performed at each level of abstraction (e.g., in sequence). The index set, which can be converted to a vector representation, is located at the bottom of the hierarchy, allowing optimization before the index set level is reached, which improves the execution performance of the program.
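A minimal sketch of such a hierarchy, assuming three hypothetical levels (a Matrix built on Vectors, built in turn on an index-addressable Collection at the bottom); all class names are illustrative, not the patent's library:

```python
class Collection:
    """Bottom of the hierarchy: elements addressable by index."""
    def __init__(self, elems):
        self.elems = list(elems)
    def __getitem__(self, i):
        return self.elems[i]
    def __len__(self):
        return len(self.elems)

class Vector:
    """Mid level: implemented in terms of a Collection."""
    def __init__(self, values):
        self.items = Collection(values)
    def dot(self, other):
        return sum(self.items[i] * other.items[i]
                   for i in range(len(self.items)))

class Matrix:
    """Top level: implemented in terms of Vectors (rows)."""
    def __init__(self, rows):
        self.rows = [Vector(r) for r in rows]
    def mvmul(self, v):
        # matrix-vector product expressed via the lower level's dot()
        return [row.dot(v) for row in self.rows]

m = Matrix([[1, 0], [0, 1]])
print(m.mvmul(Vector([3, 4])))  # identity matrix: [3, 4]
```

Because each level is defined only in terms of the level below it, optimizations can be applied level by level before the index-set bottom is reached.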
In a third possible implementation form of the apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the apparatus further comprises code instructions for sequentially converting an Intermediate Representation (IR) of the program instructions at the current abstraction level, based on the abstract data types of the current level of the hierarchical data structure, into an IR of the program instructions at a lower abstraction level, based on the data types of the lower abstraction level of the hierarchical data structure.
Each IR may be optimized at its respective level before transitioning to the next lower level of abstraction, thereby improving the overall execution performance of the final program at the lowest level, which is executing in the runtime environment.
In a fourth possible implementation form of the apparatus according to the previous third implementation form of the first aspect, the data interface is further for receiving input values for processing by an instruction set of the IR of the program at the current abstraction level; and the apparatus further includes code instructions for selecting, from the isomorphic specializations, an isomorphic specialization of an instruction set of IR of the program at the current abstraction level to process the input value using a vector-based form, the selection based on a performance metric of each of the isomorphic specializations when processing the vector-based representation of the IR; wherein each isomorphic specialization has a different implementation based on the abstract data type at an abstraction level lower than the current abstraction level, wherein each isomorphic specialization produces the same result when processing input values.
Dynamic selection of isomorphic specialization at each abstraction level improves the execution performance of programs in a distributed environment based on the actual received input. Each level may be dynamically optimized based on actual inputs, thereby improving performance of the lowest level program executing in the distributed environment.
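The selection idea can be illustrated with a toy example; the density heuristic and function names below are hypothetical assumptions, standing in for whatever performance metric an implementation would actually measure:

```python
# Two isomorphic specializations of a dot product: they produce the same
# result on any input, but differ in their lower-level implementation.

def dot_dense(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def dot_sparse(xs, ys):
    # skip zero entries; profitable when xs is mostly zeros
    return sum(x * ys[i] for i, x in enumerate(xs) if x != 0)

def dot(xs, ys):
    """Pick a specialization per input, based on a simple metric."""
    density = sum(1 for x in xs if x != 0) / len(xs)
    impl = dot_dense if density > 0.5 else dot_sparse
    return impl(xs, ys)

# Both specializations agree, so the selection is purely a performance choice.
assert dot([0, 0, 3], [1, 2, 4]) == dot_dense([0, 0, 3], [1, 2, 4]) == 12
```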
In a fifth possible implementation form of the apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the apparatus further comprises a compiler for sequentially applying instruction optimizations to each IR of the program at each abstraction level.
Optimizations may be performed at each level of abstraction, thereby increasing the overall performance of the program.
In a sixth possible implementation form of the apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the apparatus further comprises code instructions for mapping the instructions of the program's IR, based on the index set data type, into vector form according to a set of predefined vectorization rules, creating instructions computable by the distributed processing execution environment.
The vector form enables the performance of execution of programs in a distributed processing execution environment to be improved.
In a seventh possible implementation form of the apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the vector form is computed without collecting data to a master node and redistributing the results.
The vector form allows operations to be distributed to different nodes rather than requiring data to be collected to the master node and the results redistributed, resulting in improved performance of execution of programs in a distributed processing execution environment.
In an eighth possible implementation form of the apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the IR of the program contains a dataflow graph that includes at least one node representing an operation of the program representation based on an index set data type; and the apparatus further includes code instructions for converting each of the at least one node into a vector form of instructions to create a transformed dataflow graph for execution within the distributed processing execution environment.
Execution performance of the transformed dataflow graph in a distributed execution environment may be improved.
In a ninth possible implementation form of the apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the high-level programming language is a Domain Specific Language (DSL), wherein the abstract data type represents a distribution domain abstraction having a plurality of low-level distribution implementations, the DSL language not defining a certain low-level distribution implementation.
Using a DSL allows the programmer to write at a high level of abstraction that can be mapped to lower-level languages and executed at a high performance level in a distributed environment (i.e., without significant performance loss when mapping from the high-level abstraction of the source code to the low-level instructions executed by the distributed system).
In a tenth possible implementation form of the apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the program written in the high-level language is transformed into a graph-based abstract IR by an automated virtualization and hierarchical evaluation method that identifies at least a portion of the code written in the high-level language, wherein the identified portion of code is provided for converting each operation in that portion into a vector-based form based on the index set data type, to create a vector-based representation of the graph-based IR.
Optimizations may be performed at each level of abstraction, thereby increasing the overall performance of the program.
In an eleventh possible implementation form, a method of generating code from a program representation for execution on a distributed processing system is provided, wherein the method is for operating an apparatus according to the first aspect as such or according to any of the preceding implementation forms of the first aspect.
In a twelfth possible implementation form, a computer program stored on a computer-readable medium is provided, which computer program, when executed by a processor of a computer, performs the previous method.
Unless defined otherwise, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although materials and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, exemplary methods and/or materials are described below. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be necessarily limiting.
Drawings
Some embodiments of the invention are described herein, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the embodiments of the present invention. Thus, it will be apparent to one skilled in the art from the description of the figures how embodiments of the invention may be practiced.
In the drawings:
FIG. 1 is a flow diagram of a computer-implemented method for generating low-level code from a high-level program representation executing within a distributed processing system according to some embodiments of the invention;
FIG. 2 is a block diagram of components of a system that generates low-level code from a high-level program representation executing within a distributed processing system, according to some embodiments of the invention;
FIG. 3 is a schematic illustration of the inefficient removal of NDP based on prior art methods;
FIG. 4 is a schematic diagram representing different instances of abstract data types and implementations of abstract data types based on index sets according to some embodiments of the invention;
FIG. 5 is a schematic illustration of an automated virtualization module that translates a high-level program representation to generate a graph-based abstract IR, according to some embodiments of the invention;
FIG. 6 is a schematic diagram illustrating the relationship between code at a current level of abstraction and associated isomorphic specialization according to some embodiments of the invention;
FIG. 7 is a schematic diagram depicting a transition from a current level of abstraction to a lower level of abstraction based on selection of isomorphic specialized code in accordance with some embodiments of the invention;
FIG. 8 is a data flow diagram depicting an exemplary process of vectorization according to some embodiments of the invention;
FIG. 9 is a schematic diagram depicting a portion of code at a low abstraction level implemented using index set abstract data types for selection of vectorization and isomorphic specialization, according to some embodiments of the invention;
FIG. 10 is a schematic diagram depicting a source code data flow for creating executable code in vector form for source code, in accordance with some embodiments of the invention; and
fig. 11 is a table providing experimental results using a procedure based on the systems and/or methods described herein, according to some embodiments of the invention.
Detailed Description
The present invention, in some embodiments thereof, relates to systems and methods for generating program implementations for execution in distributed processing systems, and more particularly, but not exclusively, to systems and methods for optimizing implementations for programs executing within distributed processing systems.
Aspects of some embodiments of the invention relate to vectorization code (implementable by a processing unit) that converts the operations of a program representation written in a high-level abstract programming language into a vector-based form to create a vector-based representation of the program. The vector-based representation is translated into a low-level language and executed by a distributed processing system. The conversion of the high-level operations of the program representation into vector representations allows the programmer to use nested data parallel (NDP) constructs, e.g., a distributed collection of distributed data objects.
The vectorization code maps abstract data types of the program representation to one or more index set data types (e.g., defined by a library, file, or other representation). Each index set data type represents a collection of elements of a data type. Each element in the set of elements is accessible by the index. Program operations that use the index set data type representation are converted into a vector-based form to create a vector-based representation of the program.
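A minimal sketch of such a mapping, with hypothetical class names, might represent an abstract sparse-vector data type by two parallel index sets:

```python
class IndexSet:
    """A collection of same-typed elements, each accessible by an index."""
    def __init__(self, elems):
        self._elems = list(elems)
    def at(self, i):
        return self._elems[i]
    def indices(self):
        return range(len(self._elems))

class SparseVector:
    """Abstract data type mapped onto index sets: one index set holds the
    nonzero positions, a parallel one holds the corresponding values."""
    def __init__(self, length, positions, values):
        self.length = length
        self.pos = IndexSet(positions)
        self.val = IndexSet(values)
    def to_dense(self):
        out = [0] * self.length
        for i in self.pos.indices():
            out[self.pos.at(i)] = self.val.at(i)
        return out

print(SparseVector(5, [1, 3], [7, 9]).to_dense())  # [0, 7, 0, 9, 0]
```

Once every abstract data type bottoms out in flat index sets like these, operations over them can be rewritten as bulk vector operations.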
The vectorization code converts the NDP operations of the program representation into vector analogs based on the index set data type. A low-level vector form implementation based on corresponding high-level NDP operations may be performed by distributing low-level instructions to multiple compute nodes in a distributed processing system, optionally by parallel processing by the compute nodes.
The abstract data types of a program representation may be based on a multi-level hierarchical data structure (e.g., stored as a library, file, or other representation), where each data structure at one of the levels of abstraction is based on one or more data types at a lower level of abstraction (i.e., one level lower than the level defined by the hierarchical structure). The index set data type is defined as the lowest level of the hierarchy.
Optionally, the program representation at each abstraction level is optimized by code (e.g., a compiler) and fed as input to the next lower abstraction level, where the program representation at the lower level is optimized. This process can be done in an iterative manner, optimizing the program representation at each abstraction level until the lowest level is reached and the program representation is converted into vector form. Optimizations can be performed at each abstraction level, including selection of isomorphic specialization of the current abstraction level (e.g., based on performance metrics), optimization of program instructions, and optimization of graph-based representations of program representations.
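The iterative lowering described above can be sketched as a loop; the level names and callback signatures below are hypothetical, not the patent's actual pipeline:

```python
def lower(ir, levels, optimize, rewrite_to, vectorize):
    """Optimize the IR at each abstraction level, then rewrite it in
    terms of the next level down; vectorize at the bottom (index set)."""
    for level in levels:
        ir = optimize(ir, level)          # level-specific optimizations
        if level != levels[-1]:
            ir = rewrite_to(ir, level)    # express IR via the level below
    return vectorize(ir)                  # index-set level -> vector form

# Tiny demo: record the order of passes over three hypothetical levels.
log = []
lower("prog", ["matrix", "vector", "indexset"],
      optimize=lambda ir, lvl: (log.append("opt@" + lvl), ir)[1],
      rewrite_to=lambda ir, lvl: (log.append("lower@" + lvl), ir)[1],
      vectorize=lambda ir: ir)
assert log == ["opt@matrix", "lower@matrix",
               "opt@vector", "lower@vector", "opt@indexset"]
```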
Optionally, at each abstraction level (i.e., defined by a hierarchical data structure), the Intermediate Representation (IR) of the program at the current abstraction level may be converted by processor-implemented code into an IR representation at one lower abstraction level. The IR representation at each level may be optimized.
It should be noted that vectorization is performed at an abstraction level implemented using the index set layer, which is the layer directly above the executable low-level code executed in the distributed execution environment.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present invention may be a system, method and/or computer program product. The computer program product may include one or more computer-readable storage media having computer-readable program instructions embodied thereon for causing a processor to perform various aspects of the present invention.
The computer readable storage medium may be a tangible device capable of holding and storing instructions for use by an instruction execution device. The computer readable storage medium may be, for example but not limited to: electronic storage, magnetic storage, optical storage, electromagnetic storage, semiconductor storage, or any suitable combination of the foregoing.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a corresponding computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network.
The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, including, for example, a programmable logic circuit, a field-programmable gate array (FPGA), or a Programmable Logic Array (PLA), may execute computer-readable program instructions to perform aspects of the present invention by personalizing the electronic circuit with state information of the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. As such, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Referring now to FIG. 1, FIG. 1 is a flow diagram of a computer-implemented method for generating low-level code from a high-level program executing within a distributed processing system according to some embodiments of the present invention. The method optimizes the IR of the program at each abstraction level and converts the IR at the lowest abstraction level to a vector representation based on the index set data type. Referring also to FIG. 2, FIG. 2 is a block diagram of components of a system 200 that allows a programmer to write a high-level program containing NDP that can be executed in parallel using multiple nodes of a distributed processing system, according to some embodiments of the invention. The method of FIG. 1 may be performed by the apparatus and/or system of FIG. 2.
Systems and/or methods convert operations represented by a high-level program into low-level vector analogs, allowing the operations of the program to be performed in a distributed manner (i.e., distributing data and/or computations to different nodes for parallel processing), rather than, for example, collecting data stored in different nodes to a master node (which may not have enough memory to store all of the collected data), performing the operations on the collected data at the master node, and redistributing the results to the respective nodes.
The systems and/or methods described herein do not simply replace each high-level operation in sequence with a vector analog, as is done in some prior art methods, because doing so does not remove the NDP from the code, e.g., as depicted in the schematic of FIG. 3, which depicts NDP code 302 and a corresponding non-NDP code 304 created by sequentially replacing each operation (vectorization 306) with a vector analog based on prior art methods. Execution of such programs is performed, for example, by collecting data to a central node, performing the operations, and redistributing the results (308), which can cause bottlenecks in the execution of the programs because operations cannot be performed in parallel as originally programmed (310). Although the program code may be efficient and compile according to an Application Programming Interface (API) syntax, the compiled code cannot be executed in parallel due to the parallel lookup operations (NDP) in the body of the parallel mapping function. To execute the program using prior art methods, the NDP can be removed by replacing the parallel map containing the lookup with a vector analog based on collecting the data to the master node, processing the data in sequence, and redistributing the results. It should be noted that when the master node is unable to process the collected data (e.g., insufficient memory and/or insufficient processing resources), the program may fail to run or crash at runtime. As outlined in FIG. 3, the prior art approach fails to handle high-level NDP using a parallel architecture. In contrast, the systems and/or methods described herein map abstract data types to index set data type forms that are converted to vector representations, allowing the vectorized format to be executed in parallel by different nodes in a distributed environment, i.e., without the need to collect data to a master node to perform the operations.
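The FIG. 3 scenario can be illustrated with a toy example; `gather` below stands in for the bulk vector analog of a lookup (often called back-permute in the vectorization literature), and the data and names are hypothetical:

```python
table = [10, 20, 30, 40]   # distributed lookup table
indices = [3, 0, 2]        # distributed index vector

# NDP form: a lookup nested inside the body of a parallel map, i.e.,
# nested parallelism that a DPF's flat programming model cannot run.
ndp_result = [table[i] for i in indices]

def gather(xs, idx):
    """Vector analog of the nested lookup: one bulk gather over the
    whole index vector, a flat operation a DPF can distribute without
    first collecting data to a master node."""
    return [xs[i] for i in idx]

vector_result = gather(table, indices)
assert vector_result == ndp_result == [40, 10, 30]
```

The flat `gather` call computes the same result as the nested form, but as a single distributable vector operation rather than one parallel operation nested inside another.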
At 102, a representation of a program 202 written in a high-level programming language is received. The program 202 may be a complete program or a portion of a program (e.g., a function). Program 202 may include source code, an IR (e.g., a dataflow graph), compiled code, or other human- and/or computer-readable instructions. Program 202 may be received as a file, a set of network messages (e.g., packets), or another format that may be stored locally and/or remotely on a storage device (e.g., hard drive, external server storage). The program 202 may be created by a programmer using a client terminal 212 (e.g., computer, laptop, server, mobile device), e.g., as described herein.
The program representation 202 may be received by an apparatus 204 (e.g., a computer, server, mobile device, networked computer, distributed system, or other computing unit) through a data interface 205 (e.g., network connection, connection port, wireless interface, external memory device connection) for receiving the program representation 202.
The apparatus 204 includes one or more processing units 206 (e.g., central processing units, digital signal processing units, field programmable gate arrays, custom circuits), the processing units 206 implementing code stored in memory 208 (and/or other local and/or external and/or remote storage devices, e.g., hard disk drives, random access memories, optical disk drives, other storage devices).
The device 204 is in communication with a distributed execution environment 216. The apparatus 204 transforms the program representation 202 from a high-level abstract representation to low-level code that is executable by the distributed execution environment 216.
The distributed execution environment 216 (also referred to herein as a distributed system or distributed processing system) may be organized as a heterogeneous distributed processing system and/or a homogeneous distributed processing system. The distributed execution environment 216 includes a plurality of compute nodes, each including one or more processors. The processors may be of different or similar types. The processors may use different or similar instruction set architectures. The processors may have different architectural designs, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Field Programmable Gate Array (FPGA), a processor for interfacing with other units, and/or dedicated hardware accelerators (e.g., encoders, decoders, and cryptographic coprocessors).
The program representation 202 contains a plurality of operations for processing data represented as abstract data types. The abstract data types may be defined based on a data structure library 210, which data structure library 210 may be stored locally on the memory 208, on another storage device, and/or remotely on a server or storage device in communication with the apparatus 204.
The abstract data type represents a domain object, for example, a domain specific object when the programming language is a Domain Specific Language (DSL). Examples of abstract data types include: matrices, vectors and graphs. DSL may provide higher levels of abstraction of data types, and/or more generic abstract data types than other programming languages, such as low-level programming languages, and/or programming languages that are not specifically designed to address problems in the same field as DSL. The DSLs may be pre-existing available DSLs, or custom developed DSLs, such as the R programming language for statistical calculations, and the Structured Query Language (SQL) programming language for databases.
The data structure library 210 defines an abstract interface for each domain object available for DSL. For example, the domain objects may be implemented in an object-oriented paradigm, such as an abstract class. The interface may provide access to object properties and/or object methods. The term "abstract data type" as used herein may refer to such domain objects.
Each domain object may be implemented by one or more concrete implementations. A concrete implementation may be realized in an object-oriented paradigm, e.g., by extending the corresponding abstract class and implementing the interface of that abstract class. Each concrete implementation contains one primary constructor that defines the data structure for the abstract object representation. Each primary constructor may be based on one or more other domain objects and/or on a base type.
Each abstract data type may map to one or more concrete data types and/or to one or more abstract data types at a lower level. Each concrete data type may be provided to an API of a low-level language that defines the data distribution of programs executing on the distributed processing system. An abstract data type may be implemented by a number of different data distribution implementations defined by the low-level language. The abstract data types are independent of any particular data distribution definition.
Different implementations of an abstract data type (and optionally of concrete data types) are isomorphic. As used herein, the term isomorphic refers to different implementations that hold the same amount of information and are obtainable (i.e., interchangeable and/or convertible) from one another.
The abstract data types comprise an index set abstract data type that represents a set of elements of a data type, where each element in the set of elements is accessible by an index. For example, the abstract data type index set[T] represents a set of elements of type T in which an index access operation is defined. One or more concrete implementations may be defined for the index set data type. For example, index set[T] may have a concrete implementation based on the array[T] data type.
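A minimal sketch of the index set abstraction and an array-backed concrete implementation (Python used for illustration; the class and method names are assumptions, not the apparatus's actual interface):

```python
from abc import ABC, abstractmethod

class IndexSet(ABC):
    """Abstract data type index set[T]: a collection of elements of
    type T in which an index access operation is defined."""
    @abstractmethod
    def apply(self, i):    # access an element by its index
        ...
    @abstractmethod
    def length(self):
        ...

class ArrayIndexSet(IndexSet):
    # Concrete implementation of index set[T] based on array[T]
    def __init__(self, items):
        self._items = list(items)
    def apply(self, i):
        return self._items[i]
    def length(self):
        return len(self._items)
```

Other concrete implementations (e.g., one backed by a distributed pair collection) would expose the same interface, which is what makes them interchangeable.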
Referring now to FIG. 4, FIG. 4 is a diagram representing implementations of different higher-level abstract data types in terms of a lower-level index set abstract data type, according to some embodiments of the invention. The abstract data types at lower levels are isomorphic with each other.
The abstract data type matrix[T] 402 may be implemented by the isomorphic data types composite matrix 404A and flat matrix 404B. The composite matrix 404A is implemented using the index set[vector] data type, as shown by arrow 406A. The flat matrix 404B is implemented using the index set[T] data type, as shown by arrow 406B.
The abstract data type vector[T] 412 may be implemented by the isomorphic data types dense vector 414A and sparse vector 414B. Dense vector 414A is implemented using the index set[T] data type, as shown by arrow 416A. Sparse vector 414B is implemented using the index set[Int] and index set[T] data types, as shown by arrow 416B.
The abstract data type index set[T] 422 may be implemented by the isomorphic data types array set 424A and Spark index set 424B. Array set 424A is implemented using the array[T] type, as shown by arrow 426A. Spark index set 424B is implemented using the RDD[(Int, T)] type, as shown by arrow 426B. (RDD = Resilient Distributed Dataset, as defined by Apache Spark™.)
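The dense/sparse split of FIG. 4 can be sketched as two isomorphic implementations of vector[T] — one over a single index set of values, one over index sets of indices and non-zero values — with lossless conversions witnessing the isomorphism (a hypothetical illustration; names and the zero default are assumptions):

```python
class DenseVector:
    # one index set holding every element value
    def __init__(self, values):
        self.values = list(values)
    def at(self, i):
        return self.values[i]
    def length(self):
        return len(self.values)

class SparseVector:
    # index sets of indices (Int) and non-zero values (T)
    def __init__(self, length, pairs):
        self._length = length
        self._pairs = dict(pairs)
    def at(self, i):
        return self._pairs.get(i, 0)
    def length(self):
        return self._length

def to_sparse(d):
    # lossless conversion: both forms hold the same information
    return SparseVector(d.length(),
                        [(i, v) for i, v in enumerate(d.values) if v != 0])

def to_dense(s):
    return DenseVector([s.at(i) for i in range(s.length())])
```

Because the round trip is lossless, either representation can be chosen per input without changing program results.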
Referring now back to block 102 of FIG. 1, the abstract data types are based on a hierarchical data structure (which may be defined and/or stored in the data structure library 210). The hierarchical data structure defines levels of abstraction, and relationships (e.g., mappings) between each abstract data type at a relatively higher level of abstraction and one or more abstract data types at a relatively lower level of abstraction. For example, the abstract type matrix may be represented by an array of the relatively lower-level abstract data type vector. Each vector abstract data type may be represented by an array of relatively lower-level abstract data type elements.
FIG. 4 graphically illustrates dependency relationships between abstract data types and concrete implementations, which may form a hierarchical data structure. As shown, matrix 402 may be implemented by a composite matrix 404A, the composite matrix 404A being implemented by a vector 412, the vector 412 being implemented by a dense vector 414A and a sparse vector 414B, both dense vector 414A and sparse vector 414B being implemented using an index set [ T ] 422.
It should be noted that vector 412 may not be implemented using the higher-level matrix 402 data type. Expressed formally, when abstract type A has one or more concrete implementations based on type B, A depends on B. This relationship is naturally transitive: if A depends on B and B depends on C, then A depends on C. If circular dependencies are disallowed, the dependency relationships allow a corresponding order over the types to be built.
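Because circular dependencies are disallowed, the "A depends on B" relation admits an order in which lower-level types come first; a small sketch computing such an order (type names are hypothetical):

```python
def dependency_order(deps):
    # deps: abstract type -> set of types its concrete implementations
    # are based on; result lists lower-level types before higher ones
    order, seen = [], set()
    def visit(t, path=()):
        if t in path:
            raise ValueError("circular dependency involving " + t)
        if t in seen:
            return
        for d in sorted(deps.get(t, ())):
            visit(d, path + (t,))
        seen.add(t)
        order.append(t)
    for t in deps:
        visit(t)
    return order
```

This order is what lets the compiler lower the program one abstraction level at a time.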
Referring back to block 102 of FIG. 1, the index set data type is defined to be at the bottom of the hierarchical data structure. The index set data type is defined as mapping to one or more low-level implementations that may be provided to the API interface 222 for execution within the distributed execution environment 216.
The hierarchical data structure improves the execution performance of a program by allowing optimizations to be performed (e.g., sequentially) at each abstraction level, as described herein. The index set data type, which may be converted to a vector representation and located at the bottom of the hierarchical data structure, allows optimization before reaching the index set level, which improves the execution performance of the program.
Optionally, the data structure library 210 replaces a definition of a pre-existing abstract class that provides index access to elements, such as an abstract class defined by a high-level programming language. The data structure library 210 may redefine abstract classes implemented based on index set abstract data types, which allows pre-existing abstract data types to be transformed into vector representations for parallel execution within a distributed processing system.
The program representation 202 may be source code written by a programmer using a high-level programming language, optionally a Domain Specific Language (DSL). The DSL may be designed for writing distributed programs and/or for executing programs written for a certain domain within a distributed environment. The programmer may write the program representation 202 with abstract data types that represent definitions of distributed domain abstractions having a plurality of available low-level distribution implementations, without specifying which low-level implementation to use for each distributed domain abstraction. The DSL language may not define a low-level distribution implementation, allowing the code described herein to select a low-level distribution implementation based on the vector format and/or other optimizations performed at the different abstraction levels.
The use of a DSL allows the programmer to write programs using a high level of abstraction that can be mapped to a lower-level language and executed in a distributed environment at a high performance level (i.e., without incurring a significant performance penalty when mapping from the high-level abstraction of the source code to the low-level instructions for execution by the distributed system).
At 104, the program written in the high-level language (e.g., source code) is transformed into an IR, optionally a graph-based abstract IR. Code virtualization may include abstracting the abstract data types and/or language constructs from the representation in the high-level language. The transformation may be performed automatically by the processing unit 206 of the apparatus 204 implementing code stored in the memory 208 (and/or on a storage device in communication with the apparatus 204) as an automated virtualization module 214A. Alternatively or additionally, code virtualization may be performed manually by a programmer.
The auto-virtualization module 214A may store code instructions to specify one or more portions of code written in a high-level language. Each code portion may be identified and designated as containing code and/or operations that are significant when the program is executed, such as functions that are called multiple times when the program is executed, loops that are executed multiple times, and decision branches that rely on data to make decisions critical to executing the program.
Each code portion is identified to convert the operation of each portion into a vector-based form based on the index set data type, creating a vector-based representation of the graph-based IR, as described herein.
The automated virtualization module 214A may embed one or more data objects, optionally objects of the data structure library, into the program representation, for example, as tags for identified hot spots. The embedded data structure library objects and/or the tagged hotspots may be used when optimizing the converted IR of the program, as described herein.
Optimizations may be performed on the identified portions of code at each abstraction level, which may improve the overall performance of the program when the lowest level representation of the program is executed in a distributed execution environment.
The auto-virtualization module 214A may convert a high-level representation of the program into a dataflow graph that includes one or more interconnected nodes, each of which represents an operation of the program representation, based on the data type defined at the current level of abstraction. It should be noted that each IR of a program is converted to a relatively low level of abstraction, as discussed herein. The graph representation at the lowest abstraction level uses an index set data type.
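A dataflow-graph IR at, e.g., the matrix level of abstraction can be sketched as immutable nodes referencing their operand nodes (structure and names are illustrative only, not the module's actual representation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str              # operation at the current abstraction level
    inputs: tuple = ()   # interconnected predecessor nodes

def postorder(node, seen=None):
    # enumerate nodes so every operand precedes its consumer
    if seen is None:
        seen = []
    for child in node.inputs:
        postorder(child, seen)
    if node not in seen:
        seen.append(node)
    return seen

# e.g. C = A * B expressed at the matrix abstraction level:
a = Node("input:A")
b = Node("input:B")
c = Node("matmul", (a, b))
```

Lowering the IR then amounts to rewriting `op` and `inputs` in terms of the next level's data types.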
Referring now to FIG. 5, FIG. 5 is a schematic illustration of an automated virtualization module 502 (e.g., automated virtualization module 214A of FIG. 2), optionally implemented as a configuration file 504, according to some embodiments of the invention, the automated virtualization module 502 transforming a high-level program representation 506 (e.g., program representation 202 of FIG. 2) containing a plurality of identified hot-spots 508 to produce a graph-based abstract IR 510. Hotspot 508 may be embedded and/or marked in program representation 506 using data structure library 512.
Referring now back to fig. 1, at 106, the graph-based abstract IR of the program representation 202 is provided to a compiler 214B (e.g., stored on the memory 208 and/or on another local and/or remote storage device), the compiler 214B containing code for implementation by the processing unit 206 of the apparatus 204. Compiler 214B includes code instructions that perform a hierarchical evaluation for each abstraction level of the IR of the program. Compiler 214B optimizes the IR of the program at each level of abstraction. Compiler 214B may perform graph transformations to optimize the graph-based IR, e.g., reduce the number of compute nodes, change the order of the compute nodes, replace compute nodes with different nodes, and/or increase the number of compute nodes (e.g., when the replacement nodes are estimated to improve performance in the execution environment over the previous nodes).
Compiler 214B may perform the optimization sequentially by receiving a current abstraction level IR representation, optimizing the current abstraction level IR, converting the current abstraction level IR to a lower abstraction level, and repeating the process of representing the received lower abstraction level IR as the current abstraction level. Optimizations may be performed at each level of abstraction, thereby increasing the overall performance of the program.
At 108, code stored in isomorphic specialization module 214C that is implementable by processing unit 206 of apparatus 204 (stored on memory 208 and/or on another storage device in communication with apparatus 204) selects an isomorphic specialization of one or more instruction sets of the IR of the program at the current abstraction level. The selection of the isomorphic specialization may be performed dynamically based on actual received input 218 and/or based on predicted input 218 for processing by program representation 202 running within distributed execution environment 216. Alternatively or additionally, code may be dynamically created by the isomorphic specialization module 214C based on the selected isomorphic specialization. As used herein, the terms selection of an isomorphic specialization and automatic creation of isomorphic specialization code are interchangeable. Dynamic selection of the isomorphic specialization (and/or dynamic creation of code) at each abstraction level improves the execution performance of programs in a distributed environment based on the actual received input. Each level may be dynamically optimized based on the actual inputs, thereby improving the performance of the lowest level of a program executing in a distributed environment.
The isomorphic specialization is selected from a set of predefined isomorphic specializations that can be stored in memory 208 and/or an external library of isomorphic specialization modules (locally and/or remotely located). Each isomorphic specialization has a different implementation based on the abstract data type at an abstraction level lower than the current abstraction level. Each isomorphic specialization produces the same result when processing input values. The actual performance of each isomorphic specialization may be different when different input values (having different absolute values, types, and/or representations) are processed within the distributed execution environment 216. For example, in some cases of input values, the performance of a sparse matrix implementation may be better than a dense matrix implementation. In other cases of input values, the performance of dense matrix representations may be superior to sparse matrix implementations within the same distributed execution environment.
Each isomorphic specialization includes code for processing the received input values 218 using the abstract data type at the current abstraction level of the IR (i.e., the vector-based form at the lowest abstraction level). The selection among isomorphic specializations can be based on a performance metric for each isomorphic specialization when the received input values 218 are processed using the respective abstract data type defined for the current abstraction level (i.e., the vector-based representation of the IR at the lowest abstraction level). The isomorphic specialization with the best predicted and/or actual performance for processing the input values 218 is selected.
The performance indicators may represent previous and/or current performance of the execution input values by the respective isomorphic specializations in the distributed execution environment. For example, the performance indicators may be obtained from a database of simulated performance indicators and/or experimentally determined performance indicators representing different possible absolute values and/or types of inputs.
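A toy sketch of input-driven specialization selection (the cost model and names are invented for illustration; a real performance indicator would come from simulated or measured metrics as described above):

```python
def density(rows):
    # fraction of non-zero elements in a row-major matrix
    total = sum(len(r) for r in rows)
    nonzero = sum(1 for r in rows for x in r if x != 0)
    return nonzero / total if total else 0.0

def predicted_cost(name, rows):
    # toy cost model: sparse wins on mostly-zero inputs, dense otherwise
    return density(rows) if name == "sparse" else 1.0 - density(rows)

def select_specialization(names, rows):
    # all specializations are isomorphic (same result); pick the one
    # with the best (lowest) predicted cost for this concrete input
    return min(names, key=lambda name: predicted_cost(name, rows))
```

Because the candidates are isomorphic, this choice affects only performance, never the computed result.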
The data processing interface 205 may receive input values 218 for instruction set processing of the IR of the program at the current abstraction level. Alternatively, the input value 218 may be an estimated input based on input predicted to be received by the program during execution. The input values 218 may be received, for example, as a network message, as a signal through a function, from a user through an API and/or other interface, from a stored file, from a remote server, from another program, and/or from the distributed execution environment 216. The input values 218 may be actual values for processing of the program representation 202 within the distributed execution environment 216, such as values obtained from a stored database, values obtained from user input, and/or values obtained from other programs and/or functions.
Referring now to FIG. 6, FIG. 6 is a schematic diagram illustrating the relationship between code at the current abstraction level and the associated isomorphic specializations, according to some embodiments of the invention. Isomorphic specialization module 214C selects the code that includes an isomorphic specialization from a plurality of available isomorphic specializations.
Isomorphic specialization module 214C receives an IR of the program at the current abstraction level, e.g., an operation 602 that multiplies matrix A by matrix B. The matrix multiplication 602 is implemented using the abstract data type defined at the current abstraction level, denoted abstract data type Matr 604. The matrix multiplication 602 outputs a matrix C 610 of the abstract data type Matr. The abstract data type Matr 604 may be implemented using concrete data types at a lower level of abstraction, e.g., SparseMatr 606A and/or DenseMatr 606B.
Based on the different available isomorphic specializations, the isomorphic specialization module 214C automatically selects and/or automatically generates code at the lower abstraction level, e.g., code for performing sparse matrix multiplication 608A based on the SparseMatr 606A data type, which outputs a matrix C 612A of SparseMatr type, and/or code for performing dense matrix multiplication 608B based on the DenseMatr 606B data type, which outputs a matrix C 612B of DenseMatr type.
For the received matrix multiplication 602 at the current abstraction level, the isomorphic specialization module 214C selects either code at the lower abstraction level in which the matrix multiplication is performed using the sparse matrix representation 614A, or code at the lower abstraction level in which the matrix multiplication is performed using the dense matrix representation 614B.
Referring now back to FIG. 1, at 110, the IR of a program instruction at a current abstraction level is converted to a next lower abstraction level according to the selected isomorphic specialization, e.g., by replacing a fragment (e.g., code, subgraph or other representation) of the IR with the corresponding selected isomorphic specialization. Each IR may be optimized at its respective level before conversion to the next lower level of abstraction, thereby improving the overall execution performance of the final program at the lowest level of execution in the runtime environment.
The choice of isomorphic specialization reduces the level of abstraction of the program's current IR by one level. Each abstraction level contains its own isomorphic specializations based on the abstract data types defined for the current abstraction level, as well as a mapping to abstract data types of lower abstraction levels, e.g., as defined by a hierarchical data structure representation optionally stored in the data structure repository 210.
Referring now to FIG. 7, FIG. 7 is a schematic diagram depicting the transition from a current abstraction level to a lower abstraction level based on the selection of isomorphic specialization code, according to some embodiments of the invention. Block 702 contains high-level code written using the Scala programming language. Block 704 contains lower-level code automatically generated based on the selected isomorphic specialization.
The code in block 702 is written using the matrix abstract data type. The code in block 704 uses the lower-level abstract data types vector and index set (which are exemplary implementations of the matrix, as discussed herein). The matrix in block 702 is specialized to the lower-level vector and index set types, e.g., converting the matrix multiplication operation (*) to its lower-level representation.
Referring back now to FIG. 1, at 112, blocks 106, 108, and 110 are optionally repeated in sequence for each abstraction level until the abstraction level implemented using the index set data type is reached. At each abstraction level, compiler 214B optimizes the IR and isomorphic specialization module 214C selects (and/or automatically generates) the isomorphic specialization code, as described herein.
At 114, the lowest level of abstraction based on the hierarchical data structure is reached. Code implemented by processing unit 206 of device 204 (e.g., stored in memory 208) maps the abstract data types of the IR to index set data types. Index set data types are implemented using specific data types that may be executed in a distributed environment. Each index set data type represents a collection of elements of a data type. Each element in the set of elements is accessible by the index. Index access allows conversion to a vector-based representation and/or allows parallel processing of multiple operations using index set data types accessible by the index.
At 116, the lowest level operations of the IR based on the index set data type are converted into a vector-based form, creating a vector-based representation of the program (referred to herein as vectorization). For example, the conversion to vector form may be performed by vectorization code 214D stored in memory 208 and/or another storage device in communication with apparatus 204. The vectorization code 214D is implemented by the processing unit 206 of the device 204.
It should be noted that vectorization is performed at the abstraction level implemented using the index set data type, which is one layer above the executable low-level code executed in the distributed execution environment 216.
The vectorization code 214D contains instructions for deleting NDP code from the program representation. The NDP code may be replaced with vector analogs that may be executed by the distributed execution environment 216. The vectorization code 214D includes instructions to convert (and/or replace) the nested data-parallel operations of the program representation at the index set data type with a vector analog based on the index set data type.
The vectorization code 214D contains instructions for mapping the instructions of the IR represented by the program based on the index set data type according to a set of predefined vectorization rules in the form of vectors that create instructions that can be computed by the distributed execution environment 216. The vector form enables the performance of execution of programs in a distributed processing execution environment to be improved. The vectorization rules may be stored as a vectorization rules store 220, e.g., a file, a library, as human-readable rules, and/or as machine instructions. The vectorization rule store 220 may be stored locally and/or remotely on the memory 208 and/or another storage device.
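A minimal sketch of rule-driven vectorization (the rule names and table are invented for illustration; the actual vectorization rule store 220 is abstract): per-element operations on an index set map to bulk vector analogs, and the bulk gather itself is a single operation over the whole collection:

```python
# per-element operation -> bulk vector analog (hypothetical rule names)
VECTORIZATION_RULES = {
    "map":    "vec_map",        # elementwise function over the collection
    "lookup": "back_permute",   # nested per-element indexing -> one bulk gather
    "reduce": "vec_reduce",
}

def vectorize_op(op):
    if op not in VECTORIZATION_RULES:
        raise KeyError("no vectorization rule for: " + op)
    return VECTORIZATION_RULES[op]

def back_permute(values, indices):
    # the bulk gather a distributed engine can run partition-by-partition
    return [values[i] for i in indices]
```

Applying such rules node-by-node removes the nested lookup from the map body, leaving only operations with distributed implementations.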
The vectorization code 214D may include instructions for converting nodes of the IR of the program into vector forms of instructions to create a transformed dataflow graph for execution in the distributed processing execution environment 216. Execution performance of the transformed dataflow graph in a distributed execution environment may be improved.
Referring now to fig. 8, fig. 8 is a data flow diagram depicting an exemplary process of vectorization performed by the vectorizing code 214D and/or instructions of a compiler or other code, according to some embodiments of the present invention.
The graph representation of the IR representation at the current abstraction level of the index set abstract data type 802 is used as input to a vectorization process 804 to create a vectorized IR 806.
The graph IR is provided to Graph Interpreter (GI) code 808 (which may be implemented with the vectorization code 214D and stored in memory 208). The GI 808 may create an empty context. The GI may traverse the graph IR (shown as block 810), taking into account dependencies between nodes.
For each node (or group of nodes) of the graph IR, Staged Evaluator (SE) code 812 (which may be implemented with the vectorization code 214D and stored in the memory 208) uses the vectorization rules 220 to create a vectorized form of each operation of the graph in a given current context (shown as block 814).
Each node of the graph IR that has been converted into vectorized form by the SE 812 is provided to the GI 808. The GI 808 updates the context with the new pair (Operation → | Operation |), and continues traversing the nodes in the graph. Each node (or group of nodes) traversed is provided to SE 812 for vectorization and updating through GI 808. This process continues as the GI 808 traverses the nodes in the graph IR.
After the GI 808 completes the traversal of the nodes in the graph, the created vectorized graph 806 is provided. A vectorized graph IR 806 is created for the vectorized version | P | of the input program P.
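The GI/SE interplay can be condensed into a single traversal that grows a context mapping each operation to its vectorized form |operation| (structure and names here are illustrative only, not the modules' actual code):

```python
from collections import namedtuple

GraphNode = namedtuple("GraphNode", ["name", "op", "inputs"])

def vectorize_graph(nodes, rules):
    # Graph-interpreter sketch: start from an empty context, traverse the
    # nodes (assumed topologically ordered so dependencies come first),
    # and let a staged-evaluator step rewrite each operation via the rules;
    # the context accumulates pairs (operation -> |operation|).
    context = {}
    for node in nodes:
        context[node.name] = rules[node.op]   # staged evaluation of one node
    return context

rules = {"lookup": "back_permute", "map": "vec_map"}
g = [GraphNode("n1", "map", ()), GraphNode("n2", "lookup", ("n1",))]
```

After the traversal finishes, the context describes the vectorized graph |P| for the input program P.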
Referring now to fig. 9, fig. 9 is a schematic diagram depicting a portion of code at a low abstraction level implemented using an index set abstract data type 902 vectorized by the vectorization code 214D to create a vector form 904 of the code, according to some embodiments of the invention. The process of converting the code 902 into vector form is performed by the vectorization code 214D in accordance with the vectorization rules 220, shown as vectorization rules 906, for example. Based on the selected homogeneous representation (e.g., as described with reference to block 120), the vectorized code 904 is converted into lower level code 908 for execution within the distributed execution environment 216.
At 118, the vectorized IR of the program is optionally optimized by performing one or more graph transforms, e.g., as described with reference to block 106 of fig. 1.
At 120, isomorphic specialization is selected for vectorized IR, e.g., as described with reference to block 108 of fig. 1. The implementation of isomorphic specialization converts the vectorized IR into a lower level representation designed to execute within the distributed execution environment 216.
At 122, the vector-based representation of the program implemented using the selected isomorphic specialization may be provided to an Application Programming Interface (API) 222 in a low-level language, e.g., as an executable file and/or executable code portion.
At 124, the executable representation of the program representation 202 executes in the distributed execution environment 216. The execution of the operations defined by the program representation 202 is performed in parallel by a plurality of nodes. The vector form allows operations to be distributed to different nodes rather than requiring data to be collected to the master node and the results redistributed, resulting in improved performance of execution of programs in a distributed processing execution environment.
Referring now to FIG. 10, FIG. 10 is a schematic data flow diagram depicting the data flow of a program represented by source code 1002 for creating executable code 1004 in the form of vectors for the program executing in the distributed execution environment 216, according to some embodiments of the present invention.
As described herein, source code 1002 is written using the data structure library 210. Source code 1002 is processed (virtualization 1006) by auto-virtualization module 214A to create an abstract IR representation 1008, optionally a graph-based IR, e.g., as described with reference to block 104 of fig. 1. The abstract IR 1008 is at a high level of abstraction. The process of lowering the abstraction level 1010 is performed by the processing unit 206 of the device 204, as described herein with reference to blocks 106-112 of fig. 1.
At 1012, the abstract IR is optimized by one or more graph transformations performed by compiler 214B to create a transformed IR 1014, as described with reference to block 106 of fig. 1.
At 1016, one or more isomorphic specializations are selected for the transformed IR 1014. The isomorphic specialization is used to convert the transformed IR to a lower level of abstraction, creating the IR at the lower abstraction level 1018. Until the lower abstraction level implemented based on the index set data type 1020 is reached, the process of transforming the IR at the current abstraction level, selecting an isomorphic specialization, and converting to the lower level is iterated, as described with reference to block 114.
IR of abstract data types based on index sets may be transformed (as described herein for other levels of abstraction) and vectorized 1022, as described herein with reference to block 116.
An isomorphic specialization is selected for the vectorized IR 1022 to convert the vectorized IR to lower-level code, which can be optimized (e.g., transformed) to create the IR 1024. Executable code 1004 is created from the IR 1024, for example, by providing the IR 1024 to an API of a low-level program, as described with reference to block 122. The executable code 1004 executes within the distributed execution environment 216.
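The overall lowering loop of FIG. 10 can be sketched as a short driver (purely illustrative; each stage is a stub standing in for the modules described herein):

```python
def lower_to_executable(ir, levels_above_index_set,
                        optimize, specialize, vectorize):
    # repeat optimize -> specialize (lower by one abstraction level)
    # until the index set level is reached, then optimize, vectorize,
    # and lower once more to the executable form
    for _ in range(levels_above_index_set):
        ir = specialize(optimize(ir))
    return specialize(vectorize(optimize(ir)))
```

The point of the sketch is the ordering: optimization happens at every level, and vectorization happens exactly once, at the index set level.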
Referring now to FIG. 11, FIG. 11 is a table that provides experimental results indicating that the systems and/or methods described herein improve the execution performance of programs in a distributed processing system based on mapping abstract data types of a high-level program representation to index set data types and converting operations of the high-level program into vector-based forms based on the index set data types, according to some embodiments of the invention.
The inventors compared the performance of a program involving NDP implemented using the systems and/or methods described herein with that of another DPF. The program, written based on the systems and/or methods described herein, implements the SVD++ algorithm for the collaborative filtering problem, for example, as described in http://public. The program, written using the NDP programming model based on the systems and/or methods described herein, was compared for performance with a GraphX program obtained using the DPF with Apache Spark™ (available from http://spark.apache.org/graph).
Each program was run in 4 different distributed execution environments consisting of different hardware configurations. local[4], local[8], and local[16] are local Spark™ clusters, each running on one machine with the corresponding number of worker threads. The 4-node cluster configuration is a 4-node Spark™ cluster.
The results of the experiment are summarized in the table of FIG. 11. The results indicate that the algorithm developed based on the systems and/or methods described herein has improved performance (sqrtErr 1.05) compared to the GraphX program executed for 50 iterations (sqrtErr 1.32) or 100 iterations (sqrtErr 1.23). The improved performance is based on the ability of the systems and/or methods described herein to perform parallel processing of NDP code in vector form; as discussed herein, prior art methods cannot achieve this performance because they do not support parallel processing of NDP code.
Further, the results indicate that a program based on the systems and/or methods described herein reduces processing time because optimization of the program representation is performed at each level of abstraction, as described herein.
The descriptions of the various embodiments of the present invention are intended to be illustrative, and are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant systems and methods may be developed and the scope of the term distributed execution environment is intended to include all such new technologies a priori.
The term "about" as used herein means ± 10%.
The terms "including", "comprising", "having", and variations thereof mean "including but not limited to". These terms encompass the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, provided that the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. For example, the term "compound" or "at least one compound" may comprise a plurality of compounds, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any "exemplary" embodiment is not necessarily to be construed as preferred or advantageous over other embodiments, and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may incorporate a plurality of "optional" features, unless these features contradict each other.
Throughout this application, various embodiments of the present invention may be presented in a range format. It is to be understood that the description of the range format is merely for convenience and brevity and should not be construed as a fixed limitation on the scope of the present invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within the range, such as 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
When a range of numbers is indicated herein, the expression includes any number (fractional or integer) recited within the indicated range. The phrases "ranging between a first indicated number and a second indicated number" and "ranging from a first indicated number to a second indicated number" are used herein interchangeably and mean to include the first and second indicated numbers and all fractions and integers therebetween.
It is appreciated that certain features of the invention, which are, for brevity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately, in any suitable subcombination, or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those features.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims (13)

1. An apparatus for generating code for execution on a distributed processing system, comprising:
a data interface for receiving a representation of a program written in a high level programming language, the program representation containing operations, the program representation containing abstract data types;
a memory storing vectorized code; and
a processor coupled to the data interface and the memory to execute the vectorized code, the vectorized code comprising:
code instructions for mapping each of the abstract data types to at least one of an index set data type, wherein each of the index set data types represents a set of elements of a data type, wherein each element of the set of elements is accessible by an index;
code instructions for converting each operation into a vector-based form based on the index set data type to create a vector-based representation of the program; and
code instructions for providing the vector-based representation of the program to an Application Programming Interface (API) of a low-level language for execution by compute nodes of a distributed processing execution environment.
2. The apparatus of claim 1, further comprising code instructions for converting the program representation at the index set data type level with vector analogs based on the index set data type.
3. The apparatus of claim 1 or claim 2, wherein the abstract data types of the program representation are based on a hierarchical data structure, wherein each data type of a relatively higher level of abstraction is based on at least one data type of a relatively lower level of abstraction, wherein the index set data type is at the bottom of the hierarchical data structure.
4. The apparatus of any of claims 1-2, further comprising code instructions to sequentially convert an Intermediate Representation (IR) of program instructions at a current abstraction level of a current level of an abstract data type based on a hierarchical data structure representation of the abstract data type to an IR of the program instructions at a lower abstraction level based on a data type at a lower abstraction level of the hierarchical data structure.
5. The apparatus of claim 4, wherein the data interface is further for receiving input values for processing by an instruction set of the IR of the program at the current abstraction level;
the device further includes code instructions for selecting, from isomorphic specializations, an isomorphic specialization of the instruction set of the IR of the program at the current abstraction level to process the input value using the vector-based form, the selecting based on a performance metric of each of the isomorphic specializations when processing the vector-based representation of the IR;
wherein each isomorphic specialization has a different realization based on an abstract data type at an abstraction level lower than the current abstraction level, wherein each isomorphic specialization produces the same result when processing the input value.
6. The apparatus of any of claims 1-2, further comprising a compiler for sequentially applying optimization instructions to each IR of the program at each abstraction level.
7. The apparatus of any of claims 1-2, further comprising code instructions for mapping instructions of the IR of the program representation based on the index set data type according to a set of predefined vectorization rules that create a vector form of instructions that are computable by a distributed processing execution environment.
8. The apparatus of any of claims 1-2, wherein the vector form is computed without collecting data to the master node and redistributing results.
9. The apparatus of any of claims 1-2, wherein the IR of the program contains a dataflow graph that includes at least one node representing an operation of the program representation based on the index set data type;
the apparatus further includes code instructions for converting each of the at least one node into a vector form of the instruction to create a transformed dataflow graph for execution in a distributed processing execution environment.
10. The apparatus of any of claims 1-2, wherein the high-level programming language is a Domain Specific Language (DSL), wherein the abstract data type represents a distribution domain abstraction having a plurality of low-level distribution implementations, and wherein the DSL language does not define a certain low-level distribution implementation.
11. The apparatus of any of claims 1-2, wherein the program written in a high-level language is transformed into a graph-based abstract IR by an automated virtualization and a hierarchical evaluation method that identifies at least a portion of code written in the high-level language, wherein the identified at least a portion of the code is provided for converting each operation in the at least a portion into the vector-based form based on the index set data type to create the vector-based representation of the graph-based IR.
12. A method of generating code from a program representation for execution on a distributed processing system, the method being for operating an apparatus according to any of claims 1-11.
13. A computer program stored on a computer-readable medium, characterized in that the computer program, when being executed by a processor of a computer, carries out the method of claim 12.
CN201580084720.3A 2015-11-20 2015-11-20 Apparatus for generating code for execution on a distributed processing system Active CN108604182B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2015/000807 WO2017086828A1 (en) 2015-11-20 2015-11-20 Generating a vector based representation of a program for execution in a distributed processing system

Publications (2)

Publication Number Publication Date
CN108604182A CN108604182A (en) 2018-09-28
CN108604182B true CN108604182B (en) 2021-04-09

Family

ID=56203890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580084720.3A Active CN108604182B (en) 2015-11-20 2015-11-20 Apparatus for generating code for execution on a distributed processing system

Country Status (2)

Country Link
CN (1) CN108604182B (en)
WO (1) WO2017086828A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766190B (en) * 2022-11-10 2023-07-21 北京海泰方圆科技股份有限公司 Encryption method, decryption method and electronic equipment for arbitrary set elements

Citations (2)

Publication number Priority date Publication date Assignee Title
CN104217004A (en) * 2014-09-15 2014-12-17 中国工商银行股份有限公司 Monitoring method and device for database hot spot of transaction system
CN104536958A (en) * 2014-09-26 2015-04-22 杭州华为数字技术有限公司 Composite index method and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8549501B2 (en) * 2004-06-07 2013-10-01 International Business Machines Corporation Framework for generating mixed-mode operations in loop-level simdization
US20130055294A1 (en) * 2011-08-29 2013-02-28 Christopher Diebner Extensible framework which enables the management of disparately located heterogeneous systems requiring command and control, situational awareness, operations management and other specific capabilities

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN104217004A (en) * 2014-09-15 2014-12-17 中国工商银行股份有限公司 Monitoring method and device for database hot spot of transaction system
CN104536958A (en) * 2014-09-26 2015-04-22 杭州华为数字技术有限公司 Composite index method and device

Non-Patent Citations (3)

Title
Zhang Lufei et al., "Parallel spectral clustering method based on matrix computation", Journal of Frontiers of Computer Science and Technology, 2015. *
Qi Junjun, "Research and implementation of a distributed vector computing framework based on MPI and MapReduce", China Masters' Theses Full-text Database, Information Science and Technology, 2014, pp. 5-51. *
Liang Fan et al., "Accelerating Iterative Big Data Computing Through MPI", Journal of Computer Science and Technology (English edition), 2015, pp. 283-294. *

Also Published As

Publication number Publication date
WO2017086828A1 (en) 2017-05-26
CN108604182A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
US11341303B2 (en) System for reversible circuit compilation with space constraint, method and program
US11416228B2 (en) System and method of optimizing instructions for quantum computers
Chou et al. Format abstraction for sparse tensor algebra compilers
US10534590B2 (en) Dynamic recompilation techniques for machine learning programs
Xin et al. Graphx: Unifying data-parallel and graph-parallel analytics
Brown et al. Have abstraction and eat performance, too: Optimized heterogeneous computing with parallel patterns
AU2019204395A1 (en) Visually specifying subsets of components in graph-based programs through user interactions
US20140115560A1 (en) Systems and methods for parallelization of program code, interactive data visualization, and graphically-augmented code editing
US20130235050A1 (en) Fully parallel construction of k-d trees, octrees, and quadtrees in a graphics processing unit
US11392623B2 (en) Hybrid in-memory BFS-DFS approach for computing graph queries against heterogeneous graphs inside relational database systems
Kusum et al. Efficient processing of large graphs via input reduction
US11392624B2 (en) Hybrid in-memory BFS-DFS approach for computing graph queries against homogeneous graphs inside relational database systems
US10268461B2 (en) Global data flow optimization for machine learning programs
Iverson et al. Evaluation of connected-component labeling algorithms for distributed-memory systems
WO2016177405A1 (en) Systems and methods for transformation of a dataflow graph for execution on a processing system
Cecilia et al. Enhancing GPU parallelism in nature-inspired algorithms
Cabiddu et al. Large mesh simplification for distributed environments
Litteken et al. An updated LLVM-based quantum research compiler with further OpenQASM support
CN108604182B (en) Apparatus for generating code for execution on a distributed processing system
Ling et al. Knowledge compilation for constrained combinatorial action spaces in reinforcement learning
Fryz et al. Assurance of system consistency during independent creation of UML diagrams
Qiao et al. GPU implementation of Borůvka’s algorithm to Euclidean minimum spanning tree based on Elias method
Kong On the impact of affine loop transformations in qubit allocation
KR20220118948A (en) Using hardware-accelerated instructions
Adamus et al. A step towards genuine declarative language-integrated queries

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant