CN107250985A - System and method for a heterogeneous computing application programming interface (API) - Google Patents

System and method for a heterogeneous computing application programming interface (API)

Info

Publication number
CN107250985A
Authority
CN
China
Prior art keywords
api
group
processor
modules
processing unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201580076832.4A
Other languages
Chinese (zh)
Other versions
CN107250985B (en)
Inventor
奥弗·罗森伯格
内坦·彼得弗洛恩德
大卫·米诺尔
埃亚勒·罗森贝格
阿德南·阿巴里亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN107250985A
Application granted
Publication of CN107250985B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/541 Interprogram communication via adapters, e.g. between incompatible applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities

Abstract

The invention provides an apparatus for processing application programming interface (API) requests, comprising: an interface for receiving the API request; and a processing unit configured to: identify a plurality of processors having different instruction set architectures (ISAs); operate a set of different API executor modules; and control at least one of the API executor modules to execute a command, based on the API request, on at least one of the processors.

Description

System and method for a heterogeneous computing application programming interface (API)
Background
The present invention, in some embodiments thereof, relates to heterogeneous computing and, more specifically, but not exclusively, to systems and methods for a heterogeneous computing application programming interface (API).
Heterogeneous computing systems, which include multiple different processors, can be programmed through dedicated APIs. Different vendors provide different APIs, and each API may differ in language syntax and/or in the operations available for execution on a processor. A single vendor may provide different versions of the same API for execution on different processing hardware.
APIs may differ in their execution model, memory model, language syntax, and compilation model. For example, the execution model may use synchronous or asynchronous queues, and may or may not support events. The memory model may be distributed or local. Data transfers may be non-transparent or transparent. Map and unmap operations may or may not be supported. Memory functions may be pointer-based or based on a different model. API languages may differ in syntax and structure. The compilation model may include separate online and offline compilation, or a combination of the two.
Writing programs for heterogeneous computing systems is complex, which may lead programmers to work with a single API. A single API may not provide all required capabilities or behaviors, and may not support kernels of various versions. As a result, the programmer may need to modify the API or use additional APIs.
Programming with multiple APIs is slow, error-prone, and expensive, and requires programmers with a high level of training, knowledge, and skill. Improvements to the programming of heterogeneous systems are sought.
Summary of the invention
An object of the present invention is to improve the processing of application programming interfaces (APIs).
The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description, and the figures.
According to a first aspect, an apparatus for processing application programming interface (API) requests comprises: an interface for receiving the API request; and a processing unit configured to: identify a plurality of processors having different instruction set architectures (ISAs); operate a set of different API executor modules; and control at least one of the API executor modules to execute a command, based on the API request, on at least one of the processors.
By automatically selecting API executor modules, rather than requiring the application developer to work with multiple APIs, the apparatus removes from the human application developer the burden of programming different heterogeneous processing devices with different APIs. Different APIs may provide different functions, sometimes with different syntaxes, which makes programming with multiple APIs difficult, error-prone, and time-consuming. The apparatus allows the programmer to focus on working with a single unified environment (rather than multiple different APIs), which the apparatus then maps to the available API executor modules. Within the single unified framework, the programmer has access to the various capabilities or behaviors offered by the different APIs (for example, support for kernels of various versions).
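The first aspect can be illustrated with a minimal Python sketch. It is not taken from the specification: the class names (`Processor`, `ApiExecutorModule`, `ProcessingUnit`), the ISA labels, and the dictionary-shaped request are all hypothetical, and the selection rule (first executor module that supports the command) is only one possible choice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Processor:
    name: str
    isa: str  # hypothetical ISA label, e.g. "x86_64"


@dataclass
class ApiExecutorModule:
    """Wraps one low-level API and the processor it drives (illustrative)."""
    api_name: str
    processor: Processor
    commands: Dict[str, Callable[[list], list]]

    def supports(self, command: str) -> bool:
        return command in self.commands

    def execute(self, command: str, data: list) -> list:
        return self.commands[command](data)


class ProcessingUnit:
    def __init__(self, executors: List[ApiExecutorModule]) -> None:
        self.executors = executors

    def identify_isas(self) -> Set[str]:
        """Report the distinct ISAs of the processors behind the executors."""
        return {e.processor.isa for e in self.executors}

    def handle(self, request: dict) -> dict:
        """Run a high-level request on the first executor that supports it."""
        for executor in self.executors:
            if executor.supports(request["command"]):
                result = executor.execute(request["command"], request["data"])
                return {"result": result, "executed_by": executor.api_name}
        raise LookupError(f"no executor supports {request['command']!r}")


cpu = ApiExecutorModule("cpu_api", Processor("cpu0", "x86_64"), {"sort": sorted})
gpu = ApiExecutorModule("gpu_api", Processor("gpu0", "gpu_isa"),
                        {"add_one": lambda xs: [x + 1 for x in xs]})
unit = ProcessingUnit([cpu, gpu])
```

In this sketch the caller only names the command; which low-level API runs it is decided inside `ProcessingUnit.handle`.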
In a first possible implementation form of the apparatus according to the first aspect, each API executor module comprises at least one object from the following group: a storage object, an operation object, and a queue object, the at least one object being adapted for a predefined ISA; and the processing unit is configured to control the at least one API executor module to execute the command on the at least one processor based on the at least one object.
The apparatus automatically allocates the different objects, maps between the different objects and the API executor modules, and executes high-level commands through the API executor modules using the lower-level, API-specific objects.
In a second possible implementation form of the apparatus according to the first aspect as such or according to any preceding implementation form of the first aspect, the apparatus comprises a unified layer that includes at least one unified object from the following group: a unified storage object, a unified operation object, and a unified queue object, the unified object being adapted for the API request.
The apparatus creates a higher abstraction layer of storage objects, operation objects, and queue objects. Using the low-level language of each of the different low-level APIs, the apparatus automatically implements high-level operation objects (for example, sort, filter, add) that present a single interface to the programmer.
In a third possible implementation form of the apparatus according to the first or second implementation form of the first aspect, the processing unit is configured to associate an operation command in a set of operation commands with a signature indicating the corresponding API request.
The signature provides an abstract representation of the operation command. The signature representation can be implemented differently by different low-level APIs.
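One way to picture a signature with several low-level implementations is the following Python sketch. The representation of a signature as a tuple of operation name and argument type names, the `OperationRegistry` class, and the API names `api_a` and `api_b` are assumptions made for illustration only.

```python
from typing import Callable, Dict, Tuple

# Assumption: a signature is modelled as (operation name, argument type names).
Signature = Tuple[str, Tuple[str, ...]]


class OperationRegistry:
    """Associates one abstract signature with per-API implementations."""

    def __init__(self) -> None:
        self._impls: Dict[Signature, Dict[str, Callable]] = {}

    def register(self, signature: Signature, api_name: str, impl: Callable) -> None:
        self._impls.setdefault(signature, {})[api_name] = impl

    def lookup(self, signature: Signature, api_name: str) -> Callable:
        return self._impls[signature][api_name]


SORT_FLOATS: Signature = ("sort", ("float_vector",))

registry = OperationRegistry()
# Two hypothetical low-level APIs realize the same signature differently.
registry.register(SORT_FLOATS, "api_a", lambda xs: sorted(xs))
registry.register(SORT_FLOATS, "api_b",
                  lambda xs: list(reversed(sorted(xs, reverse=True))))
```

Both registered implementations return the same ascending result for the same input, which is the property the shared signature is meant to express.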
In a fourth possible implementation form of the apparatus according to any of the first, second, or third implementation forms of the first aspect, each storage object in a set of storage objects comprises: a common part, defining common value types and common functions shared by every member of the set of storage objects; and a specific part, uniquely defining at least one specific value type and at least one API-specific function call.
The design of a common part and a specific part both unifies the common functions and provides the API-specific low-level definitions. The common part allows a computer program to issue abstract high-level storage commands without low-level implementation details. The high-level storage commands are automatically mapped to low-level commands for execution on a particular target device.
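The split into a common part and a specific part resembles an abstract base class with API-specific subclasses. The Python sketch below assumes that reading; the class names, the `device_id` field, and the strings returned by `allocate` are hypothetical stand-ins for real low-level calls.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class StorageObject(ABC):
    """Common part: value types and functions shared by all storage objects."""

    def __init__(self, dtype: str, shape: Tuple[int, ...]) -> None:
        self.dtype = dtype
        self.shape = shape

    def element_count(self) -> int:
        count = 1
        for dim in self.shape:
            count *= dim
        return count

    @abstractmethod
    def allocate(self) -> str:
        """Specific part: each subclass issues its own API-specific call."""


class HostStorageObject(StorageObject):
    def allocate(self) -> str:
        # Placeholder for a host-side allocation call.
        return f"host_alloc(n={self.element_count()}, dtype={self.dtype})"


class DeviceStorageObject(StorageObject):
    def __init__(self, dtype: str, shape: Tuple[int, ...], device_id: int) -> None:
        super().__init__(dtype, shape)
        self.device_id = device_id  # specific value type, unknown to the common part

    def allocate(self) -> str:
        # Placeholder for a device-side allocation call.
        return f"device_alloc(dev={self.device_id}, n={self.element_count()})"


matrix = DeviceStorageObject("float32", (50, 50), device_id=0)
```

Code that only needs `element_count()` or `allocate()` can work with any storage object without knowing which low-level API stands behind it.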
In a fifth possible implementation form of the apparatus according to any of the first, second, third, or fourth implementation forms of the first aspect, the processing unit is configured to execute another one of a plurality of sub-sequences using the corresponding set of operation commands of each one of the set of API executor modules, based on associations between a plurality of unified execution storage objects and the set of storage objects, and on associations between a plurality of unified execution operation commands and the set of operation commands.
The mapping from unified objects to the native objects of the API executor modules allows high-level abstract API requests to be selectively mapped to different low-level API instructions.
In a sixth possible implementation form of the apparatus according to the fifth implementation form of the first aspect, each unified execution storage object is associated with a member of the set of storage objects of each one of the set of different API executor modules, and each unified execution operation command is associated with a member of the set of operation commands of each one of the set of different API executor modules.
In a seventh possible implementation form of the apparatus according to the first aspect as such or according to any preceding implementation form of the first aspect, the processing unit is configured to collect runtime data of the execution environment, and to use the runtime data to associate one of the API executor modules with one of the plurality of processors.
By adding or removing associations with API executor modules, changes to the execution environment, such as removing processing hardware or adding new processing hardware, can be accommodated automatically.
In an eighth possible implementation form of the apparatus according to the first aspect as such or according to any preceding implementation form of the first aspect, the processing unit is configured to associate one of the API executor modules with one of the plurality of processors according to a processor characteristic selected from command response time, overall command execution time, and power consumption.
Different APIs may operate processors at different performance levels. Mapping API executor modules to processors while taking into account the processor characteristics produced by the matched API improves system performance.
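Selection by processor characteristic can be sketched as picking the candidate with the lowest value of the chosen metric. The Python example below is an illustration under that assumption; the `ProcessorProfile` fields, the function name, and the sample figures are hypothetical and are not measurements.

```python
from dataclasses import dataclass
from typing import List

CHARACTERISTICS = ("response_time_ms", "total_exec_time_ms", "power_watts")


@dataclass(frozen=True)
class ProcessorProfile:
    name: str
    response_time_ms: float
    total_exec_time_ms: float
    power_watts: float


def select_processor(candidates: List[ProcessorProfile],
                     characteristic: str) -> ProcessorProfile:
    """Return the candidate with the lowest value of the chosen characteristic."""
    if characteristic not in CHARACTERISTICS:
        raise ValueError(f"unknown characteristic: {characteristic}")
    return min(candidates, key=lambda p: getattr(p, characteristic))


# Illustrative figures only.
profiles = [
    ProcessorProfile("cpu0", response_time_ms=1.0, total_exec_time_ms=50.0, power_watts=65.0),
    ProcessorProfile("gpu0", response_time_ms=5.0, total_exec_time_ms=10.0, power_watts=150.0),
]
```

With these sample figures, optimizing for response time or power selects `cpu0`, while optimizing for overall execution time selects `gpu0`.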
In a ninth possible implementation form of the apparatus according to the first aspect as such or according to any preceding implementation form of the first aspect, the processing unit is configured to divide a sequence into a plurality of queues, each of the plurality of queues being processed by a different API executor module of the set of different API executor modules.
Selecting the API executor module that processes each queue improves system performance, because different API executor modules may process different queues at different performance levels. The best-performing API executor module can be selected for each queue.
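Dividing a request sequence into per-executor queues can be sketched as follows. This Python example assumes a fixed routing table from command name to executor module name, which is a simplification; the function name, the request layout, and the executor names are hypothetical.

```python
from typing import Dict, List


def split_into_queues(sequence: List[dict],
                      routing: Dict[str, str]) -> Dict[str, List[dict]]:
    """Group requests into one queue per executor, preserving request order."""
    queues: Dict[str, List[dict]] = {}
    for request in sequence:
        executor_name = routing[request["command"]]
        queues.setdefault(executor_name, []).append(request)
    return queues


sequence = [
    {"command": "sort", "data": [3, 1]},
    {"command": "convolve", "data": [1, 2]},
    {"command": "sort", "data": [9, 8]},
]
# Assumed routing table: which executor module handles which command.
routing = {"sort": "cpu_executor", "convolve": "gpu_executor"}
queues = split_into_queues(sequence, routing)
```

The caller supplies a single sequence; the division into queues, and the order within each queue, is handled without any per-executor code in the program itself.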
In a tenth possible implementation form of the apparatus according to the first aspect as such or according to any preceding implementation form of the first aspect, the processing unit is configured to create the set of different API executor modules upon a runtime initialization event of the application.
Runtime initialization generates API executor modules according to the processing infrastructure available in the execution environment. API executor modules are generated in response to changes in the available processors, such as the addition of a new processor and/or the removal of a processor. An initialization event may trigger the generation of new executor modules, so that changes relevant to the computer program are handled more efficiently using different processors.
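Regenerating executor modules on an initialization event can be sketched as grouping the currently detected processors by ISA. In the Python example below, the `Runtime` class, the tuple layout `(processor name, ISA)`, and the ISA labels are hypothetical; a real system would query the hardware rather than receive a list.

```python
from typing import Dict, List, Tuple


def build_executor_modules(detected: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Create one executor module record per ISA, listing its processors."""
    modules: Dict[str, List[str]] = {}
    for processor_name, isa in detected:
        modules.setdefault(isa, []).append(processor_name)
    return modules


class Runtime:
    def __init__(self) -> None:
        self.modules: Dict[str, List[str]] = {}

    def on_initialization_event(self, detected: List[Tuple[str, str]]) -> None:
        # Rebuild from scratch so that removed processors disappear as well.
        self.modules = build_executor_modules(detected)


runtime = Runtime()
runtime.on_initialization_event([("cpu0", "x86_64"), ("cpu1", "x86_64"), ("gpu0", "gpu_isa")])
modules_before = dict(runtime.modules)
runtime.on_initialization_event([("cpu0", "x86_64"), ("cpu1", "x86_64")])  # gpu0 removed
modules_after = dict(runtime.modules)
```

Rebuilding the whole table on each event is the simplest policy; an incremental update would be an alternative design.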
In an eleventh possible implementation form of the apparatus according to the first aspect as such or according to any preceding implementation form of the first aspect, the processing unit is configured to manage the sequence in at least one unified execution layer queue.
The apparatus automatically divides the sequence and designates different API executor modules for different parts of the sequence, without the programmer having to divide the sequence.
In a twelfth possible implementation form, a method for processing application programming interface (API) requests is provided, the method being for operating an apparatus according to the first aspect as such or according to any preceding implementation form of the first aspect.
In a thirteenth possible implementation form, a computer program is provided which, when executed on a computer, runs the preceding method.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Brief description of the drawings
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
Fig. 1 is a flowchart of a method for processing API requests, according to some embodiments of the invention;
Fig. 2 is a block diagram of components of a system that includes an apparatus for processing API requests, according to some embodiments of the invention;
Fig. 3A is a block diagram of additional optional modules and/or object structures in the apparatus of Fig. 2, according to some embodiments of the invention;
Fig. 3B is a schematic diagram depicting the propagation of a storage operation bind command, according to some embodiments of the invention;
Fig. 3C is a schematic diagram of the structure of a storage object, according to some embodiments of the invention;
Fig. 4 is a flowchart of a method for identifying the processors in an execution environment and generating API executor modules according to the identified processors, according to some embodiments of the invention;
Fig. 5 is a flowchart of a method for mapping the operations supported by an API executor module, according to some embodiments of the invention;
Fig. 6 is a schematic diagram depicting the mapping of API requests from an executing computer program to low-level API instructions for execution on a target device and/or processor, according to some embodiments of the invention; and
Fig. 7 is a schematic diagram depicting the data flow between the modules and/or objects described herein, based on mapping a received API request to a low-level API for execution on a target device in the execution environment, as described herein.
Detailed description
The present invention, in some embodiments thereof, relates to heterogeneous computing and, more specifically, but not exclusively, to systems and methods for a heterogeneous computing application programming interface (API).
An aspect of some embodiments of the invention relates to an apparatus that manages a layer for unifying communication with multiple different processors controlled by multiple different APIs (each API controlling a different processor). The apparatus automatically receives a sequence of API requests of a program and maps sub-sequences of the API requests of the high-level program (or portions thereof, such as partitions of the program), which is based on a high-level interface, to different low-level APIs. Each respective low-level API operates one or more processors to execute the mapped program (or partition thereof). In this way, the high-level program is executed automatically in an execution environment that includes multiple different processors organized as a heterogeneous computing system. The high-level program need not include low-level instructions designating a specific low-level API and/or a specific processor for execution. The API used for execution can be selected to improve the system performance of executing the program, because different APIs and/or different processors may execute the same program at different performance levels.
Optionally, the apparatus controls one or more API executor modules to execute one or more low-level commands on one or more processors based on high-level API requests (issued by the program or portions thereof); the processors may optionally be organized into different devices according to instruction set architecture (ISA). Each high-level API request can be mapped to one of a set of multiple low-level API executor modules. Each API executor module operates a certain processor of the set of available processors, optionally based on a common ISA. Optionally, the processors are dissimilar, forming a heterogeneous system.
An API request may be a high-level request that is provided by a computer program, executed by one or more processors, and operated through one or more low-level APIs. The apparatus automatically designates the specific API executor module that executes the API request. In this way, a program can be written using high-level API requests, without defining the low-level API executor modules that operate the processors of the heterogeneous execution environment and/or without defining which processor executes which part of the program. The apparatus can select the API executor module to improve the execution performance of the high-level API requests.
Optionally, the set of different API executor modules is generated automatically by the apparatus based on the processors present in the execution environment, for example during runtime when a change in the processor availability of the execution environment is detected. Different API executor modules may be generated for different processors (singly or in groups) according to their differing ISAs.
It is noted that the apparatus described herein may be implemented (in hardware and/or software) as program modules, a system, a method, and/or a computer program product.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network.
The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or that carry out combinations of special-purpose hardware and computer instructions.
Reference is now made to Fig. 1, which is a flowchart of a method for controlling API executor modules to execute commands on one or more processors based on high-level API requests (optionally, of a computer program), according to some embodiments of the invention. Reference is also made to Fig. 2, which is a block diagram of components of a system that allows a programmer to write source code in a high-level language without considering the low-level processor implementation, by automatically selecting API executor modules and mapping the high-level program (which includes high-level API requests) to the selected API executor modules for low-level execution by the processors of the execution environment. The method of Fig. 1 may be executed by the apparatus and/or system of Fig. 2.
By automatically selecting API executor modules based on high-level API requests, rather than requiring the application developer to work with multiple APIs, the apparatus removes from the human application developer the burden of programming different heterogeneous processing devices with different APIs. Different APIs may provide different functions, sometimes with different syntaxes, which makes programming directly with multiple APIs (as opposed to with the API requests described herein) difficult, error-prone, and time-consuming. The apparatus allows the programmer to focus on working with high-level API requests in a single unified environment (rather than with multiple different APIs), which the apparatus then maps to the available API executor modules. Within the single unified framework, the programmer has access to the various capabilities or behaviors offered by the different APIs (for example, support for kernels of various versions).
Apparatus 200 includes an interface 202 for receiving API requests 204. API requests 204 may be included as part of the source code of a high-level program 206, for example as a library and/or as subroutines integrated into a high-level language. The computer program may be a complete computer program, a portion of a computer program, and/or a single algorithm. The computer program may be in high-level source code format, in a low-level code format suitable for execution, or precompiled code. Program 206 may be executed in execution environment 214.
Optionally, the source code is written in a domain-specific language (DSL). A DSL may provide a higher level of data type abstraction and/or wider use of abstract data types than other programming languages, such as low-level programming languages and/or programming languages not specifically designed to handle problems in the same domain as the DSL. The DSL may be a pre-existing available DSL or a custom-developed DSL.
Apparatus 200 includes a processing unit 208 that operates a set of different API executor modules 210A-C, as described herein. Processing unit 208 controls one or more API executor modules 210A-C to execute commands, based on the API requests, on one or more processors 212A-C of execution environment 214, as described herein. It is noted that the number of executor modules and the number of processors may be greater than 2; the number 2 is chosen for simplicity, clarity, and illustration.
Processors 212A-C may be different, optionally operating with different ISAs. Processors 212A-C may have different architectural designs, for example, central processing units (CPUs), graphics processing units (GPUs), processors for interfacing with other units, and/or dedicated hardware accelerators (for example, encoders, decoders, and cryptographic coprocessors). Herein, one or more processors and their associated memory are sometimes referred to as a device. Each device may include multiple processors and associated memory operating according to a common ISA. The terms processor and device as used herein are sometimes interchangeable.
Each API executor module is mapped to a processor (or to a single group of processors operating with the same ISA, or to a device), as described herein. Each API executor module is specific to one API type. Each API executor module can be mapped to processors that support the API type of the executor module.
With reference now to Fig. 3 A, Fig. 3 A are the additional optional modules in the device 200 of Fig. 2 according to some embodiments of the invention And/or the block diagram of object structure.Add-on module provides unified interface for programmer's program of writing, and according to Compatible object and API The unified interface is mapped to API executor modules by the mapping between executor module object.Seek unity of action layer 304 and API is held Row device module 210A-C provides layered framework, the operation for performing API request on various Heterogeneous Computing API.Object is provided High-level abstractions by the API request sent by computer program to be mapped to rudimentary API, to be held in the processor of performing environment OK.
Program 206, it includes the API request alternatively defined according to unify API 302, by seeking unity of action for device 200 Layer 304 is received.Unified layer 304 includes one or more Compatible objects:Unified storage object 306A, unified operation object 306B and Unified queue object 306C.Each Compatible object is adapted to API request, such as order according to included by API request generate, and/ Or it is associated with API request.
Each API executor modules 210A-C of equipment 200 includes being mapped to the corresponding objects for layer 304 of seeking unity of action One or more of 306A-C following object 308A-C:Storage object 308A, operation object 308B and queue object 308C. Each object 308A-C is defined according to corresponding API executor modules 210A-C, such as according to corresponding with API executor modules API types include low-level instructions.Alternatively, each object is carried out according to predefined ISA corresponding with API executor modules Adjustment, the operation for example defined according to ISA is generated, and/or with the rudimentary definition according to ISA.
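The relation between a unified object and the executor-specific objects it maps to can be sketched as a unified object that holds one native descriptor per executor module. The Python example below is an assumption about how such a mapping might look; the class name, the `bind` and `native_for` methods, the executor names, and the descriptor dictionaries are hypothetical.

```python
from typing import Callable, Dict


class UnifiedStorageObject:
    """Unified-layer object holding one native counterpart per executor module."""

    def __init__(self, dtype: str, length: int) -> None:
        self.dtype = dtype
        self.length = length
        self._native: Dict[str, dict] = {}  # executor name -> native descriptor

    def bind(self, executor_name: str,
             make_native: Callable[[str, int], dict]) -> None:
        """Create and remember the executor-specific object for this unified object."""
        self._native[executor_name] = make_native(self.dtype, self.length)

    def native_for(self, executor_name: str) -> dict:
        return self._native[executor_name]


vector = UnifiedStorageObject("float32", 10000)
# Hypothetical factories standing in for API-specific object creation (4 bytes per float32).
vector.bind("cpu_executor", lambda dtype, n: {"kind": "host_buffer", "bytes": 4 * n})
vector.bind("gpu_executor", lambda dtype, n: {"kind": "device_buffer", "bytes": 4 * n, "device": 0})
```

The program refers only to `vector`; the descriptor actually used depends on which executor module is chosen at execution time.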
The storage objects at the unified execution layer and at the API executor modules are high-level abstractions of the memory available in the target execution environment, for example in each device (that is, associated with one or more processors). A storage object describes the data format and type, for example a vector of 10,000 floating-point numbers, or a 50x50 matrix of floating-point numbers. Abstract memory management, for example memory allocation, memory release, and garbage collection, is performed using the storage objects.
The operation objects at the unified execution layer and at the API executor modules are high-level abstractions of the program code (that is, the API requests) that runs in the target execution environment, for example on each device. An operation object defines a specific function, for example sort or convolution. Abstract code management details, for example compilation, execution, and optimization, are handled using the operation objects.
The queue objects at the unified execution layer and at the API executor modules are high-level abstractions of operation scheduling in the target execution environment, for example on each device. Abstract operation execution, for example synchronous or asynchronous execution, coordination, and dependencies, is performed using the queue objects.
Each API executor module 210A-C controls a corresponding device 312A-C based on a 1:1 mapping. Each device 312A-C includes one or more processors, optionally operating with a common ISA. For example, device 312A includes one or more CPUs 314A, device 312B includes one or more GPUs 314B, and device 312C includes one or more field-programmable gate arrays (FPGA) 314C. Together, devices 312A-C form a heterogeneous execution environment.
The processing unit 208 of apparatus 200 controls the API executor modules 210A-C to execute commands (received from program 206 using the unified API 302) on the respective devices 312A-C. Control is obtained using the ISA of devices 312A-C, through the associated APIs 316A-C, for example vendor-supplied APIs and/or custom APIs. Control is implemented based on the objects 308A-C defined by the controlling API executor modules 210A-C.
The apparatus creates a high-level abstraction layer of storage objects, operation objects, and queue objects. Using the low-level languages of the APIs, the apparatus automatically implements, over the different low-level APIs, high-level operation objects (for example sort, filter, add) that present a single interface to the programmer.
The apparatus automatically allocates the different objects, maps between the different objects and the API executor modules, and executes the operations of high-level commands through the API executor modules using the API-specific objects of the different lower layers.
Objects are allocated, mapped between the unified execution layer and the API executor modules, and used to perform operations on data, as described herein. In general, in a system with N devices and/or processors, each unified object of the unified execution layer is mapped to M<N corresponding mirror objects in a designated subset of the API executor modules.
Unified objects 306A-C may be created at the unified execution layer 304 by API requests. A corresponding set of objects is automatically created at the relevant API executor modules, by the API request and/or triggered by creation of the unified object. Queue objects are automatically created in all available API executor modules. Operation objects are automatically created in each API executor module of the subset that supports the requested operation, for example as discussed with reference to Fig. 5.
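The mirroring rule described above may be illustrated by the following minimal Python sketch. It is not taken from the specification: the class names Executor and UnifiedLayer, the executor names, and the operation names are hypothetical, and the mirror objects are plain strings standing in for API-specific objects.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    """Hypothetical stand-in for an API executor module (for example 210A-C)."""
    name: str
    supported_ops: set
    objects: dict = field(default_factory=dict)  # unified object id -> mirror object


class UnifiedLayer:
    """Hypothetical stand-in for the unified execution layer 304."""

    def __init__(self, executors):
        self.executors = executors

    def create_queue(self, queue_id):
        # Queue objects are mirrored in every available executor.
        for ex in self.executors:
            ex.objects[queue_id] = f"{ex.name}:queue"
        return [ex.name for ex in self.executors]

    def create_operation(self, op_id, op_name):
        # Operation objects are mirrored only where the operation is supported.
        targets = [ex for ex in self.executors if op_name in ex.supported_ops]
        for ex in targets:
            ex.objects[op_id] = f"{ex.name}:{op_name}"
        return [ex.name for ex in targets]


executors = [
    Executor("cpu", {"sort", "multiply"}),
    Executor("gpu", {"sort"}),
    Executor("fpga", {"multiply"}),
]
layer = UnifiedLayer(executors)
queue_mirrors = layer.create_queue("q0")
sort_mirrors = layer.create_operation("op0", "sort")
```

In this sketch the queue is mirrored in all three executors, while the sort operation is mirrored only in the two executors that declare support for it.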
Reference is now made to Fig. 3B, which is a schematic diagram depicting propagation of a storage-operation bind command, according to some embodiments of the invention. The bind command is automatically propagated from the unified execution layer 304 to the API executors 210A and 210B that support the operation.
A storage object may be created when a bind-storage API request is received. An example of a bind-storage command is: Set Memory Object A as Arg 2 of Operation K. The bind-storage command includes an instruction to bind a storage object to an operation object. One or more storage objects 306A are automatically generated at the unified execution layer 304 and bound to the operation object 306B according to the bind-storage command.
The bind command is propagated from the unified execution layer 304 to the API executor modules that support the operation included in the bind command (represented by arrows 330A-B). A storage object 308A is created in each relevant API executor module. Propagating the bind command allows the storage-operation binding to be performed automatically at each relevant API executor module (using the low-level API operations that support the bind command).
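The propagation of the example command "Set Memory Object A as Arg 2 of Operation K" may be sketched as follows. This is an illustrative Python sketch under stated assumptions: the ExecutorModule class, the propagate_bind function, and the third module that supports only a hypothetical operation "J" are not part of the specification.

```python
class ExecutorModule:
    """Hypothetical API executor module that records storage objects and bindings."""

    def __init__(self, name, supported_ops):
        self.name = name
        self.supported_ops = set(supported_ops)
        self.storage = {}    # storage object name -> executor-level storage object
        self.bindings = {}   # (operation, argument index) -> storage object name

    def bind(self, op, arg_index, mem_name):
        # Create the executor-level storage object on demand, then bind it.
        self.storage.setdefault(mem_name, {"owner": self.name})
        self.bindings[(op, arg_index)] = mem_name


def propagate_bind(executors, op, arg_index, mem_name):
    """Forward the bind command only to the executors that support `op`."""
    reached = []
    for ex in executors:
        if op in ex.supported_ops:
            ex.bind(op, arg_index, mem_name)
            reached.append(ex.name)
    return reached


modules = [
    ExecutorModule("210A", ["K"]),
    ExecutorModule("210B", ["K"]),
    ExecutorModule("210C", ["J"]),  # does not support Operation K
]
reached = propagate_bind(modules, "K", 2, "A")
```

Only the two modules that support Operation K receive the binding and create a local storage object; the third module is left untouched.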
Reference is now made to Fig. 3C, which is a schematic diagram of the data structure of the storage object of Fig. 3A, according to some embodiments of the invention.
Unified storage object 306A resides in the unified execution layer 304, as discussed with reference to Fig. 3A. Executor storage object 318A corresponds, for example, to the storage object of API executor module 210A. Executor storage object 318B corresponds, for example, to the storage object of API executor module 210B. For example, and without intending to be limiting, API executor module 210A operates one API, while API executor module 210B operates the OpenCL™ (Open Computing Language) API.
Each storage object in a set of storage objects, which comprises the unified execution layer storage object and the API executor module storage objects, includes a common part 320 and a specific part 322. The common part 320 is identical for all storage objects, at the unified execution layer and at every API executor module. The specific part 322 is customized for each storage object, at the unified execution layer and at each API executor module. This design of common and specific parts both unifies the general functionality and provides the API-specific low-level definitions.
The common part 320 defines the common value types and/or common functions shared by every member of the set of storage objects.
The specific part 322 provides different functionality at the unified execution layer and at the API executor module layer. At the unified execution layer, the specific part includes a mapping to the associated storage objects available at each API executor module, for example an array of pointers represented by arrows 324A and 324B, which map API executor module storage objects 318A and 318B to unified storage object 306A. At the API executor module layer, the specific part includes API-specific additional data. The specific part 322 uniquely defines specific value types and/or API-specific function calls.
Balloons 326A and 326B depict examples of a memory allocation command, which is implemented differently at the different layers and between the different API executor modules. Balloon 326A is an example implementation of the allocate command of unified storage object 306A: a high-level instruction that calls the list of low-level memory allocation instructions at each storage object of each API executor module. Balloon 326B is an example implementation of the allocate command of storage object 318A of API executor module 210A, which executes the low-level device-specific and/or API-specific commands for memory allocation at the corresponding device and/or processor.
In this way, the common part allows a computer program to execute abstract high-level storage commands (available to the programmer) without explicitly defining the low-level implementation details. The high-level storage commands are automatically mapped to low-level commands for execution on automatically selected target devices.
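The split between common part 320 and specific part 322 may be sketched in Python as follows. This is an illustration only: the class names, the field names, the API labels "api_a" and "api_b", and the string used as a buffer handle are all hypothetical placeholders for real low-level allocation calls.

```python
from dataclasses import dataclass, field


@dataclass
class StorageCommon:
    """Common part: the same value types and functions for every storage object."""
    dtype: str
    shape: tuple

    def num_elements(self):
        n = 1
        for dim in self.shape:
            n *= dim
        return n


@dataclass
class ExecutorStorage(StorageCommon):
    """Executor-level object: the specific part holds API-specific data."""
    api: str = "unknown"
    handle: object = None

    def allocate(self):
        # Placeholder for a low-level, API-specific allocation call.
        self.handle = f"{self.api}-buffer[{self.num_elements()} x {self.dtype}]"
        return self.handle


@dataclass
class UnifiedStorage(StorageCommon):
    """Unified object: the specific part maps to the executor-level mirrors."""
    mirrors: list = field(default_factory=list)

    def allocate(self):
        # The high-level allocate fans out to each mirror's low-level allocate.
        return [m.allocate() for m in self.mirrors]


unified = UnifiedStorage("float32", (50, 50))
unified.mirrors = [
    ExecutorStorage("float32", (50, 50), api="api_a"),
    ExecutorStorage("float32", (50, 50), api="api_b"),
]
handles = unified.allocate()
```

The program calls allocate once on the unified object; each executor-level object then performs its own allocation, which mirrors the relationship between balloons 326A and 326B.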
Referring back to Fig. 1, optionally, at 102, the processing unit 208 identifies the available processors 212A-C in the execution environment 214. The identified processors are optionally dissimilar, having different ISAs. Processors controlled using a common ISA may be organized together, for example into a device.
Optionally, at 104, a set of different API executor modules 210A-C is created by the processing unit 208. Each API executor module 210A-C may include a set of operation commands and/or a set of storage objects for the respective ISA of one of the processors and/or devices.
Optionally, at 106, blocks 102 and 104 are iterated. The iteration generates new API executor modules, removes old API executor modules that are no longer relevant, and/or updates existing API executor modules. The iteration may be performed at initialization, at system start-up, periodically, and/or when a change in the execution environment is detected. Alternatively or additionally, the set of different API executor modules is created at an initialization event during runtime of the computer program, for example according to a change in the input (for example a different type and/or size of input data) and/or according to a change in the results produced (for example an unacceptable computation time). The initialization event may trigger new executor modules that use different processors to handle the change related to the computer program more efficiently. Runtime initialization generates API executor modules according to the existing processing infrastructure available in the execution environment. API executor modules are generated according to changes in the available processors, such as addition of a new processor and/or removal of a processor.
Optionally, the processing unit 208 collects runtime data of the execution environment 214. The runtime data may be used to associate one of the API executor modules with one of the processors, for example to create a new association, remove an association, or change an existing association. By adding or revoking associations with API executor modules, changes to the execution environment, for example removal of processing hardware or addition of new processing hardware, can be accommodated automatically.
Optionally, the API executor modules are associated with the respective processors according to processor characteristics, such as command response time, overall command execution time, and power consumption. The association may improve performance with respect to the processor characteristic. Different APIs may operate a processor at different performance levels. API executor modules are mapped to processors by taking into account the processor characteristics produced by the matched API, in order to improve system performance.
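One possible way to associate an API with each processor from collected runtime data is sketched below in Python. The weighted-sum cost, the metric names, the weights, and the measurement values are assumptions made for illustration; the specification names the characteristics but does not prescribe how they are combined.

```python
def associate(measurements, weights):
    """Pick, for each processor, the API whose weighted cost is lowest.

    measurements: {processor: {api: {metric: value}}}
    weights:      {metric: weight}, hypothetical relative importance
    """
    def cost(metrics):
        return sum(weights[k] * metrics[k] for k in weights)

    return {
        proc: min(apis, key=lambda api: cost(apis[api]))
        for proc, apis in measurements.items()
    }


# Hypothetical runtime data: response time, execution time, power consumption.
measurements = {
    "gpu0": {
        "api_a": {"response_ms": 2.0, "exec_ms": 40.0, "power_w": 90.0},
        "api_b": {"response_ms": 3.0, "exec_ms": 30.0, "power_w": 95.0},
    },
    "cpu0": {
        "api_b": {"response_ms": 1.0, "exec_ms": 120.0, "power_w": 35.0},
    },
}
weights = {"response_ms": 1.0, "exec_ms": 1.0, "power_w": 0.1}
association = associate(measurements, weights)
```

With these example numbers, api_b has the lower weighted cost on gpu0 (42.5 against 51.0), so it is the API associated with that processor.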
Reference is now made to Fig. 4, which is a flowchart of a method for identifying processors in an execution environment and generating API executor modules according to the identified processors, according to some embodiments of the invention. The method of Fig. 4 may be performed by apparatus 200 of Fig. 2 and/or Fig. 3, optionally by the unified execution layer 304 and/or the processing unit 208.
Optionally, at 402, the API executor types supported by apparatus 200 are retrieved from a hard-coded list, loaded from a file, automatically generated, and/or provided manually.
At 404, the execution environment is scanned to identify processors, optionally by looking up supported devices using a module stored in, or in communication with, apparatus 200. Processors are identified as supported devices according to the supported API executor types. The supported devices may be stored in a supported-device repository 406 in communication with apparatus 200.
Entry 414 is an example of a supported device entry stored in repository 406. Entry 414 may include a common part and an executor-specific part. The common part defines the device abstractly at a high level (for example, including the supported APIs). The executor-specific part defines low-level characteristics of the device, to be operated by the API executor module.
Optionally, at 408, the identified devices are designated as available for processing API requests. Alternatively, a subset of the identified devices is designated, for example using a classifier, or according to a set of rules, such as device availability, a device efficiency threshold, a device usage cost threshold, or other factors, for example to include the devices that are actually available and/or the most suitable devices.
At 410, an API executor module is automatically created for each designated device. The API executor modules may be initialized using device information from repository 406. The generated API executor modules may be stored in an executor repository 412.
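The flow of blocks 402-410 may be sketched in Python as follows. This is a minimal illustration: the executor type names, the device records, and the efficiency threshold are hypothetical, and the rule-based filter stands in for whatever classifier or rule set an embodiment uses.

```python
SUPPORTED_EXECUTOR_TYPES = {"cpu_api", "gpu_api"}  # block 402, hard-coded here


def scan_environment(found_devices):
    """Block 404: keep the devices whose API is a supported executor type."""
    return [d for d in found_devices if d["api"] in SUPPORTED_EXECUTOR_TYPES]


def designate(devices, min_efficiency=0.5):
    """Block 408: a rule-based filter on availability and efficiency."""
    return [d for d in devices
            if d["available"] and d["efficiency"] >= min_efficiency]


def create_executors(devices):
    """Block 410: one executor record per designated device."""
    return {d["name"]: {"api": d["api"], "device": d["name"]} for d in devices}


# Hypothetical result of probing the execution environment.
found = [
    {"name": "cpu0", "api": "cpu_api", "available": True, "efficiency": 0.9},
    {"name": "gpu0", "api": "gpu_api", "available": False, "efficiency": 0.95},
    {"name": "gpu1", "api": "gpu_api", "available": True, "efficiency": 0.7},
    {"name": "dsp0", "api": "dsp_api", "available": True, "efficiency": 0.8},
]
executor_repository = create_executors(designate(scan_environment(found)))
```

Here dsp0 is dropped at the scan because its API type is unsupported, and gpu0 is dropped at designation because it is unavailable, leaving executors for cpu0 and gpu1.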
Reference is now made to Fig. 5, which is a flowchart of a method for mapping the operations (also referred to herein as operation commands) supported by the API executor modules, according to some embodiments of the invention. The method of Fig. 5 may be performed by apparatus 200 of Fig. 2 and/or Fig. 3, optionally by the unified execution layer 304 and/or the processing unit 208.
Optionally, at 502, each generated API executor module includes a set of operation stores (also referred to herein as a set of operation commands), which are retrieved, for example, from a file, from a hard-coded list, by transmission from the unified execution layer 304, from a remote server, and/or from a manually generated list.
The set of operation commands corresponds to the received API requests.
Optionally, at 504, the API executor modules are loaded into the memory of apparatus 200, in parallel or one after another. The API executor modules may be loaded from an executor repository 506.
Optionally, at 508, each API executor module loads its respective operation store into the memory of apparatus 200, in parallel or one by one. The operation stores may be obtained from a common operation-store repository 510.
Optionally, at 512, each operation store of each API executor module performs an initialization process that attempts to initialize the operations in the operation store. Examples of initialization include compilation, resource allocation, and memory allocation. The operations that are successfully initialized and/or allocated are identified. Note that initialization of some operations may fail (those operations are then excluded from the available operations), for example because of insufficient memory, unavailable compilation resources, or other errors.
At 514, the set of successfully initialized and/or allocated operations is associated with the respective API executor module, and optionally stored in an executor operation repository 516.
Optionally, an operation command of the set of operation commands in an operation store is associated with and/or identified by a signature indicating the corresponding API request, optionally by an operation name and/or operation parameters. The signature provides an abstract representation of the operation command. Different low-level APIs may implement the same high-level signature differently. An API request may be defined according to the signature, for example using source code that writes the operation command (of the set of operation commands) in the syntax defined by the signature.
Blocks 508-514 are iterated for each operation and/or each API executor module.
At 518, the available operations of each API executor module are compiled, for example organized and/or summarized. The available operations are accessible to the processing unit 208 and/or the unified execution layer 304, to determine how to map received API requests to the API executor modules.
Table 520 is an example of a data structure that stores the available operations of each API executor type. Row 522 lists the signatures of the available operations. For example, API executor modules A, B, and C support sorting of floating-point numbers and integers, while API executor modules C and D support multiplication of integers.
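The initialization of blocks 508-514 and the summary of block 518 may be sketched in Python as follows. The signature strings, the initializer callables, and the failure injected for executor B are hypothetical; they only illustrate that an operation whose initialization fails is left out of the resulting table.

```python
def init_operations(operation_store):
    """Blocks 508-514: try each initializer and keep those that succeed."""
    available = set()
    for signature, initializer in operation_store.items():
        try:
            initializer()
        except Exception:
            continue  # a failed operation is excluded from the available set
        available.add(signature)
    return available


def build_table(executor_stores):
    """Block 518: map each signature to the executors that support it."""
    table = {}
    for executor, store in executor_stores.items():
        for signature in init_operations(store):
            table.setdefault(signature, []).append(executor)
    return {sig: sorted(names) for sig, names in table.items()}


def ok():
    """Hypothetical initializer that succeeds."""


def fail():
    """Hypothetical initializer that fails, for example for lack of memory."""
    raise MemoryError("insufficient memory")


stores = {
    "A": {"sort(float)": ok, "sort(int)": ok},
    "B": {"sort(float)": ok, "sort(int)": ok, "multiply(int)": fail},
    "C": {"sort(float)": ok, "sort(int)": ok, "multiply(int)": ok},
    "D": {"multiply(int)": ok},
}
table = build_table(stores)
```

The resulting table matches the example given for table 520: sorting is available on executors A, B, and C, and integer multiplication on C and D.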
Reference is now made to Fig. 6, which is a schematic diagram depicting the mapping of high-level API requests from an executing computer program to low-level API instructions for execution on target devices and/or processors, according to some embodiments of the invention.
The blocks in row 602 depict the mapping of an API request issued to create a storage object. The API request is mapped to a unified storage object at the unified execution layer, which is mapped to two storage objects at the available API executor modules. Each API executor module operates the target device and/or processor using a different low-level API.
The blocks in row 604 depict a similar mapping for an API request that creates a queue.
The blocks in row 606 depict a similar mapping for an API request that creates an operation command. Notably, the unified operation object at the unified execution layer is mapped only to the API executor modules that are able to perform the requested operation, as defined, for example, by the low-level operation store associated with each API executor module, which defines the low-level commands of the available operations.
Referring back to Fig. 1, at 108, the interface 202 of apparatus 200 receives one API request 204 or a sequence of API requests 204.
The API requests may be received from a computer program 206 (for example, an application) executing in the execution environment 214. Optionally, the processing unit 208 manages the sequence in one or more unified execution layer queue objects 306C. The sequence may include unified operations (from the available unified operation objects) placed in the unified queue objects.
Optionally, the processing unit 208 divides the received sequence into multiple queues. Alternatively or additionally, the end of each queue is defined by the API requests.
Each queue is mapped to, and processed by, a different API executor module of the set of generated API executor modules. The mapping may be performed according to which API executor modules support the operation commands in the sequence. The API executor modules able to perform all operations in each queue (and so complete the full set of commands in the queue) may be identified. The API executor module designated to process a queue may optionally be selected from the identified set (or from the full set) according to a decision module, such as a classifier or a set of rules. The decision module may decide according to, for example, object availability on the device (for example in terms of memory availability and/or queue space), processing cost, runtime statistics, and/or other metrics.
Selecting the API executor module that processes each queue improves system performance, because different API executor modules may process different queues at different performance levels. The best-performing API executor module may be selected for each queue. The apparatus automatically divides the sequence and designates different API executor modules for different parts of the sequence, so the programmer need not divide the sequence.
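The division of a sequence into queues and the selection of an executor for each queue may be sketched in Python as follows. The "END" marker, the support table, and the cost figures are hypothetical; the lowest-cost rule is one simple example of the rule set a decision module might apply.

```python
def split_sequence(requests, end_marker="END"):
    """Divide a request sequence into queues at each end marker."""
    queues, current = [], []
    for req in requests:
        if req == end_marker:
            if current:
                queues.append(current)
            current = []
        else:
            current.append(req)
    if current:
        queues.append(current)
    return queues


def select_executor(queue, supported, cost):
    """Choose the cheapest executor that supports every operation in the queue."""
    candidates = [ex for ex, ops in supported.items() if set(queue) <= ops]
    if not candidates:
        raise LookupError(f"no executor supports all of {queue}")
    return min(candidates, key=lambda ex: cost[ex])


# Hypothetical support table and per-executor processing cost.
supported = {"A": {"sort"}, "C": {"sort", "multiply"}, "D": {"multiply"}}
cost = {"A": 1.0, "C": 2.0, "D": 1.5}

requests = ["sort", "sort", "END", "sort", "multiply", "END", "multiply"]
queues = split_sequence(requests)
plan = [select_executor(q, supported, cost) for q in queues]
```

The second queue mixes sort and multiply, so only executor C qualifies for it, while the first and third queues go to the cheaper executors A and D.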
At 110, the target API executor module executes a queue of the API request sequence. The API executor module maps the commands in the queue to instructions based on the low-level API, which operate on the corresponding target processor. The mapping from unified objects to local API executor module objects allows the high-level abstract API requests to be selectively mapped to different low-level API instructions.
The processing unit 208 executes each subsequence using the respective set of operation commands of each of the set of designated API executor modules. Where applicable, the subsequences may be executed in parallel, sequentially, according to priority, and/or out of order. Optionally, for example when a subsequence includes high-level storage commands, execution of each subsequence at each respective API executor module is directed according to the association between the unified execution storage objects of the unified execution layer and the set of storage objects of each respective API executor module. Alternatively or additionally, for example when a subsequence includes high-level operation commands, execution is directed according to the association between the unified execution operation commands of the unified execution layer and the set of operation commands of each respective API executor module. Bind commands may be executed according to the binding propagated between operations and storage objects, as described herein.
Each unified execution storage object is associated with a member of the set of storage objects of each of the set of different API executor modules. Each unified execution operation command is associated with a member of the set of operation commands of each of the set of different API executor modules.
The respective set of operation commands and/or set of storage objects of each of the designated API executor modules is used to instruct one of the different processors to execute each subsequence of the API request sequence. The API executor module generates low-level commands in the syntax of the corresponding API, and the low-level commands instruct the respective processor.
Reference is now made to Fig. 7, which is a schematic diagram depicting an example of the data flow between modules and/or objects that map a received sequence of API requests to low-level APIs for instructing execution on target devices in the execution environment, as described herein.
At 702, a computer program (for example, an application) executing in the execution environment provides a sequence of API requests to one or more unified queues of the unified execution layer. The API requests include a series of operations organized into subsequences. The end of each subsequence may be defined by the application and/or defined automatically by the unified execution layer.
At 704, the unified execution layer may apply scheduling tags to the operations in the unified queue to identify the subsequences.
As described herein, the unified execution layer maps each subsequence to one of the API executor modules.
As described herein, the unified storage objects bound to the operations in a subsequence are mapped or copied to the storage objects of the designated API executor module.
At 706, each mapped subsequence is provided (for example, copied) to the local executor queue of the respective designated API executor module. The data of the unified storage objects from the unified layer may be provided (for example, copied) to the storage objects of each respective designated API executor module.
At 708, the designated API executor module uses its associated operation store to perform the operations in the executor queue. The API executor module may invoke the operation store, passing the operations in the local queue and any bound storage objects.
At 710, the operation code stored in the local operation store accesses the API runtime environment to execute the code on the respective device. Instructions defined by the low-level API are generated and executed on the device.
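The data flow of blocks 706-710 may be sketched in Python as follows. The LocalExecutor class and the dispatch function are hypothetical, and the built-in functions sorted and sum stand in for operation code that would, in an actual embodiment, call a low-level API runtime.

```python
class LocalExecutor:
    """Hypothetical API executor module with a local queue and operation store."""

    def __init__(self, name, operation_store):
        self.name = name
        self.operation_store = operation_store  # operation name -> callable
        self.queue = []    # local executor queue (block 706)
        self.storage = {}  # executor-level storage objects

    def run(self):
        # Blocks 708 and 710: call each operation with its bound storage object.
        results = []
        for op_name, mem_name in self.queue:
            operation = self.operation_store[op_name]
            results.append(operation(self.storage[mem_name]))
        self.queue.clear()
        return results


def dispatch(subsequence, unified_storage, executor):
    """Block 706: copy the subsequence and the data of its bound storage objects."""
    for op_name, mem_name in subsequence:
        executor.storage[mem_name] = list(unified_storage[mem_name])
        executor.queue.append((op_name, mem_name))
    return executor.run()


unified_storage = {"A": [3, 1, 2]}
cpu = LocalExecutor("cpu", {"sort": sorted, "sum": sum})
results = dispatch([("sort", "A"), ("sum", "A")], unified_storage, cpu)
```

Because the data is copied into the executor-level storage object, the unified storage object is left unchanged by the operations in this sketch.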
The descriptions of the various embodiments of the invention are presented for purposes of illustration, and are not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant processors and APIs will be developed, and the scope of the terms intermediate representation, processor, and API is intended to include all such new technologies a priori.
As used herein, the term "about" refers to ± 10%.
The terms "comprising" and "having" mean "including but not limited to". These terms encompass the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular forms "a" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments, and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features unless such features conflict.
Throughout this application, various embodiments of the invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity, and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as the individual numerical values within that range. For example, the description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, and so on, as well as individual numbers within that range, for example 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "within the range between a first indicated number and a second indicated number" and "within the range from a first indicated number to a second indicated number" are used interchangeably herein, and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately, in any suitable subcombination, or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
All publications, patents, and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims (14)

1. An apparatus for processing application programming interface (API) requests, characterized in that it comprises:
an interface, configured to receive the API request; and
a processing unit, configured to:
identify a plurality of processors having different instruction set architectures (ISA);
operate a set of different API executor modules; and
control at least one API executor module to execute a command, based on the API request, on at least one of the processors.
2. The apparatus according to claim 1, characterized in that:
each API executor module includes at least one object from the following group: a storage object, an operation object, and a queue object, the at least one object being adapted to a predefined ISA; and
the processing unit is configured to control the at least one API executor module to execute the command on the at least one processor based on the at least one object.
3. The apparatus according to any one of the preceding claims, characterized in that it comprises a unified layer, the unified layer including at least one unified object from the following group: a unified storage object, a unified operation object, and a unified queue object, the unified object being adapted to the API request.
4. The apparatus according to any one of claims 2-3, characterized in that the processing unit is configured to associate an operation command of a set of operation commands with a signature indicating the corresponding API request.
5. The apparatus according to any one of claims 2-4, characterized in that each storage object of a set of storage objects includes:
a common part, defining common value types and common functions shared by every member of the set of storage objects; and
a specific part, uniquely defining at least one specific value type and at least one API-specific function call.
6. The apparatus according to any one of claims 2-5, characterized in that the processing unit is configured to execute another of a plurality of subsequences using the respective set of operation commands of each of the set of API executor modules, based on an association between a plurality of unified execution storage objects and the set of storage objects, and an association between a plurality of unified execution operation commands and the set of operation commands.
7. The apparatus according to claim 6, characterized in that each unified execution storage object is associated with a member of the set of storage objects of each of the set of different API executor modules, and each unified execution operation command is associated with a member of the set of operation commands of each of the set of different API executor modules.
8. The apparatus according to any one of the preceding claims, characterized in that the processing unit is configured to:
collect runtime data of the execution environment; and
use the runtime data to associate one of the API executor modules with one of the plurality of processors.
9. The apparatus according to any one of the preceding claims, characterized in that the processing unit is configured to:
associate one of the API executor modules with one of the plurality of processors according to a processor characteristic selected from command response time, overall command execution time, and power consumption.
10. the device according to any one of preceding claims, it is characterised in that the processing unit is used for sequence It is divided into multiple queues;Each in the multiple queue by one in a different set of API executor modules not Same API executor modules processing.
11. the device according to any one of preceding claims, it is characterised in that the processing unit is used for described The a different set of API executor modules are created during the operation of application at initialization event.
12. the device according to any one of preceding claims, it is characterised in that the processing unit is used to manage extremely Sequence in a few layer queue of seeking unity of action.
13. A method for processing application programming interface (API) requests, characterized in that the method is used to operate the device according to any one of the preceding claims.
14. A computer program, characterized in that the computer program runs the preceding method when executed on a computer.
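
The association and queueing steps recited in claims 8 to 10 can be illustrated with a minimal sketch. It is not the patented implementation: the `ProcessorProfile` type, the executor names, the measured values, and the round-robin split are all hypothetical, chosen only to show an executor being bound to a processor by one characteristic and a request sequence being divided into one queue per executor.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorProfile:
    """Hypothetical runtime measurements collected for one processor."""
    name: str
    response_time_ms: float    # command response time
    execution_time_ms: float   # overall command execution time
    power_watts: float         # power consumption


def pick_processor(profiles, characteristic):
    """Return the processor with the lowest value of the chosen characteristic."""
    return min(profiles, key=lambda p: getattr(p, characteristic))


def split_into_queues(requests, executors):
    """Divide an ordered request sequence into one queue per executor.

    Round-robin assignment is an assumption; the claims do not fix a policy.
    """
    queues = {executor: deque() for executor in executors}
    for index, request in enumerate(requests):
        queues[executors[index % len(executors)]].append(request)
    return queues


# Made-up runtime data for three processors.
profiles = [
    ProcessorProfile("cpu", 0.2, 9.0, 35.0),
    ProcessorProfile("gpu", 1.5, 2.0, 120.0),
    ProcessorProfile("dsp", 0.8, 6.0, 4.0),
]

# Bind each (hypothetical) executor module to a processor by one characteristic.
binding = {
    "latency_executor": pick_processor(profiles, "response_time_ms").name,
    "throughput_executor": pick_processor(profiles, "execution_time_ms").name,
    "low_power_executor": pick_processor(profiles, "power_watts").name,
}

# Each queue is then handled by a different executor module.
queues = split_into_queues(
    ["create_buffer", "write_buffer", "run_kernel", "read_buffer"],
    list(binding),
)
```

A real runtime would presumably refresh the measurements while the application runs and respect dependencies between requests when splitting them; the sketch ignores both.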
CN201580076832.4A 2015-02-27 2015-02-27 System and method for heterogeneous computing Application Programming Interface (API) Active CN107250985B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2015/054130 WO2016134784A1 (en) 2015-02-27 2015-02-27 Systems and methods for heterogeneous computing application programming interfaces (api)

Publications (2)

Publication Number Publication Date
CN107250985A (en) 2017-10-13
CN107250985B (en) 2020-10-16

Family

ID=52598745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580076832.4A Active CN107250985B (en) 2015-02-27 2015-02-27 System and method for heterogeneous computing Application Programming Interface (API)

Country Status (2)

Country Link
CN (1) CN107250985B (en)
WO (1) WO2016134784A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080244507A1 (en) * 2007-03-30 2008-10-02 Microsoft Corporation Homogeneous Programming For Heterogeneous Multiprocessor Systems
CN101299199A (en) * 2008-06-26 2008-11-05 上海交通大学 Heterogeneous multi-core system based on configurable processor and instruction set extension
CN101923492A (en) * 2010-08-11 2010-12-22 上海交通大学 Method for executing dynamic allocation command on embedded heterogeneous multi-core
US20140089905A1 (en) * 2012-09-27 2014-03-27 William Allen Hux Enabling polymorphic objects across devices in a heterogeneous platform
CN103858099A (en) * 2011-08-02 2014-06-11 国际商业机器公司 Technique for compiling and running high-level programs on heterogeneous computers
US20140281457A1 (en) * 2013-03-15 2014-09-18 Elierzer Weissmann Method for booting a heterogeneous system and presenting a symmetric core view

Also Published As

Publication number Publication date
CN107250985B (en) 2020-10-16
WO2016134784A1 (en) 2016-09-01

Similar Documents

Publication Publication Date Title
JP6997285B2 (en) Multipurpose parallel processing architecture
WO2021114530A1 (en) Hardware platform specific operator fusion in machine learning
Abadi et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems
CN103858099B (en) The method and system applied for execution, the circuit with machine instruction
CN107924323B (en) Dependency-based container deployment
EP3614260A1 (en) Task parallel processing method, apparatus and system, storage medium and computer device
US9424079B2 (en) Iteration support in a heterogeneous dataflow engine
KR102253628B1 (en) Combining states of multiple threads in a multi-threaded processor
JP2008535074A (en) Creating instruction groups in processors with multiple issue ports
US20190130270A1 (en) Tensor manipulation within a reconfigurable fabric using pointers
US20210294960A1 (en) Systems and methods for intelligently buffer tracking for optimized dataflow within an integrated circuit architecture
Syriani et al. Modeling a model transformation language
US11567778B2 (en) Neural network operation reordering for parallel execution
US7530063B2 (en) Method and system for code modification based on cache structure
WO2020169182A1 (en) Method and apparatus for allocating tasks
US20190121678A1 (en) Parallel computing
CN107250985A (en) System and method for heterogeneous computing Application Programming Interface (API)
US11372677B1 (en) Efficient scheduling of load instructions
US11704562B1 (en) Architecture for virtual instructions
EP3971787A1 (en) Spatial tiling of compute arrays with shared control
US11709783B1 (en) Tensor data distribution using grid direct-memory access (DMA) controller
Lindemann et al. Intelligent strategies for structuring products
CN115516435A (en) Optimized arrangement of data structures in hybrid memory-based inferential computing platforms
Jiang et al. A Task Parallelism Runtime Solution for Deep Learning Applications using MPSoC on Edge Devices
US11809981B1 (en) Performing hardware operator fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant