CN107250985A - System and method for a heterogeneous computing application programming interface (API) - Google Patents
- Publication number
- CN107250985A (application number CN201580076832.4A)
- Authority
- CN
- China
- Prior art keywords
- api
- group
- processor
- modules
- processing unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/541—Interprogram communication via adapters, e.g. between incompatible applications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
Abstract
The invention provides a device for processing an application programming interface (API) request, including: an interface for receiving the API request; and a processing unit for: identifying multiple processors with different instruction set architectures (ISAs); operating a set of different API executor modules; and controlling at least one API executor module to execute a command, based on the API request, on at least one of the processors.
Description
Background
The present invention, in some embodiments thereof, relates to heterogeneous computing and, more specifically but not exclusively, to systems and methods for a heterogeneous computing application programming interface (API).
A heterogeneous computing system comprising multiple different processors may be programmed through dedicated APIs. Different vendors provide different APIs, and each API may differ in language syntax and/or in the operations available for execution on a processor. A single vendor may provide different versions of the same API for execution on different processing hardware.
Each API may differ in its execution model, memory model, language syntax, and compilation model. For example, the execution model may include synchronous or asynchronous queues. The execution model may or may not support events. The memory model may be distributed or local. Transfers may be opaque or transparent. Map and unmap operations may or may not be supported. Storage functions may be pointer-based or based on a different model. API languages may differ in grammar and structure. The compilation model may include separate online and offline compilation, or combined compilation.
Writing programs for a heterogeneous computing system is complicated, and a programmer may be tempted to work with a single API. A single API may not provide all the required capabilities or behaviors, and may not support kernels of various versions. The programmer may therefore need to modify the API or use other APIs.
Programming with multiple APIs is slow, error-prone, expensive, and requires a high level of training, knowledge, and skill from the programmer. Improvements to the programming of heterogeneous systems are sought.
Summary
It is an object of the present invention to improve the processing of API requests.
The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description, and the figures.
According to a first aspect, a device for processing an application programming interface (API) request includes: an interface for receiving the API request; and a processing unit for: identifying multiple processors with different instruction set architectures (ISAs); operating a set of different API executor modules; and controlling at least one API executor module to execute a command, based on the API request, on at least one of the processors.
By automatically selecting API executor modules, rather than requiring the application developer to operate with multiple APIs, the device removes from the human application developer the burden of programming different heterogeneous processing devices using different APIs. Different APIs may provide different functions, sometimes with different grammars, making programming with multiple APIs difficult, error-prone, and time-consuming. The device allows the programmer to focus on working with a single unified environment (rather than multiple different APIs), which the device then maps to the available API executor modules. Within a single unified framework, the programmer has access to the various capabilities or behaviors provided by the different APIs (for example, support for kernels of various versions).
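The selection described above can be sketched as a dispatcher that hides the low-level APIs behind a single entry point. This is a minimal illustrative model under assumed names (`ExecutorModule`, `Dispatcher`, the vendor API strings), not the patented implementation:

```python
# Minimal sketch of automatic executor-module selection (all names hypothetical).
class ExecutorModule:
    def __init__(self, api_name, isa, supported_ops):
        self.api_name = api_name
        self.isa = isa                       # ISA of the processors this module drives
        self.supported_ops = supported_ops

    def execute(self, op, data):
        # A real module would emit low-level API calls; here we just tag the result.
        return (self.api_name, op, data)

class Dispatcher:
    """Maps a high-level API request to an executor module, hiding low-level APIs."""
    def __init__(self, modules):
        self.modules = modules

    def handle(self, op, data):
        for m in self.modules:
            if op in m.supported_ops:
                return m.execute(op, data)
        raise ValueError(f"no executor module supports operation {op!r}")

modules = [
    ExecutorModule("vendor_gpu_api", isa="gpu_isa", supported_ops={"convolve"}),
    ExecutorModule("vendor_cpu_api", isa="x86", supported_ops={"sort", "filter"}),
]
dispatcher = Dispatcher(modules)
api_name, op, data = dispatcher.handle("sort", [3, 1, 2])
```

The program issues only the high-level operation name; which vendor API is invoked is decided by the dispatcher, mirroring how the device relieves the developer of choosing among APIs.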
According to the first aspect, in a first possible implementation form of the device, each API executor module includes at least one object from the group: a storage object, an operation object, a queue object, the at least one object being adapted to a predefined ISA; the processing unit is configured to control the at least one API executor module to execute the command on the at least one processor based on the at least one object.
The device automatically allocates the different objects, maps between the different objects and the API executor modules, and performs operations for high-level commands through the API modules using the different low-level API-specific objects.
According to the first aspect as such or any of the preceding implementation forms of the first aspect, in a second possible implementation form of the device, the device includes a unification layer comprising at least one unified object from the group: a unified storage object, a unified operation object, a unified queue object, the unified object being adapted to the API request.
The device creates a higher level of abstraction over storage objects, operation objects, and queue objects. Using the low-level languages of the APIs, the device automatically provides the programmer with a single facade of higher-level operation objects (for example, sort, filter, add) implemented over the different low-level APIs.
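One way to picture a unified operation object is as a single named facade that fans out to per-API implementations. The sketch below is an assumed structure, not the patent's code; the class and backend names are invented:

```python
# Sketch of a unification layer: one unified operation object fans out to
# per-API implementations (hypothetical names, illustrative only).
class UnifiedOperation:
    def __init__(self, name):
        self.name = name
        self.backends = {}               # api_name -> low-level callable

    def bind(self, api_name, fn):
        """Associate a low-level API implementation with this unified operation."""
        self.backends[api_name] = fn

    def run(self, api_name, data):
        return self.backends[api_name](data)

sort_op = UnifiedOperation("sort")
sort_op.bind("cpu_api", lambda d: sorted(d))   # low-level CPU path
sort_op.bind("gpu_api", lambda d: sorted(d))   # stand-in for a GPU kernel launch
result = sort_op.run("cpu_api", [5, 2, 9])
```

The programmer only ever names `sort`; the binding to a concrete low-level API is an internal mapping, as the unification layer describes.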
According to the preceding first or second implementation forms of the first aspect, in a third possible implementation form of the device, the processing unit is configured to associate one operation instruction of a set of operation instructions with a signature indicating the corresponding API request.
The signature provides an abstract representation of the operation instruction. The signature representation may be implemented differently by the different low-level APIs.
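A signature of this kind can be modeled as a registry key under which each low-level API registers its own implementation. The registry scheme and the signature string below are assumptions made for illustration:

```python
# Sketch: an abstract signature identifies the operation; each low-level API
# registers its own implementation under the same signature (invented scheme).
registry = {}

def register(signature, api_name, impl):
    registry.setdefault(signature, {})[api_name] = impl

signature = "sort(vector<float>)"            # abstract representation of the op
register(signature, "cpu_api", lambda v: sorted(v))
register(signature, "gpu_api", lambda v: sorted(v))   # would be a GPU kernel
out = registry[signature]["cpu_api"]([2.0, 0.5, 1.0])
```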
According to any of the preceding first, second, or third implementation forms of the first aspect, in a fourth possible implementation form of the device, each storage object of a set of storage objects includes: a common part, defining the common value types and common functions that every member of the set of storage objects has; and a specific part, uniquely defining at least one specific value type and at least one API-specific function call.
The design of common and specific parts both unifies the common functionality and provides the specific low-level API definitions. The common part allows a computer program to execute abstract high-level storage commands without low-level implementation details. The high-level storage command is automatically mapped to low-level commands for execution on a certain target device.
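The common/specific split can be sketched as a storage object holding a shared descriptor plus API-specific function calls. The structure below is a hypothetical reading of the claim, with invented names and a fake allocator:

```python
# Sketch of a storage object split into a common part (shared value types and
# functions) and an API-specific part (assumed structure, not the patent's).
class CommonStoragePart:
    """Value types and functions every storage object shares."""
    def __init__(self, dtype, length):
        self.dtype = dtype
        self.length = length

    def size_bytes(self, elem_size):
        return self.length * elem_size

class StorageObject:
    def __init__(self, common, api_alloc, api_free):
        self.common = common
        self._alloc = api_alloc          # API-specific function calls
        self._free = api_free
        self.handle = None

    def allocate(self):
        # High-level allocate maps to the specific part's low-level call.
        self.handle = self._alloc(self.common.size_bytes(4))
        return self.handle

# A fake low-level API: allocation just returns a tagged handle.
buf = StorageObject(CommonStoragePart("float32", 10000),
                    api_alloc=lambda n: ("cpu_buffer", n),
                    api_free=lambda h: None)
handle = buf.allocate()
```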
According to any of the preceding first, second, third, or fourth implementation forms of the first aspect, in a fifth possible implementation form of the device, the processing unit is configured to use the respective set of operation instructions of each of the set of API executor modules, based on associations between multiple unified storage objects and the set of storage objects, and associations between multiple unified operation instructions and the set of operation instructions, to execute another one of multiple subsequences.
The mapping from the unified objects to the native API executor module objects allows a high-level abstract API request to be selectively mapped to different low-level API instructions.
According to the preceding fifth implementation form of the first aspect, in a sixth possible implementation form of the device, each unified storage object is associated with one member of the set of storage objects of each of the set of different API executor modules, and each unified operation instruction is associated with one member of the set of operation instructions of each of the set of different API executor modules.
According to the first aspect as such or any of the preceding implementation forms of the first aspect, in a seventh possible implementation form of the device, the processing unit is configured to collect runtime data of the execution environment, and to use the runtime data to associate one of the API executor modules with one of the multiple processors.
By adding or removing associations with API executor modules, changes to the execution environment, for example removal of processing hardware or addition of new processing hardware, can be accommodated automatically.
According to the first aspect as such or any of the preceding implementation forms of the first aspect, in an eighth possible implementation form of the device, the processing unit is configured to associate one of the API executor modules with one of the multiple processors according to a processor characteristic selected from command response time, overall command execution time, and power consumption.
Different APIs may operate processors at different performance levels. By considering the processor characteristics produced by the matched APIs, the API executor modules are mapped to processors so as to improve system performance.
According to the first aspect as such or any of the preceding implementation forms of the first aspect, in a ninth possible implementation form of the device, the processing unit is configured to split a sequence into multiple queues; each of the multiple queues is handled by a different one of the set of different API executor modules.
The selection of the API executor module for handling each queue improves system performance, because different API executor modules may handle different queues at different performance levels. The best-performing API executor module may be selected for each queue.
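The splitting step itself can be sketched with a simple partitioning function. Round-robin is chosen here only for illustration; the patent does not specify a partitioning policy:

```python
# Sketch: splitting a command sequence into queues, each of which would then
# be handled by a different executor module (round-robin is an assumption).
def split_into_queues(sequence, n_queues):
    queues = [[] for _ in range(n_queues)]
    for i, cmd in enumerate(sequence):
        queues[i % n_queues].append(cmd)
    return queues

sequence = ["load", "sort", "filter", "store", "reduce"]
queues = split_into_queues(sequence, 2)
# queues[0] and queues[1] would go to different API executor modules.
```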
According to the first aspect as such or any of the preceding implementation forms of the first aspect, in a tenth possible implementation form of the device, the processing unit is configured to create the set of different API executor modules at an initialization event during run time of the application.
Initialization generates the API executor modules according to the existing processing infrastructure available in the execution environment at run time. The API executor modules are generated according to changes of the available processors, such as addition of a new processor and/or removal of a processor. The initialization event may trigger new executor modules to handle changes related to the computer program more efficiently using the different processors.
According to the first aspect as such or any of the preceding implementation forms of the first aspect, in an eleventh possible implementation form of the device, the processing unit is configured to manage the sequence in at least one unified layer queue.
The device automatically splits the sequence and designates and uses different API executor modules for different parts of the sequence, without requiring the programmer to split the sequence.
In a twelfth possible implementation form, there is provided a method for processing an application programming interface (API) request, the method operating the device according to the first aspect as such or according to any of the preceding implementation forms of the first aspect.
In a thirteenth possible implementation form, there is provided a computer program which performs the preceding method when executed on a computer.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, controls. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Brief description of the drawings
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
Fig. 1 is a flowchart of a method of processing an API request, according to some embodiments of the invention;
Fig. 2 is a block diagram of components of a system including a device that processes API requests, according to some embodiments of the invention;
Fig. 3A is a block diagram of additional optional modules and/or object structures within the device of Fig. 2, according to some embodiments of the invention;
Fig. 3B is a schematic diagram depicting propagation of a storage operation binding command, according to some embodiments of the invention;
Fig. 3C is a schematic diagram of the structure of a storage object, according to some embodiments of the invention;
Fig. 4 is a flowchart of a method for identifying the processors in the execution environment and generating API executor modules according to the identified processors, according to some embodiments of the invention;
Fig. 5 is a flowchart of a method for mapping the operations supported by the API executor modules, according to some embodiments of the invention;
Fig. 6 is a schematic diagram depicting the mapping of an API request from an executing computer program to the low-level API instructions executed on a target device and/or processor, according to some embodiments of the invention; and
Fig. 7 is a schematic diagram depicting the data flow between the modules and/or objects described herein, based on mapping a received API request to the low-level API executed on a target device in the execution environment, as described herein.
Detailed Description
The present invention, in some embodiments thereof, relates to heterogeneous computing and, more specifically but not exclusively, to systems and methods for a heterogeneous computing application programming interface (API).
An aspect of some embodiments of the invention relates to a device that manages a layer for unifying the communication with multiple different processors controlled by multiple different APIs (each API controlling a different processor). The device automatically receives the API request sequence of a program, and maps subsequences of API requests of the high-level program (or part thereof, such as a split part of the program), based on a high-level interface, to different low-level APIs. Each respective low-level API operates one or more processors to execute the mapped program (or its split part). In this manner, the high-level program is executed automatically in an execution environment that includes multiple different processors organized as a heterogeneous computing system. The high-level program does not need to include low-level instructions specifying the particular low-level API and/or the particular processor for execution. The APIs used for execution may be selected to improve the system performance of executing the program, since different APIs and/or different processors may execute the same program at different performance levels.
Optionally, the device controls one or more API executor modules to execute one or more low-level commands, based on high-level API requests (issued by the program or part thereof), on one or more processors, which may optionally be organized as distinct devices according to their instruction set architecture (ISA). Each high-level API request may be mapped to one of a set of multiple low-level API executor modules. Each API executor module may operate a certain processor of the group of available processors, optionally based on a common ISA. Optionally, the processors are dissimilar, forming a heterogeneous system.
The API requests may be provided by the computer program as high-level requests executed by one or more processors operated by one or more low-level APIs. The device automatically designates the particular API executor module for executing the API request. In this manner, the program can be written using high-level API requests, without defining the low-level API executor modules that operate the processors of the heterogeneous execution environment, and/or without defining which processor executes which part of the program. The device may select the API executor module so as to improve the execution performance of the high-level API request.
Optionally, the set of different API executor modules is automatically generated by the device based on the existing processors in the execution environment, for example at run time, upon detecting a change in the processor availability of the execution environment. According to the differences in ISAs, different API executor modules may be generated for different processors (single or multiple).
It should be noted that the device described herein may be implemented (in hardware and/or software) as a program module, a system, a method, and/or a computer program product.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
The computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network.
The computer-readable program instructions may execute entirely on the user's computer as a stand-alone software package, partly on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
In some embodiments, electronic circuitry including, for example, programmable logic devices, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
Reference is now made to Fig. 1, which is a flowchart of a method, according to some embodiments of the invention, for controlling API executor modules to execute high-level API request commands (optionally of a computer program) on one or more processors. Reference is also made to Fig. 2, which is a block diagram of components of a system that supports programming with high-level source code, without regard to the low-level processor implementation, by automatically selecting API executor modules and mapping the high-level program (which includes the high-level API requests) to the selected API executor modules for low-level execution by the processors of the execution environment. The method of Fig. 1 may be executed by the device and/or system of Fig. 2.
By automatically selecting API executor modules based on the high-level API requests, rather than requiring the application developer to operate with multiple APIs, the device removes from the human application developer the burden of programming different heterogeneous processing devices using different APIs. Different APIs may provide different functions, sometimes with different grammars, making programming directly with multiple APIs (in contrast to the API requests described herein) difficult, error-prone, and time-consuming. The device allows the programmer to focus on working with a single unified environment (rather than multiple different APIs) using high-level API requests, which the device then maps to the available API executor modules. Within a single unified framework, the programmer has access to the various capabilities or behaviors provided by the different APIs (for example, support for kernels of various versions).
Device 200 includes an interface 202 for receiving API requests 204. API requests 204 may be included as part of the source code of a high-level program 206, for example as a library and/or as subroutines integrated in the high-level language. The computer program may be a complete computer program, a part of a computer program, and/or a single algorithm. The computer program may be in high-level source code format, in a low-level code format suitable for execution, or in precompiled code. Program 206 may be executed in an execution environment 214.
Optionally, the source code is written in a domain-specific language (DSL). The DSL may provide higher-level data-type abstractions and/or wider use of abstract data types than other programming languages, such as low-level programming languages and/or programming languages not expressly designed to handle the same domain as the DSL. The DSL may be a pre-existing available DSL or a custom-developed DSL.
Device 200 includes a processing unit 208 that operates a set of different API executor modules 210A-C, as described herein. Processing unit 208 controls one or more API executor modules 210A-C to execute commands, based on the API requests, on one or more processors 212A-C of execution environment 214, as described herein. It is noted that the number of executor modules and the number of processors may be greater than two; the number two was selected for purposes of simplicity, clarity, and illustration.
Processors 212A-C may be different, optionally operating with different ISAs. Processors 212A-C may have different architecture designs, such as a central processing unit (CPU), a graphics processing unit (GPU), a processor interconnected with other units, and/or a dedicated hardware accelerator (for example, an encoder, a decoder, or a cryptographic co-processor). Herein, one or more processors and their associated memory are sometimes referred to as a device. Each device may include multiple processors and associated memory operating according to a common ISA. The terms processor and device as used herein may sometimes be interchanged.
Each API executor module is mapped to a processor (or to a single group of processors operating with the same ISA, or to a device), as described herein. Each API executor module is specific to one API type. Each API executor module may be mapped to a processor that supports the API type of the executor module.
Reference is now made to Fig. 3A, which is a block diagram of additional optional modules and/or object structures within device 200 of Fig. 2, according to some embodiments of the invention. The additional modules provide a unified interface for the programmer to write programs against, and map the unified interface to the API executor modules according to mappings between the unified objects and the API executor module objects. Unification layer 304 and API executor modules 210A-C provide a layered architecture for executing the operations of API requests on various heterogeneous computing APIs. The objects provide a high-level abstraction for mapping the API requests issued by the computer program to the low-level APIs, for execution on the processors of the execution environment.
Program 206, which includes API requests optionally defined according to a unified API 302, is received by unification layer 304 of device 200. Unification layer 304 includes one or more unified objects: unified storage object 306A, unified operation object 306B, and unified queue object 306C. Each unified object is adapted to the API request, for example generated according to the command included in the API request, and/or associated with the API request.
Each API executor module 210A-C of device 200 includes one or more of the following objects 308A-C, mapped to the corresponding objects 306A-C of unification layer 304: storage object 308A, operation object 308B, and queue object 308C. Each object 308A-C is defined according to the respective API executor module 210A-C, for example including low-level instructions according to the API type corresponding to the API executor module. Optionally, each object is adapted according to the predefined ISA corresponding to the API executor module, for example generated according to operations defined by the ISA, and/or having low-level definitions according to the ISA.
The storage objects at the unification layer and at the API executor modules are high-level abstractions of the memory available in the target execution environment, for example in each device (i.e., associated with one or more processors). A storage object describes the data format and type, such as a vector of 10000 floating-point numbers, or a matrix of 50x50 floating-point numbers. Abstract memory management, for example memory allocation, memory release, and garbage collection, is performed using the storage objects.
The operation objects at the unification layer and at the API executor modules are high-level abstractions of the program code (i.e., the API requests) run in the target execution environment, for example in each device. An operation object defines a specific function, for example, sort or convolution. Abstract code management details, for example compilation, execution, and optimization, are handled using the operation objects.
The queue objects at the unification layer and at the API executor modules are high-level abstractions of the operation scheduling of the target execution environment, for example in each device. Abstract operation execution, for example synchronous or asynchronous execution, coordination, and dependencies, is handled using the queue objects.
Each API executor module 210A-C controls a respective device 312A-C based on a 1:1 mapping. Each device 312A-C includes one or more processors, optionally operating with a common ISA. For example, device 312A includes one or more CPUs 314A, device 312B includes one or more GPUs 314B, and device 312C includes one or more field-programmable gate arrays (FPGAs) 314C. Devices 312A-C combine to form the heterogeneous execution environment.
Processing unit 208 of apparatus 200 controls API executor modules 210A-C to execute, on the respective devices 312A-C, the commands received from program 206 using the unified API 302. Control is obtained through the related APIs 316A-C, for example vendor-provided APIs and/or custom APIs, using the ISAs of devices 312A-C. Control is implemented based on the objects 308A-C defined by the controlled API executor modules 210A-C.
The apparatus creates a higher level of abstraction of memory objects, operation objects, and queue objects. By using the low-level languages of the APIs, the apparatus automatically implements the different low-level APIs and provides the programmer with a single interface of higher-level operation objects (for example, sort, filter, add).
The apparatus automatically allocates the different objects, maps between the different objects and the API executor modules, and executes high-level commands through the API executor modules using the objects specific to the different low-level APIs.
Objects are allocated, mapped between the unified execution layer and the API executor modules, and used to execute operations on data, as described herein. In general, for a system with N devices and/or processors, each unified object of the unified execution layer is mapped to M<N corresponding mirror objects in a designated subset of the API executor modules.
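The M<N mirror mapping can be illustrated with a small Python sketch (the executor names and supported-operation sets below are hypothetical examples, not part of the embodiments): a unified object is mirrored only into the executor modules that support the requested operation.

```python
# Hypothetical executor modules and the operations each supports.
EXECUTORS = {"cpu_api": {"sort", "add"},
             "gpu_api": {"sort", "convolution"},
             "fpga_api": {"convolution"}}

def mirror(operation: str) -> dict:
    """Create one mirror object per executor module that supports `operation`."""
    return {name: {"op": operation, "executor": name}
            for name, supported in EXECUTORS.items()
            if operation in supported}

mirrors = mirror("sort")  # mapped to M=2 of the N=3 executor modules
```

Here the unified "sort" object is mirrored into two of the three executor modules, matching the M<N relation described above.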
Unified objects 306A-C may be created at the unified execution layer 304 by API requests. A corresponding group of objects is automatically created at the related API executor modules, triggered by the API request and/or by the creation of the unified object. Queue objects are automatically created in all available API executor modules. Operation objects are automatically created in each subset of the API executor modules that supports the requested operation, for example as discussed above with reference to FIG. 5.
Reference is now made to FIG. 3B, which is a schematic diagram depicting the propagation of a memory-operation bind command, according to some embodiments of the present invention. The bind command is automatically propagated from the unified execution layer 304 to the API executors 210A and 210B that support the operation.
A memory object may be created when a bind-memory API request is received. For example, the bind-memory command is: Set Memory Object A as Arg 2 of Operation K. The bind-memory command includes an instruction to bind a memory object to an operation object. One or more memory objects 306A are automatically generated at the unified execution layer 304 and bound to the operation object 306B according to the bind-memory command.
The bind command propagates from the unified execution layer 304 to the API executor modules that support the operation included in the bind command (represented by arrows 330A-B). A memory object 308A is created in each related API executor module. Propagating the bind command enables the memory-operation binding to be performed automatically at each related API executor module, using the low-level API operations that support the bind command.
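A hedged sketch of this propagation (the class and method names are assumptions of the illustration): the unified layer records the binding once, then forwards the same binding to every executor module that supports the operation, as indicated by arrows 330A-B.

```python
class Executor:
    def __init__(self, name, supported_ops):
        self.name, self.supported_ops = name, set(supported_ops)
        self.bindings = []  # local (operation, arg_index, memory) bindings

    def bind(self, op, arg_index, mem):
        # a low-level API call would happen here, e.g. setting a kernel argument
        self.bindings.append((op, arg_index, mem))

def propagate_bind(executors, op, arg_index, mem):
    """'Set Memory Object A as Arg 2 of Operation K', propagated from the unified layer."""
    for ex in executors:
        if op in ex.supported_ops:
            ex.bind(op, arg_index, mem)

execs = [Executor("210A", ["K"]), Executor("210B", ["K"]), Executor("210C", ["L"])]
propagate_bind(execs, op="K", arg_index=2, mem="MemoryObjectA")
```

Only the executors supporting operation K receive the binding; the third executor is untouched.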
Reference is now made to FIG. 3C, which is a schematic diagram of the data structures of the memory objects of FIG. 3A, according to some embodiments of the present invention.
The unified memory object 306A resides in the unified execution layer 304, as discussed with reference to FIG. 3A. Executor memory object 318A corresponds to, for example, the memory object of API executor module 210A. Executor memory object 318B corresponds to, for example, the memory object of API executor module 210B. For example, and not intended to be limiting, API executor module 210A operates a CUDA® API, and API executor module 210B operates an OpenCL™ (Open Computing Language) API.
Each memory object in the group of memory objects, which includes the unified execution layer memory object and the API executor module memory objects, includes a common part 320 and a specific part 322. The common part 320 is identical for all memory objects, at both the unified execution layer and each API executor module. The specific part 322 is customized for each memory object, at the unified execution layer and at each API executor module. The design of common and specific parts both unifies the general functionality and provides the API-specific low-level definitions.
The common part 320 defines the common value types and/or common functions shared by each memory object member of the group of memory objects.
The specific part 322 provides different functionality at the unified execution layer and at the API executor module layers. At the unified execution layer, the specific part includes a mapping to the related memory object available at each API executor module, for example the pointer arrays represented by arrows 324A and 324B, which map API executor module memory objects 318A and 318B to the unified memory object 306A. At the API executor module layer, the specific part includes API-specific additional data. The specific part 322 uniquely defines specific value types and/or API-specific function calls.
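The common/specific split can be sketched as follows (an illustration with assumed field names): every memory object shares one common part, while the specific part differs per layer — the unified object's specific part holds the mirror pointers (arrows 324A-B), and each executor object's specific part holds API-specific data.

```python
class CommonPart:
    """Part 320: identical in the unified object and in every executor mirror."""
    def __init__(self, dtype, shape):
        self.dtype, self.shape = dtype, shape

class ExecutorMemoryObject:
    def __init__(self, common, api_specific_data):
        self.common = common              # part 320 (shared)
        self.specific = api_specific_data # part 322: API-specific additional data

class UnifiedMemoryObject:
    def __init__(self, common):
        self.common = common  # part 320 (shared)
        self.specific = []    # part 322: pointers to executor mirrors (324A-B)

    def add_mirror(self, executor_obj):
        self.specific.append(executor_obj)

common = CommonPart("float32", (50, 50))
unified = UnifiedMemoryObject(common)
unified.add_mirror(ExecutorMemoryObject(common, {"device_ptr": 0x1000}))
unified.add_mirror(ExecutorMemoryObject(common, {"cl_buffer": "buf0"}))
```

The same `common` instance is referenced everywhere, while each `specific` part is unique to its layer.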
Balloons 326A and 326B depict examples of a memory allocation command, which has different implementations at the different layers and at the different API executor modules. Balloon 326A is an example implementation of the allocation command of the unified memory object 306A: a high-level instruction that calls the list of low-level memory allocation instructions at each memory object of each API executor module. Balloon 326B is an example implementation of the allocation command of memory object 318A of API executor module 210A: low-level device and/or API-specific commands that perform the memory allocation at the corresponding device and/or processor.
In this manner, the common part allows the computer program to execute abstract high-level memory commands (usable by the programmer) without explicitly defining the low-level implementation details. The high-level memory commands are automatically mapped to low-level commands for execution on the automatically selected target devices.
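The balloon 326A/326B split can be sketched as a fan-out (names hypothetical): the unified object's high-level `allocate()` simply invokes each executor mirror's own low-level allocation routine.

```python
class ExecutorMemory:
    """Stands in for a per-API memory object (balloon 326B side)."""
    def __init__(self, api_name):
        self.api_name, self.allocated = api_name, False

    def allocate(self):
        # placeholder for a low-level, API-specific allocation call
        self.allocated = True
        return f"{self.api_name}:alloc"

class UnifiedMemory:
    """Stands in for the unified memory object (balloon 326A side)."""
    def __init__(self, mirrors):
        self.mirrors = mirrors

    def allocate(self):
        # one high-level command fans out to every executor mirror
        return [m.allocate() for m in self.mirrors]

results = UnifiedMemory([ExecutorMemory("api_a"), ExecutorMemory("api_b")]).allocate()
```

The programmer issues one abstract allocation; each target API performs its own concrete allocation.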
Referring back to FIG. 1, optionally at 102, the processors 212A-C available in the execution environment 214 are identified by processing unit 208. The identified processors may be of different types, with different ISAs. Processors controlled using a common ISA may be grouped together, for example organized in a single device.
Optionally, at 104, a group of different API executor modules 210A-C is created by processing unit 208. Each API executor module 210A-C may include a group of operation instructions and/or a group of memory objects for the respective ISA of one of the processors and/or devices.
Optionally, at 106, blocks 102 and 104 are iterated. The iterations generate new API executor modules, remove irrelevant old API executor modules, and/or update existing API executor modules. The iterations may be performed at initialization, at system startup, periodically, and/or when a change in the execution environment is detected. Alternatively or additionally, the group of different API executor modules is created at an initialization event during run-time of the computer program, for example according to a change in the input (for example, a different type and/or size of input data), and/or according to a change in the generated results (for example, an unacceptable time to compute the results). The initialization event may trigger new executor modules that process the change related to the computer program more efficiently using different processors. Initialization during run-time generates the API executor modules according to the existing processing infrastructure available in the execution environment. The API executor modules are generated according to changes in the available processors, for example the addition of a new processor and/or the removal of a processor.
Optionally, processing unit 208 collects run-time data of the execution environment 214. The run-time data may be used to associate one of the API executor modules with one of the processors, for example to create a new association, remove an association, or change an existing association. By adding or revoking associations with API executor modules, changes in the execution environment may be accommodated automatically, for example the removal of processing hardware or the addition of new processing hardware.
Optionally, API executor modules are associated with corresponding processors according to processor characteristics, for example command response time, overall command execution time, and power consumption. The association may improve performance in terms of the processor characteristics. Different APIs may operate processors with different performance levels. By considering the processor characteristics produced by the matched API, mapping API executor modules to processors may improve system performance.
Reference is now made to FIG. 4, which is a flowchart of a method for identifying the processors in the execution environment and generating the API executor modules according to the identified processors, according to some embodiments of the present invention. The method of FIG. 4 may be executed by apparatus 200 of FIG. 2 and/or FIG. 3, optionally by the unified execution layer 304 and/or processing unit 208.
Optionally, at 402, the API executor types supported by apparatus 200 are retrieved from a hardcoded list, loaded from a file, automatically generated, and/or provided manually.
At 404, the execution environment is scanned to identify processors, optionally by searching for supported devices in modules stored in apparatus 200 or communicating with apparatus 200. Processors are identified as supported devices according to the supported API executor types. The supported devices may be stored in a supported-device repository 406 in communication with apparatus 200.
Entry 414 is an example of a supported-device entry identified and stored in repository 406. Entry 414 may include a common part and an executor-specific part. The common part abstractly defines the device at a high level (for example, including the supported APIs). The executor-specific part defines the low-level features of the device, for operation by the API executor module.
Optionally, at 408, the identified devices are designated as available for processing API requests. Alternatively, a subset of the identified devices is designated, for example using a classifier, or according to a group of rules, for example according to device availability, a device efficiency threshold, a device usage cost threshold, or other factors, for example to include the actually available devices and/or the most suitable devices.
At 410, an API executor module is automatically created for each designated device. The API executor modules may be initialized using the device information from repository 406. The generated API executor modules may be stored in an executor repository 412.
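Blocks 402-410 can be condensed into a short sketch (the ISA names and executor types below are invented for illustration): identified processors are matched against the supported API executor types, and one executor module is created per designated device.

```python
# Block 402: supported API executor types (hypothetical ISA -> executor mapping).
SUPPORTED_TYPES = {"x86": "cpu_executor", "ptx": "gpu_executor"}

def scan_environment():
    # Block 404: stands in for probing the execution environment.
    return [{"name": "cpu0", "isa": "x86"},
            {"name": "gpu0", "isa": "ptx"},
            {"name": "dsp0", "isa": "vliw"}]  # unsupported ISA

def create_executors():
    repository = {}  # executor repository 412
    for device in scan_environment():
        executor_type = SUPPORTED_TYPES.get(device["isa"])
        if executor_type:  # block 408: designate only supported devices
            repository[device["name"]] = executor_type  # block 410
    return repository

executors = create_executors()
```

The device with an unsupported ISA is simply not designated, so no executor module is created for it.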
Reference is now made to FIG. 5, which is a flowchart of a method for mapping the operations (also referred to herein as operation instructions) supported by the API executor modules, according to some embodiments of the present invention. The method of FIG. 5 may be executed by apparatus 200 of FIG. 2 and/or FIG. 3, optionally by the unified execution layer 304 and/or processing unit 208.
Optionally, at 502, each generated API executor module includes a group of operation stores (also referred to herein as groups of operation instructions), retrieved for example from a file, hardcoded, transmitted from the unified execution layer 304, received from a remote server, and/or taken from a manually generated list.
The groups of operation instructions correspond to the received API requests.
Optionally, at 504, each API executor module is loaded into the memory of apparatus 200, in parallel or one by one. The API executor modules may be loaded from an executor repository 506.
Optionally, at 508, each API executor module loads its respective operation store into the memory of apparatus 200, in parallel or one by one. The operation stores may be obtained from a common operation-store repository 510.
Optionally, at 512, each operation store of each API executor module executes an initialization process to attempt to initialize the operations in the operation store. Examples of initialization include: compilation, resource allocation, and memory allocation. Successfully initialized and/or allocated operations are identified. It is noted that the initialization of some operations may fail (these operations are then excluded from availability), for example due to insufficient memory, unavailable compilation resources, or other errors.
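Block 512 can be sketched as follows (the function names and the failure condition are assumptions of this illustration): each operation is initialized in turn, and operations whose initialization fails are excluded from the available set rather than aborting the whole process.

```python
def initialize_store(operation_store, init):
    """Attempt to initialize every operation; failed ones are excluded."""
    available, failed = [], []
    for op in operation_store:
        try:
            init(op)  # e.g. compile the kernel and allocate its resources
            available.append(op)
        except RuntimeError:  # e.g. insufficient memory, no compiler
            failed.append(op)
    return available, failed

def fake_init(op):
    # stand-in initializer that fails for one operation
    if op == "convolution":
        raise RuntimeError("insufficient memory")

available, failed = initialize_store(["sort", "convolution", "add"], fake_init)
```

The failed operation is recorded separately, so the executor module still advertises the operations it can actually run.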
At 514, the group of successfully initialized and/or allocated operations is associated with the corresponding API executor module, and is optionally stored in an executor-operation repository 516.
Optionally, an operation instruction of the group of operation instructions in the operation store is associated and/or identified by a signature indicative of the corresponding API request, optionally by an operation name and/or operation parameters. The signature provides an abstract representation of the operation instruction. Different low-level APIs may implement the same high-level signature representation differently. API requests may be defined according to the signature representation, for example using source code that invokes the operation instruction (of the group of operation instructions) in the syntax defined by the signature.
Blocks 508-514 are iterated for each operation and/or each API executor module.
At 518, the available operations of each API executor module are compiled, for example organized and/or summarized. Processing unit 208 and/or the unified execution layer 304 may access the available operations to determine how to map received API requests to the API executor modules.
Table 520 is an example of a data structure storing the available operations of each API executor type. Column 522 lists the signatures of the available operations. For example, API executor modules A, B, and C support sorting of floating point numbers and of integers, and API executor modules C and D support multiplication of integers.
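Table 520 can be sketched as a lookup from operation signature to the executor modules supporting it (the signature strings below mirror the example in the text; the helper names are invented). Intersecting the rows also answers the later question of which modules can run every operation in a queue.

```python
# Rows keyed by operation signature; values are the supporting executor modules.
CAPABILITIES = {
    "sort(float)": {"A", "B", "C"},
    "sort(int)":   {"A", "B", "C"},
    "mul(int)":    {"C", "D"},
}

def executors_for(signature):
    """Executor modules able to run an API request with this signature."""
    return CAPABILITIES.get(signature, set())

def executors_for_queue(signatures):
    """Modules able to run *every* operation in a queue (set intersection)."""
    result = None
    for sig in signatures:
        supported = executors_for(sig)
        result = supported if result is None else result & supported
    return result or set()

candidates = executors_for_queue(["sort(int)", "mul(int)"])
```

For a queue containing an integer sort and an integer multiplication, only module C supports both.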
Reference is now made to FIG. 6, which is a schematic diagram depicting the mapping of high-level API requests from an executing computer program to low-level API instructions for execution on target devices and/or processors, according to some embodiments of the present invention.
The blocks in row 602 depict the mapping of an issued create-memory-object API request. The API request is mapped to a unified memory object at the unified execution layer, and that memory object is mapped to two memory objects at the available API executor modules. Each API executor module operates the target device and/or processor using a different low-level API.
The blocks in row 604 depict a similar mapping of a create-queue API request.
The blocks in row 606 depict a similar mapping of a create-operation-instruction API request. It is noted that the unified operation object at the unified execution layer is mapped to the API executor modules that are able to execute the requested operation, for example as defined by the low-level operation store, associated with each API executor module, that includes the low-level commands of the available operations.
Referring back to FIG. 1, at 108, interface 202 of apparatus 200 receives one API request, or a sequence of API requests, 204.
The API requests may be received from a computer program 206 (for example, an application) executing in the execution environment 214. Optionally, processing unit 208 manages the sequence in one or more queue objects 306C of the unified execution layer. The sequence may include unified operations (from the available unified operation objects) placed in a unified queue object.
Optionally, the received sequence is divided into multiple queues by processing unit 208. Alternatively or additionally, the end of each queue is defined by an API request.
Each queue is mapped to a different API executor module of the group of generated API executor modules, and is processed by that API executor module. The mapping may be performed according to the API executor modules that support the operation instructions in the sequence. The API executor modules that are able to execute all the operations in each queue may be identified (so as to complete the full set of commands in the queue). The API executor module designated to process a queue may optionally be selected from the identified set (or from the full set) according to a decision module, for example a classifier or a group of rules. The decision module may make the decision, for example, according to the run-time availability of the objects at the devices (for example, in terms of memory availability and/or queue space), processing cost, statistical information, and/or other metrics.
The selection of the API executor module for processing each queue improves system performance, as different API executor modules may process different queues with different performance levels. The best-performing API executor module may be selected for each queue. The apparatus automatically divides the sequence and designates and uses different API executor modules for different parts of the sequence, without necessarily requiring the programmer to divide the sequence.
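The split-and-assign behavior of blocks 108-110 can be sketched as follows (the support sets, the barrier-based split, and the cost metric are assumptions of this illustration, not requirements of the embodiments): the sequence is divided at queue boundaries, and a rule-based decision module picks, for each queue, one executor module that supports all of its operations.

```python
SUPPORT = {"A": {"sort", "add"}, "B": {"sort", "mul"}}
COST = {"A": 2.0, "B": 1.0}  # stand-in run-time metric consulted by the decision module

def split(sequence, boundary="barrier"):
    """Divide the request sequence into queues at boundary markers."""
    queues, current = [], []
    for op in sequence:
        if op == boundary:
            queues.append(current)
            current = []
        else:
            current.append(op)
    if current:
        queues.append(current)
    return queues

def assign(queue):
    """Decision module: cheapest executor that supports every operation in the queue."""
    capable = [e for e, ops in SUPPORT.items() if set(queue) <= ops]
    return min(capable, key=COST.get)

queues = split(["sort", "add", "barrier", "sort", "mul"])
assignment = [assign(q) for q in queues]
```

Each queue lands on the only (or cheapest) module capable of completing its full set of commands, without the programmer dividing the sequence.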
At 110, the target API executor modules execute the queues of the API request sequence. The API executor modules map the commands in the queues to instructions based on the low-level APIs, which operate on the corresponding target processors. The mapping from the unified objects to the local API executor module objects allows high-level abstract API requests to be selectively mapped to different low-level API instructions.
Processing unit 208 executes each subsequence using the group of operation instructions of each of the designated API executor modules. Where applicable, subsequences may be executed in parallel, sequentially, according to priority, and/or out of order. Optionally, for example when a subsequence includes high-level memory commands, each subsequence is executed at each corresponding API executor module according to the association between the unified memory object of the unified execution layer and the group of memory objects of each corresponding API executor module. Alternatively or additionally, for example when a subsequence includes high-level operation commands, execution is instructed according to the association between the unified operation command of the unified execution layer and the group of operation instructions of each corresponding API executor module. Bind commands may be executed according to the bindings propagated between the operation and memory objects, as described herein.
Each unified memory object is associated with a member of the group of memory objects of each different API executor module of the group. Each unified operation instruction is associated with a member of the group of operation instructions of each different API executor module of the group.
The corresponding group of operation instructions and/or group of memory objects of each of the designated API executor modules is used to instruct one of the different processors to execute each subsequence of the API request sequence. The API executor module generates low-level commands in the syntax of the corresponding API, and the low-level commands are used to instruct the corresponding processor.
Reference is now made to FIG. 7, which is a schematic diagram depicting an example of the data flow between the modules and/or objects used to map a sequence of received API requests to low-level APIs that instruct execution on the target devices in the execution environment, as described herein.
At 702, the computer program (for example, an application) executing in the execution environment provides the sequence of API requests to one or more unified queues of the unified execution layer. The API requests include a sequence of operations organized in subsequences. The end of each subsequence may be defined by the application, and/or defined automatically by the unified execution layer.
At 704, the unified execution layer may apply scheduling by tagging the operations in the unified queue to identify the subsequences. As described herein, each subsequence is mapped by the unified execution layer to one of the API executor modules. As described herein, the unified memory objects bound to the operations in the subsequence are mapped or copied to the memory objects of the designated API executor module.
At 706, each mapped subsequence is provided (for example, copied) to the local executor queue of the corresponding designated API executor module. The data of the unified memory objects of the unified layer may be provided (for example, copied) to the memory objects of each corresponding designated API executor module.
At 708, the designated API executor module executes the operations in the executor queue using its associated operation store. The API executor module may call the operation store, passing the operation in the local queue and any bound memory objects.
At 710, the operation code stored in the local operation store accesses the API run-time environment to execute the code on the relevant device. The instructions defined by the low-level API are generated and executed on the device.
The descriptions of the various embodiments of the present invention are provided for purposes of illustration only, and are not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application, many relevant processors and APIs will be developed, and the scope of the terms processor and API is intended to include all such new technologies a priori.
As used herein, the term "about" refers to ±10%.
The terms "comprises", "comprising", "includes", "including", and "having" mean "including but not limited to". These terms encompass the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, provided that the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular forms "a", "an", and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments, and/or does not exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features, unless such features conflict.
Throughout this application, various embodiments of the present invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity, and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as the individual numerical values within that range. For example, the description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range, for example 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging between" a first indicated number and a second indicated number, and "ranging from" a first indicated number "to" a second indicated number, are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination, or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
All publications, patents, and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent, or patent application were specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.
Claims (14)
1. An apparatus for processing application programming interface (API) requests, comprising:
an interface, configured to receive the API requests; and
a processing unit, configured to:
identify a plurality of processors having different instruction set architectures (ISAs);
operate a group of different API executor modules; and
control at least one API executor module to execute a command on at least one of the processors based on the API request.
2. The apparatus according to claim 1, wherein:
each API executor module comprises at least one object of the group consisting of: a memory object, an operation object, and a queue object, the at least one object being adapted to a predefined ISA; and
the processing unit is configured to control the at least one API executor module to execute the command on the at least one processor based on the at least one object.
3. The apparatus according to any one of the preceding claims, comprising a unified execution layer that includes at least one unified object of the group consisting of: a unified memory object, a unified operation object, and a unified queue object, the unified object being adapted to the API requests.
4. The apparatus according to any one of claims 2-3, wherein the processing unit is configured to associate an operation instruction of a group of operation instructions with a signature indicative of the corresponding API request.
5. The apparatus according to any one of claims 2-4, wherein each memory object of a group of memory objects comprises:
a common part, defining the common value types and common functions shared by each member of the group of memory objects; and
a specific part, uniquely defining at least one specific value type and at least one API-specific function call.
6. The apparatus according to any one of claims 2-5, wherein the processing unit is configured to execute each of a plurality of subsequences using the corresponding group of operation instructions of each of the group of API executor modules, based on associations between a plurality of unified memory objects and the group of memory objects, and on associations between a plurality of unified operation instructions and the group of operation instructions.
7. The apparatus according to claim 6, wherein each unified memory object is associated with a member of the group of memory objects of each of the group of different API executor modules, and each unified operation instruction is associated with a member of the group of operation instructions of each of the group of different API executor modules.
8. The apparatus according to any one of the preceding claims, wherein the processing unit is configured to:
collect run-time data of the execution environment; and
use the run-time data to associate one of the API executor modules with one of the plurality of processors.
9. The apparatus according to any one of the preceding claims, wherein the processing unit is configured to associate one of the API executor modules with one of the plurality of processors according to a processor characteristic selected from: command response time, overall command execution time, and power consumption.
10. The apparatus according to any one of the preceding claims, wherein the processing unit is configured to divide a sequence into a plurality of queues, each of the plurality of queues being processed by a different API executor module of the group of different API executor modules.
11. The apparatus according to any one of the preceding claims, wherein the processing unit is configured to create the group of different API executor modules at an initialization event during run-time of an application.
12. The apparatus according to any one of the preceding claims, wherein the processing unit is configured to manage a sequence in at least one unified queue object.
13. A method for processing application programming interface (API) requests, wherein the method operates the apparatus according to any one of the preceding claims.
14. A computer program which, when executed on a computer, performs the method according to claim 13.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2015/054130 WO2016134784A1 (en) | 2015-02-27 | 2015-02-27 | Systems and methods for heterogeneous computing application programming interfaces (api) |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107250985A true CN107250985A (en) | 2017-10-13 |
CN107250985B CN107250985B (en) | 2020-10-16 |
Family
ID=52598745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201580076832.4A Active CN107250985B (en) | 2015-02-27 | 2015-02-27 | System and method for heterogeneous computing Application Programming Interface (API) |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107250985B (en) |
WO (1) | WO2016134784A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080244507A1 (en) * | 2007-03-30 | 2008-10-02 | Microsoft Corporation | Homogeneous Programming For Heterogeneous Multiprocessor Systems |
CN101299199A (en) * | 2008-06-26 | 2008-11-05 | 上海交通大学 | Heterogeneous multi-core system based on configurable processor and instruction set extension |
CN101923492A (en) * | 2010-08-11 | 2010-12-22 | 上海交通大学 | Method for executing dynamic allocation command on embedded heterogeneous multi-core |
US20140089905A1 (en) * | 2012-09-27 | 2014-03-27 | William Allen Hux | Enabling polymorphic objects across devices in a heterogeneous platform |
CN103858099A (en) * | 2011-08-02 | 2014-06-11 | 国际商业机器公司 | Technique for compiling and running high-level programs on heterogeneous computers |
US20140281457A1 (en) * | 2013-03-15 | 2014-09-18 | Eliezer Weissmann | Method for booting a heterogeneous system and presenting a symmetric core view |
2015
- 2015-02-27: CN application CN201580076832.4A filed; granted as patent CN107250985B, status Active
- 2015-02-27: WO application PCT/EP2015/054130 filed; published as WO2016134784A1, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN107250985B (en) | 2020-10-16 |
WO2016134784A1 (en) | 2016-09-01 |
Similar Documents
Publication | Title
---|---
JP6997285B2 (en) | Multipurpose parallel processing architecture
WO2021114530A1 (en) | Hardware platform specific operator fusion in machine learning
Abadi et al. | Tensorflow: Large-scale machine learning on heterogeneous distributed systems
CN103858099B (en) | Method, system and circuit with machine instructions for executing an application
CN107924323B (en) | Dependency-based container deployment
EP3614260A1 (en) | Task parallel processing method, apparatus and system, storage medium and computer device
US9424079B2 (en) | Iteration support in a heterogeneous dataflow engine
KR102253628B1 (en) | Combining states of multiple threads in a multi-threaded processor
JP2008535074A (en) | Creating instruction groups in processors with multiple issue ports
US20190130270A1 (en) | Tensor manipulation within a reconfigurable fabric using pointers
US20210294960A1 (en) | Systems and methods for intelligently buffer tracking for optimized dataflow within an integrated circuit architecture
Syriani et al. | Modeling a model transformation language
US11567778B2 (en) | Neural network operation reordering for parallel execution
US7530063B2 (en) | Method and system for code modification based on cache structure
WO2020169182A1 (en) | Method and apparatus for allocating tasks
US20190121678A1 (en) | Parallel computing
CN107250985A (en) | Systems and methods for heterogeneous computing application programming interfaces (API)
US11372677B1 (en) | Efficient scheduling of load instructions
US11704562B1 (en) | Architecture for virtual instructions
EP3971787A1 (en) | Spatial tiling of compute arrays with shared control
US11709783B1 (en) | Tensor data distribution using grid direct-memory access (DMA) controller
Lindemann et al. | Intelligent strategies for structuring products
CN115516435A (en) | Optimized arrangement of data structures in hybrid memory-based inferential computing platforms
Jiang et al. | A Task Parallelism Runtime Solution for Deep Learning Applications using MPSoC on Edge Devices
US11809981B1 | Performing hardware operator fusion
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant