CN110321204A - Computing system, hardware accelerator management method and device and storage medium - Google Patents

Computing system, hardware accelerator management method and device and storage medium

Info

Publication number
CN110321204A
CN110321204A (application CN201810278166.8A)
Authority
CN
China
Prior art keywords
hardware accelerator
interface
hardware
layer
computing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810278166.8A
Other languages
Chinese (zh)
Inventor
易建龙
孙晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xilinx Inc
Original Assignee
Beijing Shenjian Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenjian Intelligent Technology Co Ltd filed Critical Beijing Shenjian Intelligent Technology Co Ltd
Priority to CN201810278166.8A priority Critical patent/CN110321204A/en
Publication of CN110321204A publication Critical patent/CN110321204A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a computing system, a hardware accelerator management method and management apparatus, and a computing device and storage medium for implementing the hardware accelerator management method. The computing system comprises: an application layer; one or more hardware accelerator modules, each configured to execute a predetermined computing function in response to a call instruction from the application layer; and an interface layer provided with one or more interfaces, each used to connect the application layer to one or more of the hardware accelerator modules. The application layer sends the interface layer a call instruction to invoke a hardware accelerator module to execute a computing function; based on the call instruction, the interface layer dispatches a computing task through the corresponding interface to the hardware accelerator module that executes the function. With this technical solution, hardware accelerator modules can be managed and scheduled flexibly.

Description

Computing system, hardware accelerator management method and device and storage medium
Technical field
The present invention relates to schemes for scheduling the multiple hardware accelerator modules included in a computing system, and in particular to a computing system, a hardware accelerator management method and management apparatus, and a computing device and storage medium for implementing the hardware accelerator management method.
Background technique
In recent years, neural networks and deep learning have been widely applied in image processing and speech recognition, with good results. At the same time, the computational demands of such algorithms have given rise to a huge market for accelerators (also called "acceleration modules" or "hardware accelerator modules"). Whether implemented as a GPU, an FPGA, or an ASIC, these devices achieve better efficiency than a CPU through architectures that are friendly to neural network algorithms.
To enable an application to call an external hardware accelerator, corresponding code must be added to the application so that call instructions and the associated data can be sent to the acceleration module.
Adding acceleration modules in this way increases the difficulty of algorithm and application development.
Furthermore, in practice, multiple acceleration modules may need to cooperate to achieve the best results. Different acceleration modules have different functions, different interfaces, and different usage patterns. In particular, when multiple acceleration modules of different types are attached, a separate segment of code must be added to the application for each of them. Acceleration modules of different types or functions generally require different instruction codes or different data input/output formats, so separate code must be written and added to the application for each. This poses a challenge for management and scheduling.
A flexible management and scheduling method is therefore needed, one that fully exploits the capability of the accelerators while keeping the interface easy to use for upper-layer applications.
Summary of the invention
The technical problem to be solved by the invention is to provide a computing system, a hardware accelerator management method and management apparatus, and a computing device and storage medium for implementing the hardware accelerator management method, capable of managing and scheduling hardware accelerator modules flexibly.
According to a first aspect of the invention, a computing system is provided, comprising: an application layer; one or more hardware accelerator modules, each configured to execute a predetermined computing function in response to a call instruction from the application layer; and an interface layer provided with one or more interfaces, each used to connect the application layer to one or more of the hardware accelerator modules. The application layer sends the interface layer a call instruction to invoke a hardware accelerator module to execute a computing function; based on the call instruction, the interface layer dispatches a computing task through the corresponding interface to the hardware accelerator module that executes the function.
According to a second aspect of the invention, a hardware accelerator management method for a computing system is provided. The computing system includes an application layer, an interface layer, and one or more hardware accelerator modules, each of which executes a predetermined computing function in response to a call instruction from the application layer. The method comprises: maintaining, by the interface layer, one or more interfaces, each used to connect the application layer to one or more of the hardware accelerator modules; receiving at the interface layer, from the application layer, a call instruction to invoke a hardware accelerator module to execute a computing function; and, based on the call instruction, dispatching a computing task from the interface layer, through the corresponding interface, to the hardware accelerator module that executes the function.
Optionally, the interface layer may determine from the call instruction the interface corresponding to the hardware accelerator module that executes the computing function, and dispatch the computing task to that module through the interface.
Optionally, an interface may connect to multiple same-function hardware accelerator modules that execute the same computing function, and distribute computing tasks among them.
Optionally, the interface layer may maintain a separate task queue for each interface.
Optionally, the computing system may further include a cache. The application layer may save the input data required for a computing function to the cache; according to the computing task assigned to it, a hardware accelerator module reads the corresponding input data from the cache and writes the output data produced by the computing function back to the cache.
Optionally, the call instruction may include an input cache address for the input data and a designated output cache address for the output data.
Optionally, in response to a hardware accelerator module being introduced into the computing system or activated, the interface layer may start an initialization operation for the module.
Optionally, in response to a computing task being assigned to a hardware accelerator module, the interface layer may trigger the module to start executing its computing function.
Optionally, in response to a hardware accelerator module finishing execution of its computing function, the interface layer may perform a state restore operation for the module.
Optionally, in response to a hardware accelerator module being unloaded from the computing system or deactivated, the interface layer may perform a cleanup operation for the module.
Hardware accelerator modules connected to the same interface belong to the same module group. Optionally, in response to the addition of a new hardware accelerator module independent of the existing module groups, the interface layer may set up a new interface to connect the application layer to the new module.
According to a third aspect of the invention, a hardware accelerator management apparatus for a computing system is also provided, where the hardware accelerator modules execute predetermined computing functions in response to call instructions from the application layer. The apparatus comprises: an interface maintenance device, for maintaining one or more interfaces, each used to connect the application layer to one or more of the hardware accelerator modules; an instruction receiving device, for receiving from the application layer a call instruction to invoke a hardware accelerator module to execute a computing function; and a module invoking device, for dispatching, based on the call instruction and through the corresponding interface, a computing task to the hardware accelerator module that executes the function.
According to a fourth aspect of the invention, a computing device is also provided, comprising a processor and a memory on which executable code is stored; when the executable code is executed by the processor, the processor performs the method according to the second aspect of the invention.
According to a fifth aspect of the invention, a non-transitory machine-readable storage medium is also provided, on which executable code is stored; when the executable code is executed by the processor of an electronic device, the processor performs the method according to the second aspect of the invention.
Based on the above scheme, embodiments of the invention may have the following advantages:
1. The interface presented to the application layer is stable and easy to use, and need not be modified for any particular hardware accelerator module.
2. The underlying acceleration hardware is scalable: hardware accelerator modules can easily be added, whether they correspond to an existing interface or are independent of the existing interfaces.
3. Multiple usage modes, such as synchronous and asynchronous calls, are supported, giving management and scheduling great flexibility.
Detailed description of the invention
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings, from which the above and other objects, features, and advantages of the disclosure will become more apparent. In the exemplary embodiments, identical reference labels generally denote identical components.
Fig. 1 schematically shows a computing system according to the invention.
Fig. 2 is a schematic flow chart of a hardware accelerator management method according to an embodiment of the invention.
Fig. 3 is a schematic flow chart of a computing task scheduling scheme according to an embodiment of the invention.
Fig. 4 schematically shows operating-state management of hardware accelerator modules according to an embodiment of the invention.
Fig. 5 is a schematic block diagram of a hardware accelerator management apparatus according to an embodiment of the invention.
Fig. 6 is a schematic block diagram of a computing device that can be used to implement the above hardware accelerator management method according to an embodiment of the invention.
Specific embodiment
Preferred embodiments of the disclosure are described more fully below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and is not limited to the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey its scope to those skilled in the art.
The invention discloses a computing system to which multiple hardware accelerator modules are attached, for example a heterogeneous multi-core computing system with acceleration modules of multiple types and in multiple quantities. Through effective management and scheduling, the hardware accelerator modules can be extended flexibly while an easy-to-use and stable calling interface is provided to upper-layer application software.
Fig. 1 schematically shows a computing system according to the invention.
As shown in Fig. 1, the computing system of the invention includes an application layer 110, an interface layer 130, and an acceleration module layer 150.
The application layer 110 implements the application scheme desired by the user; users operate at the level of the application layer 110.
The acceleration module layer 150 is provided with one or more hardware accelerator modules Acc1, Acc2, ..., AccN, each configured to execute a predetermined computing function in response to a call instruction from the application layer 110.
When multiple hardware accelerator modules are provided, they may be of different types and may execute different computing functions.
The interface layer 130 is provided with one or more interfaces Call_Acc1, Call_Acc2, ..., Call_AccN, each used to connect the application layer 110 to a hardware accelerator module.
Depending on the implementation, the interfaces Call_Acc1, Call_Acc2, ..., Call_AccN may be software interfaces or hardware interfaces.
The application layer 110 sends the interface layer 130 a call instruction to invoke a hardware accelerator module to execute a computing function. Based on the call instruction, the interface layer 130 dispatches a computing task, through the corresponding interface, to the hardware accelerator module that executes the function.
A middle layer (the interface layer) is thus realized between the hardware accelerator modules and the application layer to manage and schedule the hardware accelerator modules, shielding the upper-layer application (application layer) from low-level details (the hardware accelerator modules) so that development can focus on algorithms and actual functionality.
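The middle-layer idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: all names (`InterfaceLayer`, `Accelerator`, the function keys) are hypothetical, and the "accelerators" are plain Python callables standing in for hardware modules.

```python
class Accelerator:
    """Stand-in for one hardware accelerator module (e.g. Acc1)."""
    def __init__(self, name, compute_fn):
        self.name = name
        self.compute_fn = compute_fn

    def run(self, data):
        return self.compute_fn(data)


class InterfaceLayer:
    """Middle layer: one interface (Call_AccN) per computing function."""
    def __init__(self):
        self._interfaces = {}  # computing-function name -> Accelerator

    def register(self, function_name, accelerator):
        self._interfaces[function_name] = accelerator

    def call(self, function_name, data):
        # Resolve the interface from the call instruction's parameter,
        # then dispatch the computing task to the matching accelerator.
        acc = self._interfaces[function_name]
        return acc.run(data)


# The application layer only names the function; it never sees Acc details.
layer = InterfaceLayer()
layer.register("conv", Accelerator("Acc1", lambda d: [x * 2 for x in d]))
layer.register("relu", Accelerator("Acc2", lambda d: [max(0, x) for x in d]))

print(layer.call("conv", [1, 2, 3]))   # [2, 4, 6]
print(layer.call("relu", [-1, 2]))     # [0, 2]
```

Because the application layer addresses functions rather than devices, the accelerator behind an interface can be replaced without touching application code.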
A hardware accelerator management method according to an embodiment of the invention is described in detail below with reference to Fig. 2.
Fig. 2 is a schematic flow chart of a hardware accelerator management method according to an embodiment of the invention. The method can be applied to a computing system as shown in Fig. 1.
As shown in Fig. 2, in step S210 the interface layer 130 maintains one or more interfaces Call_Acc1, Call_Acc2, ..., Call_AccN, each connected to its corresponding hardware accelerator module.
In step S220, the interface layer 130 receives from the application layer 110 a call instruction to invoke a hardware accelerator module to execute a computing function.
Then, in step S230, the interface layer 130, based on the call instruction, dispatches a computing task through the corresponding interface to the hardware accelerator module that executes the computing function.
The hardware accelerator module to which the computing task is assigned can then execute the computing function specified by the task.
The call instruction may, for example, carry instruction parameters. By analyzing these parameters, the interface layer 130 can determine which computing function the call instruction expects a hardware accelerator module to execute. The interface layer 130 can thus determine from the call instruction the interface corresponding to the hardware accelerator module that executes the function, and dispatch the computing task to that module through the determined interface.
The interface layer 130 may maintain, for each interface, a task queue (or "waiting queue") corresponding to that interface. When the interface layer 130 receives a call instruction issued by the application layer 110 and determines the corresponding interface, and the hardware accelerator module corresponding to that interface is not idle, the call can be placed in the task queue. When the corresponding hardware accelerator module becomes idle, a computing task is taken from the task queue and executed. In this way, the application layer 110 avoids unnecessary waiting when it issues a call instruction.
On the other hand, an interface may connect to multiple same-function hardware accelerator modules that execute the same computing function, and distribute computing tasks among them.
When an acceleration request is encountered, i.e., when the interface layer 130 receives a call request from the application layer 110, the interface layer 130, acting as the scheduling executive, determines what action to take according to the state of the hardware accelerator modules addressed by the call instruction. If an idle hardware accelerator module is available, the computing task is dispatched to it for immediate execution; otherwise the task is added to the waiting queue. After completing its current task, each hardware accelerator module checks whether the waiting queue is empty; if there are waiting tasks, it selects a computing task according to a predetermined strategy and executes the corresponding computing function.
A computing task scheduling scheme corresponding to one interface is described below with reference to Fig. 3.
Fig. 3 is a schematic flow chart of a computing task scheduling scheme according to an embodiment of the invention.
As shown in Fig. 3, the computing task scheduling method comprises:
Step S310: after receiving a call request from the application layer 110, the interface layer 130 determines the interface used to handle the request.
Step S320: judge whether any hardware accelerator module connected to the interface is idle and available.
If only one hardware accelerator module is connected to the interface, judge whether it is currently executing a computing operation; if so, the task enters the waiting queue. If multiple hardware accelerator modules are connected to the interface, judge whether any of them is idle and available. Since the multiple hardware accelerator modules connected to the same interface execute the same computing function, the computing task can be assigned to any idle module.
If not, execute step S330: add the computing task to the task queue (waiting queue).
If so, execute step S340: execute the computing function on the idle hardware accelerator module.
Step S350: after the hardware accelerator module finishes executing the computing function, judge whether there are waiting tasks in the task queue.
If so, execute step S360: take a computing task from the waiting queue and return to step S340, executing the corresponding computing function on this hardware accelerator module.
If not, task scheduling ends.
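The S310–S360 flow for a single interface can be sketched roughly as below. All names are hypothetical, the modules are plain strings, and computation is synchronous; a real interface layer would drive this logic from asynchronous hardware completion events rather than a direct call chain.

```python
from collections import deque

class Scheduler:
    """Per-interface scheduling: idle module runs now, else enqueue (S320/S330)."""
    def __init__(self, modules):
        self.idle = list(modules)   # same-function modules docked to this interface
        self.queue = deque()        # waiting queue for this interface
        self.results = []

    def submit(self, task):
        if self.idle:                             # S320: an idle module is available
            self._run(self.idle.pop(), task)      # S340: execute immediately
        else:
            self.queue.append(task)               # S330: enter the waiting queue

    def _run(self, module, task):
        self.results.append((module, task()))     # execute the computing function
        self.on_done(module)

    def on_done(self, module):
        # S350/S360: after finishing, the module pulls the next waiting task.
        if self.queue:
            self._run(module, self.queue.popleft())
        else:
            self.idle.append(module)              # scheduling ends for this module

sched = Scheduler(modules=["Acc1_core0"])
for n in (1, 2, 3):
    sched.submit(lambda n=n: n * n)
print(sched.results)  # [('Acc1_core0', 1), ('Acc1_core0', 4), ('Acc1_core0', 9)]
```

Adding a second module to the `modules` list would let two submitted tasks run "in parallel" under the same interface, which is the same-function distribution case described above.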
Referring back to Fig. 1, in a preferred embodiment the computing system may further include a cache 170. The application layer 110 may access the cache 170 directly (as shown in Fig. 1) or through the interface layer 130.
When issuing a call instruction, the application layer 110 may also save the input data required for the computing function to the cache 170.
In an optional embodiment, the call instruction may further include the input cache address of the input data on the cache 170 and the designated output cache address, on the cache 170, for the output data of the hardware accelerator module.
When dispatching the computing task, the interface layer 130 may record the input cache address and the designated output cache address in association with the task.
According to the computing task assigned to it, the hardware accelerator module can read the corresponding input data from the input cache address on the cache 170, and write the output data produced by executing the computing function to the designated output cache address on the cache 170.
The application layer 110 can then read the output data directly from the designated output cache address. In this way, the application layer 110 can easily call a hardware accelerator module in either a synchronous or an asynchronous manner.
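The cache-mediated data path can be illustrated as follows. The flat-list "cache", the address values, and the `square` function are purely illustrative assumptions, not the patent's memory layout or instruction format.

```python
# A toy shared cache: a flat list standing in for cache 170.
cache = [0] * 64

def app_issue_call(data, in_addr, out_addr):
    """Application layer: store input data at in_addr, then build the call
    instruction carrying both cache addresses (as in the optional embodiment)."""
    cache[in_addr:in_addr + len(data)] = data
    return {"function": "square", "in": in_addr, "out": out_addr, "n": len(data)}

def accelerator_run(call):
    """Accelerator module: read input from the input cache address and
    write the result to the designated output cache address."""
    src = cache[call["in"]:call["in"] + call["n"]]
    cache[call["out"]:call["out"] + call["n"]] = [x * x for x in src]

call = app_issue_call([2, 3, 4], in_addr=0, out_addr=32)
accelerator_run(call)
# The application layer reads the output directly from the output address:
print(cache[32:35])  # [4, 9, 16]
```

Because both addresses travel with the call instruction, the application can either wait for completion and read immediately (synchronous) or come back later to the same output address (asynchronous).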
With the hardware accelerator management scheme described above with reference to Figs. 1 and 2, the interface design seen by the upper-layer user (application layer 110) guarantees interface consistency and stability: hardware accelerator modules of different functions or types can use essentially identical interfaces. The user therefore only needs to care about functionality, not the details of the hardware accelerator modules. Moreover, either a synchronous or an asynchronous mode can be chosen, as needed, for notifying the application layer 110 of task completion.
As noted above, one or more same-function hardware accelerator modules may connect to one interface. Hardware accelerator modules connected to the same interface belong to the same module group.
Thus Acc1, Acc2, ..., AccN shown in Fig. 1 can each be regarded as one of N module groups (hardware accelerator module groups), for which the interface layer 130 correspondingly implements N acceleration interfaces Call_Acc1, Call_Acc2, ..., Call_AccN. The computing functions performed by these module groups may all differ, or some module groups may execute the same computing function.
A module group may contain only one hardware accelerator module or multiple same-function modules; the number of modules in a group can be chosen according to the actual computing load.
For example, when the computation handled by hardware accelerator module group Acc1 (originally containing only one module, Acc1_core0) is found to be a performance bottleneck in the computing system, one (or more) further module Acc1_core1 can be added. Using a scheduling scheme such as that shown in Fig. 3, the computing capacity of the newly added module Acc1_core1 can simply be combined, through the interface Call_Acc1, with that of the original module Acc1_core0, making full use of both. The interface itself remains unchanged and the application layer 110 need not be modified, which keeps the user code consistent and stable.
On the other hand, when a hardware accelerator module is added outside the existing module groups, it can be regarded as a newly added module group. It suffices to set up one new interface to connect the application layer 110 to the new hardware accelerator module.
The new interface is essentially identical to the original interfaces, and its interactions with the application layer 110 and the new hardware accelerator module are the same as those of the other interfaces. Moreover, the process of adding a hardware accelerator module and a new interface does not affect the other modules at all; compatibility is preserved while great flexibility is retained.
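Both extension paths — adding a same-function core behind an existing interface, and adding a new module group behind a new interface — can be sketched with hypothetical names as below; round-robin dispatch is an illustrative policy, not one the patent prescribes.

```python
class ModuleGroup:
    """All modules docked to one interface; executes one computing function."""
    def __init__(self, *modules):
        self.modules = list(modules)
        self._next = 0

    def add_module(self, module):
        # Scale out: a new same-function core joins; the interface is unchanged.
        self.modules.append(module)

    def dispatch(self, task):
        # Round-robin over same-function modules (illustrative strategy).
        m = self.modules[self._next % len(self.modules)]
        self._next += 1
        return (m, task)


class InterfaceLayer:
    def __init__(self):
        self.interfaces = {}  # "Call_AccN" -> ModuleGroup

    def add_interface(self, name, group):
        # A new module group gets one new interface; existing ones untouched.
        self.interfaces[name] = group


layer = InterfaceLayer()
layer.add_interface("Call_Acc1", ModuleGroup("Acc1_core0"))

# Acc1 becomes a bottleneck: add Acc1_core1 behind the *same* interface.
layer.interfaces["Call_Acc1"].add_module("Acc1_core1")

# A brand-new accelerator type arrives: register a new interface for it.
layer.add_interface("Call_Acc2", ModuleGroup("Acc2_core0"))

print([layer.interfaces["Call_Acc1"].dispatch("t%d" % i)[0] for i in range(3)])
# ['Acc1_core0', 'Acc1_core1', 'Acc1_core0']
```

In both cases the application-facing call shape never changes, which is the consistency property claimed above.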
Given the variation and evolution of algorithms, hardware accelerator modules of new types or with new functions are likely to appear. The hardware accelerator management scheme can accommodate the addition of new hardware accelerator modules as needed.
Fig. 4 schematically shows operating-state management of hardware accelerator modules according to an embodiment of the invention.
Although the computing functions executed by different hardware accelerator modules may differ, as shown in Fig. 4, in the management scheme of the invention their use can be abstracted into four operation primitives, i.e., four operating states:
Initialization 410, i.e., the initialization operation performed when a module is newly installed or activated: in response to a hardware accelerator module being introduced into the computing system or activated, the interface layer starts an initialization operation for the module;
Trigger computation 430, i.e., the trigger operation that starts a computation: in response to a computing task being assigned to a hardware accelerator module, the interface layer triggers the module to start executing its computing function;
State restore 450, i.e., the state restoration at the end of a computation: in response to a hardware accelerator module finishing execution of its computing function, the interface layer performs a state restore operation for the module;
Cleanup 470, i.e., the cleanup work performed at unloading: in response to a hardware accelerator module being unloaded from the computing system or deactivated, the interface layer performs a cleanup operation for the module.
For each hardware accelerator module, the initialization operation 410 is performed first, after it enters the system or when it is reactivated after being deactivated. While the module is in use, trigger computation 430 and state restore 450 are repeated continuously. When the module no longer works, cleanup 470 is executed and the related resources are released.
By standardizing state management in this way, the scheme can accommodate hardware accelerator modules of all kinds as they are newly added.
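The four primitives can be read as a small state machine enforcing the order init → (trigger ⇄ restore)* → cleanup. The sketch below uses illustrative names and a one-method-per-primitive shape that the patent does not mandate.

```python
class ManagedAccelerator:
    """Tracks the four operating states: init 410, trigger 430, restore 450, cleanup 470."""
    def __init__(self, name):
        self.name = name
        self.state = "unloaded"
        self.log = []

    def initialize(self):            # 410: module introduced or activated
        assert self.state == "unloaded"
        self.state = "idle"
        self.log.append("init")

    def trigger(self):               # 430: a computing task was assigned
        assert self.state == "idle"
        self.state = "busy"
        self.log.append("trigger")

    def restore(self):               # 450: computing function finished
        assert self.state == "busy"
        self.state = "idle"
        self.log.append("restore")

    def cleanup(self):               # 470: module unloaded or deactivated
        assert self.state == "idle"
        self.state = "unloaded"
        self.log.append("cleanup")


acc = ManagedAccelerator("Acc1")
acc.initialize()
for _ in range(2):                   # trigger/restore repeat while in use
    acc.trigger()
    acc.restore()
acc.cleanup()
print(acc.log)
# ['init', 'trigger', 'restore', 'trigger', 'restore', 'cleanup']
```

Because every module, whatever its function, exposes the same four primitives, the interface layer can manage a newly added accelerator type without any new control logic.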
Based on the above technical solution, embodiments of the invention have the following advantages:
1. The interface presented to the application layer is stable and easy to use, and need not be modified for any particular hardware accelerator module.
2. The underlying acceleration hardware is scalable: hardware accelerator modules can easily be added, whether they correspond to an existing interface or are independent of the existing interfaces.
3. Multiple usage modes, such as synchronous and asynchronous calls, are supported, giving management and scheduling great flexibility.
A hardware accelerator management apparatus that can be used to implement the above hardware accelerator management method is described below with reference to Fig. 5. Some details are identical to those described above with reference to Figs. 1 to 4 and are not repeated here.
Fig. 5 is a schematic block diagram of a hardware accelerator management apparatus according to an embodiment of the invention.
As shown in Fig. 5, the management apparatus 500 may include an interface maintenance device 510, an instruction receiving device 520, and a module invoking device 530. The management apparatus 500 may, for example, be implemented by the interface layer 130 in Fig. 1.
The interface maintenance device 510 maintains one or more interfaces. As described above, each of these interfaces is used to connect the application layer 110 to one or more hardware accelerator modules.
The instruction receiving device 520 receives a call instruction from the application layer 110; the call instruction indicates that a hardware accelerator module is to be invoked to execute a computing function.
The module invoking device 530, based on the call instruction, dispatches a computing task through the corresponding interface to the hardware accelerator module that executes the computing function.
The module invoking device 530 can determine from the call instruction the interface corresponding to the hardware accelerator module that executes the computing function, and dispatch the computing task to that module through the interface.
For example, the call instruction may include instruction parameters. From these parameters the module invoking device 530 can determine which computing function is to be executed, and hence which hardware accelerator module needs to be invoked, which in turn determines the interface corresponding to that module.
As described above, an interface may connect to multiple same-function hardware accelerator modules that execute the same computing function, and computing tasks are distributed among them.
The interface maintenance device 510 may also maintain a separate task queue for each interface. When the hardware accelerator modules connected to an interface are all executing computing functions and none is idle, computing tasks can be placed in the task queue until a module connected to the interface becomes idle and available.
In addition, as described above, input data needed for application layer 110 can will execute computing function is saved on caching.
In this case, module calling device 530 can instruct hardware accelerator according to for its distribution calculating task, Corresponding input data is read from caching, and is cached the output data write-in that computing function obtains is executed.
The call instruction that application layer 110 is sent may include the input-buffer address of input data and specifying for output data Export buffer address.
Correspondingly, module calling device 530 can be by input-buffer address and specified output buffer address with calculating times Business is collectively notified hardware accelerator.
In addition, the managing device 500 may further include a state managing device (not shown) for managing the operating state of the hardware accelerator modules.
In response to a hardware accelerator module being introduced into the computing system or being activated, the state managing device can start an initialization operation for the hardware accelerator module.
In response to a computing task being assigned to a hardware accelerator module, the state managing device can trigger the hardware accelerator module to start executing the computing function.
In response to a hardware accelerator module finishing the execution of a computing function, the state managing device can perform a state restoration operation for the hardware accelerator module.
In response to a hardware accelerator module being unloaded from the computing system or being deactivated, the state managing device can perform a cleanup operation for the hardware accelerator module.
By using such a standardized state managing device, newly added hardware accelerator modules of various kinds can be accommodated.
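The four lifecycle events above can be mirrored by a small state machine; the following is only an illustrative sketch (the `ModuleState` values and hook names are assumptions made for the example):

```python
from enum import Enum, auto

class ModuleState(Enum):
    IDLE = auto()
    RUNNING = auto()

class StateManager:
    """Illustrative state manager for the four lifecycle events described
    in the text: load/activate, task assignment, completion, unload."""
    def __init__(self):
        self.states = {}

    def on_load(self, module):          # initialization operation
        self.states[module] = ModuleState.IDLE

    def on_task_assigned(self, module): # trigger execution
        self.states[module] = ModuleState.RUNNING

    def on_done(self, module):          # state restoration operation
        self.states[module] = ModuleState.IDLE

    def on_unload(self, module):        # cleanup operation
        del self.states[module]

mgr = StateManager()
mgr.on_load("acc0")
mgr.on_task_assigned("acc0")
mgr.on_done("acc0")
print(mgr.states["acc0"].name)  # prints "IDLE"
```

Because every module passes through the same four hooks regardless of what it computes, a newly added module needs no bespoke state handling.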
In addition, hardware accelerator modules docked with the same interface can be regarded as belonging to the same module group. The managing device 500 may further include an interface setting device that, in response to the addition of a new hardware accelerator module independent of the existing module groups, sets up a new interface for docking the application layer with the new hardware accelerator module.
The new interface is essentially the same as the original interfaces, and its interactions with the application layer 110 and with the new hardware accelerator module are the same as those of the other interfaces. Moreover, neither adding a hardware accelerator module nor adding a new interface affects the other modules, which preserves compatibility with the existing modules while providing great flexibility.
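Sketched in Python (purely illustrative; keying module groups by function name is an assumption of this example, not a detail of the patent), the group-per-interface arrangement might be:

```python
class InterfaceLayer:
    """Sketch of the interface layer: one interface per module group,
    keyed here by the computing function the group implements."""
    def __init__(self):
        self.interfaces = {}   # function name -> list of docked modules

    def register(self, function, module):
        # A module for a new function gets a new interface; a module for
        # an existing function joins that function's existing group.
        self.interfaces.setdefault(function, []).append(module)

layer = InterfaceLayer()
layer.register("matmul", "acc0")
layer.register("matmul", "acc1")   # joins the existing matmul group
layer.register("fft", "acc2")      # independent group -> new interface
print(sorted(layer.interfaces))    # prints "['fft', 'matmul']"
```

Registering `acc2` touches only the new `fft` entry; the `matmul` group and its docked modules are left untouched, matching the compatibility property described above.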
Fig. 6 shows a schematic block diagram of a computing device that can be used to implement the above hardware accelerator module management method according to an embodiment of the present invention.
Referring to Fig. 6, the computing device 600 includes a memory 610 and a processor 620.
The processor 620 may be a multi-core processor, or may include multiple processors. In some embodiments, the processor 620 may include a general-purpose main processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, the processor 620 may be implemented using custom circuitry, such as an application-specific integrated circuit (ASIC, Application Specific Integrated Circuit) or a field-programmable gate array (FPGA, Field Programmable Gate Array).
The memory 610 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 620 or other modules of the computer. The permanent storage may be a readable and writable storage device, that is, a non-volatile storage device that does not lose its stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (such as a magnetic or optical disk, or flash memory) is used as the permanent storage. In other embodiments, the permanent storage may be a removable storage device (such as a floppy disk or an optical drive). The system memory may be a readable and writable storage device, or a volatile readable and writable storage device, such as dynamic random access memory. The system memory may store some or all of the instructions and data needed by the processor at runtime. In addition, the memory 610 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic disks and/or optical discs may also be used. In some embodiments, the memory 610 may include readable and/or writable removable storage devices, such as compact discs (CD), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), read-only Blu-ray discs, ultra-density discs, flash memory cards (e.g., SD cards, mini SD cards, Micro-SD cards), and magnetic floppy disks. Computer-readable storage media do not include carrier waves or transient electronic signals transmitted by wire or wirelessly.
Executable code is stored on the memory 610; when the executable code is processed by the processor 620, the processor 620 is caused to execute the hardware accelerator module management method described above.
The computing system and the hardware accelerator module management scheme according to the present invention have been described above in detail with reference to the accompanying drawings.
In addition, the method according to the present invention may also be implemented as a computer program or computer program product, which includes computer program code instructions for executing the steps defined in the above method of the present invention.
Alternatively, the present invention may also be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored; when the executable code (or computer program, or computer instruction code) is executed by a processor of an electronic device (or computing device, server, etc.), the processor is caused to execute the steps of the above method according to the present invention.
Those skilled in the art will also understand that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
Various embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (18)

1. A computing system, comprising:
an application layer;
one or more hardware accelerator modules, configured to execute respective predetermined computing functions in response to call instructions from the application layer; and
an interface layer provided with one or more interfaces, each used to dock the application layer with the one or more hardware accelerator modules,
wherein the application layer sends, to the interface layer, a call instruction for calling a hardware accelerator module to execute a computing function, and the interface layer, based on the call instruction, assigns a computing task through the corresponding interface to the hardware accelerator module that executes the computing function.
2. The computing system according to claim 1, wherein
the interface layer determines, according to the call instruction, the interface corresponding to the hardware accelerator module that executes the computing function, so as to assign the computing task to the hardware accelerator module through the interface.
3. The computing system according to claim 2, wherein
the interface can be docked with multiple same-function hardware accelerator modules that execute the same computing function, and the computing task is distributed among the multiple same-function hardware accelerator modules.
4. The computing system according to any one of claims 1 to 3, wherein
the interface layer maintains a corresponding task queue for each interface.
5. The computing system according to claim 1, further comprising:
a cache,
wherein the application layer saves input data required for executing the computing function to the cache, and
the hardware accelerator module, according to the computing task assigned to it, reads the corresponding input data from the cache and writes output data obtained by executing the computing function to the cache.
6. The computing system according to claim 5, wherein
the call instruction includes an input cache address of the input data and a specified output cache address of the output data.
7. The computing system according to claim 1, wherein:
in response to the hardware accelerator module being introduced into the computing system or being activated, the interface layer starts an initialization operation for the hardware accelerator module; and/or
in response to a computing task being assigned to the hardware accelerator module, the interface layer triggers the hardware accelerator module to start executing the computing function; and/or
in response to the hardware accelerator module finishing the execution of the computing function, the interface layer performs a state restoration operation for the hardware accelerator module; and/or
in response to the hardware accelerator module being unloaded from the computing system or being deactivated, the interface layer performs a cleanup operation for the hardware accelerator module.
8. A hardware accelerator module management method for a computing system, wherein the computing system includes an application layer, an interface layer, and one or more hardware accelerator modules, the hardware accelerator modules being configured to execute respective predetermined computing functions in response to call instructions from the application layer, the method comprising:
maintaining, by the interface layer, one or more interfaces, each used to dock the application layer with the one or more hardware accelerator modules;
receiving, by the interface layer from the application layer, a call instruction for calling a hardware accelerator module to execute a computing function; and
assigning, by the interface layer based on the call instruction, a computing task through the corresponding interface to the hardware accelerator module that executes the computing function.
9. The method according to claim 8, further comprising:
determining, by the interface layer according to the call instruction, the interface corresponding to the hardware accelerator module that executes the computing function, so as to assign the computing task to the hardware accelerator module through the interface.
10. The method according to claim 9, wherein the interface can be docked with multiple same-function hardware accelerator modules that execute the same computing function, and the computing task is distributed among the multiple same-function hardware accelerator modules.
11. The method according to any one of claims 8-10, further comprising:
maintaining, by the interface layer, a corresponding task queue for each interface.
12. The method according to claim 8, wherein the computing system further includes a cache, the method further comprising:
saving, by the application layer, input data required for executing the computing function to the cache; and
reading, by the hardware accelerator module according to the computing task assigned to it, the corresponding input data from the cache, and writing output data obtained by executing the computing function to the cache.
13. The method according to claim 12, wherein
the call instruction includes an input cache address of the input data and a specified output cache address of the output data.
14. The method according to claim 8, further comprising:
in response to the hardware accelerator module being introduced into the computing system or being activated, starting, by the interface layer, an initialization operation for the hardware accelerator module; and/or
in response to a computing task being assigned to the hardware accelerator module, triggering, by the interface layer, the hardware accelerator module to start executing the computing function; and/or
in response to the hardware accelerator module finishing the execution of the computing function, performing, by the interface layer, a state restoration operation for the hardware accelerator module; and/or
in response to the hardware accelerator module being unloaded from the computing system or being deactivated, performing, by the interface layer, a cleanup operation for the hardware accelerator module.
15. The method according to claim 8, wherein hardware accelerator modules docked with the same interface belong to the same module group, the method further comprising:
in response to adding a new hardware accelerator module independent of the existing module groups, setting up, in the interface layer, a new interface for docking the application layer with the new hardware accelerator module.
16. A hardware accelerator module managing device for a computing system, wherein the hardware accelerator modules are configured to execute respective predetermined computing functions in response to call instructions from an application layer, the managing device comprising:
an interface maintenance device, configured to maintain one or more interfaces, each used to dock the application layer with the one or more hardware accelerator modules;
an instruction receiving device, configured to receive, from the application layer, a call instruction for calling a hardware accelerator module to execute a computing function; and
a module calling device, configured to assign, based on the call instruction, a computing task through the corresponding interface to the hardware accelerator module that executes the computing function.
17. A computing device, comprising:
a processor; and
a memory on which executable code is stored, wherein the executable code, when executed by the processor, causes the processor to execute the method according to any one of claims 8-15.
18. A non-transitory machine-readable storage medium on which executable code is stored, wherein the executable code, when executed by a processor of an electronic device, causes the processor to execute the method according to any one of claims 8 to 15.
CN201810278166.8A 2018-03-31 2018-03-31 Computing system, hardware accelerator management method and device and storage medium Pending CN110321204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810278166.8A CN110321204A (en) 2018-03-31 2018-03-31 Computing system, hardware accelerator management method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810278166.8A CN110321204A (en) 2018-03-31 2018-03-31 Computing system, hardware accelerator management method and device and storage medium

Publications (1)

Publication Number Publication Date
CN110321204A true CN110321204A (en) 2019-10-11

Family

ID=68111903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810278166.8A Pending CN110321204A (en) 2018-03-31 2018-03-31 Computing system, hardware accelerator management method and device and storage medium

Country Status (1)

Country Link
CN (1) CN110321204A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101986272A (en) * 2010-11-05 2011-03-16 北京大学 Task scheduling method under cloud computing environment
CN102375801A (en) * 2011-08-23 2012-03-14 孙瑞琛 Multi-core processor storage system device and method
US20150355949A1 (en) * 2011-12-13 2015-12-10 International Business Machines Corporation Dynamically configurable hardware queues for dispatching jobs to a plurality of hardware acceleration engines
CN105893036A (en) * 2016-03-30 2016-08-24 清华大学 Compatible accelerator extension method for embedded system
CN106445876A (en) * 2015-08-13 2017-02-22 阿尔特拉公司 Application-based dynamic heterogeneous many-core systems and methods


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991369A (en) * 2019-12-09 2020-04-10 Oppo广东移动通信有限公司 Image data processing method and related device
CN111143078A (en) * 2019-12-31 2020-05-12 深圳云天励飞技术有限公司 Data processing method and device and computer readable storage medium
CN111143078B (en) * 2019-12-31 2023-05-12 深圳云天励飞技术有限公司 Data processing method, device and computer readable storage medium
CN112887093A (en) * 2021-03-30 2021-06-01 矩阵元技术(深圳)有限公司 Hardware acceleration system and method for implementing cryptographic algorithms

Similar Documents

Publication Publication Date Title
CN104965757B (en) Method, virtual machine (vm) migration managing device and the system of live migration of virtual machine
CN105808334B (en) A kind of short optimization of job system and method for MapReduce based on resource reuse
CN105045658B (en) A method of realizing that dynamic task scheduling is distributed using multinuclear DSP embedded
CN113377540A (en) Cluster resource scheduling method and device, electronic equipment and storage medium
CN1983196B (en) System and method for grouping execution threads
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
EP1783604A2 (en) Object-oriented, parallel language, method of programming and multi-processor computer
CN105893126A (en) Task scheduling method and device
CN110308982B (en) Shared memory multiplexing method and device
CN110321204A (en) Computing system, hardware accelerator management method and device and storage medium
US10866832B2 (en) Workflow scheduling system, workflow scheduling method, and electronic apparatus
CN108108239A (en) A kind of providing method of business function, device and computer readable storage medium
CN110113408A (en) A kind of block synchronous method, equipment and storage medium
CN114237918B (en) Graph execution method and device for neural network model calculation
CN113434284B (en) Privacy computation server side equipment, system and task scheduling method
CN111506430A (en) Method and device for data processing under multitasking and electronic equipment
JP2009238197A (en) Control circuit, control method and control program for shared memory
CN109992352A (en) Data transmission method, device, electronic equipment and read/write memory medium
CN109726008A (en) Resource allocation methods and equipment
CN109840877A (en) A kind of graphics processor and its resource regulating method, device
CN103870335B (en) System and method for efficient resource management of signal flow programmed digital signal processor code
CN108156208A (en) A kind of dissemination method of application data, device and system
CN111338769A (en) Data processing method and device and computer readable storage medium
CN116302453B (en) Task scheduling method and device for quantum electronic hybrid platform
CN111310638B (en) Data processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20191012

Address after: 2100 San Jose Rojack Avenue, California, USA

Applicant after: XILINX, Inc.

Address before: Floor 17, Building 4, Yard 1, Wangzhuang Road, Haidian District, Beijing 100083

Applicant before: BEIJING DEEPHI INTELLIGENT TECHNOLOGY Co.,Ltd.

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20191011