WO2019036901A1 - Acceleration processing method and device - Google Patents

Acceleration processing method and device

Info

Publication number
WO2019036901A1
Authority
WO
WIPO (PCT)
Prior art keywords
acceleration
application
resource
processing device
accelerated
Prior art date
Application number
PCT/CN2017/098481
Other languages
English (en)
French (fr)
Inventor
雷丛华
乐伟军
石磊
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2017/098481 priority Critical patent/WO2019036901A1/zh
Priority to EP17922593.3A priority patent/EP3663912A4/en
Priority to CN201780049782.XA priority patent/CN109729731B/zh
Publication of WO2019036901A1 publication Critical patent/WO2019036901A1/zh
Priority to US16/798,931 priority patent/US11461148B2/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/5077: Logical partitioning of resources; management or configuration of virtualized resources
    • G06F 9/4484: Executing subprograms
    • G06F 9/28: Enhancement of operational speed, e.g. by using several microcontrol devices operating in parallel
    • G06F 9/4405: Initialisation of multiprocessor systems
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/505: Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
    • G06F 30/34: Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
    • G06F 9/455: Emulation; interpretation; software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Definitions

  • The present application relates to the field of communications technologies, and in particular, to an acceleration processing method and device.
  • A network function virtualization (NFV) system can use a field programmable gate array (FPGA) as a hardware accelerator, offloading functions executed in software to the FPGA to improve system performance.
  • The network function virtualization infrastructure (NFVI) in the NFV system can abstract the FPGA into a set of acceleration functions (or acceleration capabilities) and provide application programming interfaces (APIs) to virtualized network functions (VNFs) or hosts to call these acceleration functions.
  • An FPGA is a configurable integrated circuit. To provide various acceleration functions, the NFV system needs to be configured with different numbers or types of FPGAs.
  • Partial reconfiguration (PR) technology divides the interior of the FPGA into multiple areas, allowing an area to be reconfigured to meet new requirements without affecting the areas that are not reconfigured.
  • However, the number of areas into which an FPGA can be divided is limited. Once the areas are divided, acceleration resources can only be allocated area by area, and the area size cannot be dynamically adjusted according to actual needs.
  • The present application provides an acceleration processing method and device, which help to improve the utilization of acceleration resources.
  • In a first aspect, an acceleration processing method is provided.
  • The acceleration processing device combines the first acceleration application and the second acceleration application to obtain a first combined application, and burns the first combined application onto the first acceleration resource.
  • The first acceleration application, the second acceleration application, and the first combined application may be HDL code, and the first acceleration resource may be an FPGA or an area of an FPGA.
  • The first combined application includes a top-level module, the first acceleration application, and the second acceleration application, and the top-level module includes a statement for calling the first acceleration application and a statement for calling the second acceleration application.
  • Thereby, after the first combined application is burned, the first acceleration resource can execute call requests for the first acceleration application and call requests for the second acceleration application.
  • After the acceleration processing device combines acceleration applications to obtain a combined application and burns the combined application to an acceleration resource, the utilization of the acceleration resource is improved relative to burning only a single acceleration application to the acceleration resource.
  • In a possible design, the top-level module includes a first port and a second port, which are respectively mapped to the port of the first acceleration application and the port of the second acceleration application. The top-level module can therefore connect the first acceleration application and the second acceleration application separately for signal transmission, and the signals on the first port and the second port do not affect each other, which is easy to implement.
  • In another possible design, the top-level module includes a first port that is mapped to both the port of the first acceleration application and the port of the second acceleration application.
  • One port of the top-level module can thus connect to the ports of the first acceleration application and the second acceleration application simultaneously, performing signal transmission in a bus manner.
  • This port mapping helps maintain the original port connections when new acceleration applications are added, and is easy to upgrade.
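  • As an illustrative sketch of this bus-style sharing (not taken from the patent figures; the module name top_bus, the select signal sel, the port names, and the port widths are all assumptions), one top-level port pair might serve two acceleration applications as follows:

```verilog
// Hypothetical sketch: a top-level module whose single port pair is shared
// by two acceleration applications in a bus manner. Adding a third
// application would not change the existing top-level port list.
module top_bus (
    input  [31:0] data_in,
    input         sel,       // selects which acceleration application drives data_out
    output [31:0] data_out
);
    wire [31:0] aap1_out, aap2_out;

    // Both instances receive the same input from the shared top-level port.
    aap1 aap1_instance (.aap1_in(data_in), .aap1_out(aap1_out));
    aap2 aap2_instance (.aap2_in(data_in), .aap2_out(aap2_out));

    // Bus-style sharing: one top-level port returns either result.
    assign data_out = sel ? aap2_out : aap1_out;
endmodule
```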
  • In a possible design, the second acceleration application has already been burned to a second acceleration resource before the acceleration processing device burns the first combined application to the first acceleration resource.
  • After the burning, the second acceleration application is migrated from the second acceleration resource to the first acceleration resource, and a trigger instruction may be sent to the first acceleration resource to trigger it to execute the second acceleration application, instead of sending a trigger instruction to the second acceleration resource to trigger it to execute the second acceleration application.
  • Thereby, an acceleration application that has already been burned to an acceleration resource is migrated to a new acceleration resource and combined with other acceleration applications on the new acceleration resource, which helps to obtain higher utilization of the existing acceleration resources. This helps the NFV system meet more demands with fewer FPGAs.
  • In a possible design, before the acceleration processing device burns the first combined application to the first acceleration resource, the second acceleration application has already been burned onto the first acceleration resource.
  • While the acceleration processing device burns the first combined application to the first acceleration resource, if a call request for the second acceleration application occurs, the acceleration processing device may execute the second acceleration application in place of the first acceleration resource, so that call requests for acceleration applications on the acceleration resource being burned can be answered in time during the burning process.
  • In a possible design, before burning the first combined application to the first acceleration resource, the acceleration processing device determines that the utilization achieved by the first combined application on the first acceleration resource is higher than the utilization it would achieve on a third acceleration resource. Therefore, when multiple acceleration resources are available for burning, the selection can be made according to the utilization of the acceleration resources by the combined application, which helps to improve the utilization of the acceleration resources.
  • In a possible design, the acceleration processing device obtains multiple acceleration applications including the first acceleration application and the second acceleration application, combines the first acceleration application and the second acceleration application to obtain the first combined application, and burns the first combined application to the first acceleration resource according to an advantageous allocation scheme among multiple allocation schemes.
  • Each of the multiple allocation schemes is a correspondence between multiple acceleration resources and the multiple acceleration applications, where the multiple acceleration resources include the first acceleration resource, and the advantageous allocation scheme includes the correspondence between the first acceleration resource and the first and second acceleration applications.
  • Thereby, the acceleration processing device can select the advantageous allocation scheme according to different selection strategies to complete the burning, providing diversified ways of utilizing the acceleration resources.
  • In a possible design, the number of acceleration resources used in the advantageous allocation scheme (i.e., acceleration resources corresponding to at least one acceleration application) is the smallest among the multiple allocation schemes. This saves more acceleration resources to meet subsequent demand for acceleration resources.
  • In another possible design, the sum of the utilization rates of the acceleration resources used in the advantageous allocation scheme is the largest among the multiple allocation schemes. This results in higher utilization of the overall acceleration resources, that is, of all the acceleration resources used in the allocation scheme.
  • In a possible design, the acceleration processing device executes computer program instructions to implement the acceleration processing method provided by the first aspect, and the computer program instructions can be used to implement the NFVI function.
  • Thereby, the function of performing the acceleration processing method provided by the first aspect can be added to the NFVI of an existing NFV system, expanding the functions of the existing NFV system.
  • In a second aspect, an acceleration processing device is provided, which includes means for performing the acceleration processing method provided by the first aspect.
  • In a third aspect, an acceleration processing device is provided, which includes a memory and a processor, where the processor reads the computer program instructions stored in the memory and performs the acceleration processing method provided by the first aspect.
  • In a possible design, the acceleration processing device provided by the third aspect includes the first acceleration resource described in the acceleration processing method provided by the first aspect.
  • In another aspect, an acceleration processing system is provided, comprising the acceleration processing device provided by the second aspect or the third aspect, and the first acceleration resource described in the acceleration processing method provided by the first aspect.
  • In another aspect, a computer storage medium is provided, comprising computer program instructions that, when run on an acceleration processing device, cause the acceleration processing device to perform the acceleration processing method provided by the first aspect.
  • In another aspect, a computer program product is provided, comprising computer program instructions that, when run on an acceleration processing device, cause the acceleration processing device to perform the acceleration processing method provided by the first aspect.
  • FIG. 1 is a schematic diagram of an NFV system architecture according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an application of the acceleration processing device 200 in the NFV system shown in FIG. 1 according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of another application of the acceleration processing device 200 in the NFV system shown in FIG. 1 according to an embodiment of the present application;
  • FIG. 4 is a flowchart of an acceleration processing method 400 according to an embodiment of the present application;
  • FIG. 5 is an example of a top-level module described in the Verilog language according to an embodiment of the present application;
  • FIG. 6 is another example of a top-level module described in the Verilog language according to an embodiment of the present application;
  • FIG. 7A to FIG. 7F are schematic diagrams of mapping relationships between the ports of the top-level module and the ports of the first acceleration application and the second acceleration application in the first combined application according to an embodiment of the present application;
  • FIG. 8 is an example of migrating an acceleration application between areas of an FPGA according to an embodiment of the present application;
  • FIG. 9 is an example of migrating an acceleration application between FPGAs according to an embodiment of the present application;
  • FIG. 10 is an example of the utilization of each type of hardware resource in an acceleration resource by a combined application according to an embodiment of the present application;
  • FIG. 11 is an example of establishing groups based on the matching of acceleration resources and acceleration applications according to an embodiment of the present application;
  • FIG. 12 is an example of the utilization of acceleration resources by combined applications and acceleration applications according to an embodiment of the present application;
  • FIG. 13 is a schematic diagram of the acceleration processing device 200 applied in the system shown in FIG. 1 according to an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an NFV system architecture according to an embodiment of the present application.
  • The NFVI in the NFV system abstracts computing hardware, storage hardware, and acceleration hardware to obtain a set of computing, storage, and acceleration capabilities, and provides APIs for calling these capabilities to VNF 101, VNF 102, and VNF 103, so as to provide various services, such as computing services, storage services, and acceleration services.
  • The acceleration processing system includes an acceleration processing device 200 and an FPGA chip.
  • The acceleration processing device 200 executes computer program instructions to implement the functions of the NFVI in the NFV system shown in FIG. 1.
  • the acceleration processing device 200 includes a processor 201 and a memory 202.
  • the processor 201 and the memory 202 can be connected by a bus or directly.
  • the memory 202 stores the computer program instructions, and the processor 201 reads the computer program instructions stored in the memory 202 to implement various operations of the acceleration processing device 200.
  • the acceleration processing device 200 can be directly connected or connected to one or more FPGA chips through a bus.
  • An FPGA chip may or may not be divided into multiple regions using PR technology.
  • For example, FPGA 203 is an FPGA chip including a plurality of regions (for example, region 205 and region 206), and FPGA 204 is an FPGA chip without region division.
  • the FPGA chip may also be included in the acceleration processing device 200.
  • The acceleration processing device 200 can obtain, from the memory, an application that requires hardware acceleration (hereinafter referred to as an acceleration application).
  • The acceleration application may be hardware description language (HDL) code that describes a logic function requiring hardware acceleration and is used to respond to calls to the acceleration capability API.
  • The acceleration processing device 200 can burn the HDL code onto an FPGA chip, and then use the FPGA chip to perform the required logic function when a call to the acceleration capability API is received.
  • The burning process may include: generating a netlist from the HDL code, verifying the netlist by methods such as simulation, performing place and route according to the netlist, generating a binary file, and transferring the binary file to the FPGA chip.
  • The burning may write the HDL code to one area of the FPGA chip, or to the entire FPGA chip.
  • The acceleration processing device 200 can also convert code in a language other than HDL, such as C language code (pre-stored or obtained from other devices), to obtain an acceleration application.
  • the acceleration processing device 200 may further include a network interface 207, and receive an acceleration application sent by another network device through the network interface 207, and burn the acceleration application onto the FPGA chip.
  • the acceleration processing device 200 can also receive the code of the language other than the HDL sent by the other device through the network interface 207, convert the code of the language other than the HDL into the accelerated application, and then burn the accelerated application to the FPGA chip.
  • FIG. 3 is a schematic diagram of an acceleration processing system for applying the acceleration processing device 200 in the system shown in FIG. 1 according to an embodiment of the present application.
  • The acceleration processing system includes an acceleration processing device 200 and FPGA chips.
  • The acceleration processing device 200 in FIG. 3 has the same structure as the acceleration processing device 200 including the network interface 207 in FIG. 2.
  • The acceleration processing device 200 can connect to one or more acceleration devices through the network interface 207, where each acceleration device can include a network interface, a processor, and one or more FPGA chips; each FPGA chip may be divided into multiple regions, as shown in FIG. 2, or may be an FPGA chip without region division.
  • For example, the acceleration device 301 includes one FPGA chip, FPGA 305.
  • The acceleration device 302 includes two FPGA chips, FPGA 308 and FPGA 309.
  • The acceleration processing device 200 can obtain the acceleration application locally, receive an acceleration application sent by other devices via the network interface 207, or receive code in a language other than HDL sent by other devices via the network interface 207 and convert that code into an acceleration application.
  • The process by which the acceleration processing device 200 burns the acceleration application onto the FPGA chip of the acceleration device 301 may include: the acceleration processing device 200 generates a binary file according to the acceleration application and transmits the binary file to the acceleration device 301; the acceleration device 301, under the control of the processor 304, receives the binary file via the network interface 303 and transfers the binary file to the FPGA 305 (when the FPGA 305 includes multiple regions, it is burned into one region of the FPGA 305).
  • The process by which the acceleration processing device 200 burns the acceleration application onto the FPGA chips of the acceleration device 302 is similar to the process of burning onto the FPGA chip of the acceleration device 301, and details are not described herein again.
  • Processors 201, 304, and 307 include, but are not limited to, one or more of a central processing unit (CPU), a network processor (NP), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD).
  • The PLD may be a complex programmable logic device (CPLD), a field programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • the memory 202 may include a volatile memory such as a random-access memory (RAM).
  • The memory 202 may also include a non-volatile memory such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
  • Memory 202 can also include a combination of the above types of memory.
  • the memory 202 can be integrated into the processor 201 as an internal component of the processor 201.
  • Network interfaces 207, 303, and 306 can be wired communication interfaces, wireless communication interfaces, or a combination thereof.
  • The wired communication interface is, for example, an Ethernet interface, an asynchronous transfer mode (ATM) interface, or a synchronous digital hierarchy (SDH)/synchronous optical networking (SONET) packet over SONET/SDH (POS) interface.
  • the wireless communication interface is, for example, a wireless local area network (WLAN) interface, a cellular network communication interface, or a combination thereof.
  • FIG. 4 is a flowchart of an acceleration processing method 400 according to an embodiment of the present application.
  • the accelerated processing method 400 can be performed by the acceleration processing device 200 of FIGS. 2 and 3.
  • the acceleration processing device combines the first acceleration application and the second acceleration application to obtain a first combined application.
  • The first combined application may be HDL code whose logic function includes the logic functions described by the first acceleration application and the second acceleration application, and which includes the code of the first acceleration application and the code of the second acceleration application.
  • The first combined application may include a top-level module, the first acceleration application, and the second acceleration application.
  • The top-level module can include statements for invoking the first acceleration application and for invoking the second acceleration application.
  • These statements may be statements that instantiate the first acceleration application (i.e., establish an instance of the first acceleration application) and instantiate the second acceleration application (i.e., establish an instance of the second acceleration application).
  • When the first combined application is burned to the FPGA chip, the hardware circuit implementing the logic function described by the top-level module (i.e., the hardware circuit corresponding to the top-level module), the hardware circuit implementing the logic function described by the first acceleration application (i.e., the hardware circuit corresponding to the first acceleration application), and the hardware circuit implementing the logic function described by the second acceleration application (i.e., the hardware circuit corresponding to the second acceleration application) are connected.
  • The top-level module, the first acceleration application, and the second acceleration application each include a port list, and the port list includes one or more ports.
  • The ports of the top-level module are the ports through which the first combined application communicates externally.
  • The ports of the top-level module are mapped to designated pins of the FPGA chip (i.e., the ports of the hardware circuit corresponding to the top-level module connect to the designated pins), so that the hardware circuit corresponding to the top-level module can communicate with the outside of the FPGA chip via the designated pins.
  • In the statements for calling the first acceleration application and the second acceleration application, the top-level module may map its ports to the ports of the first and second acceleration applications (i.e., connect the ports of the top-level module with the ports of the first and second acceleration applications), whereby when the first combined application is burned to the FPGA chip, the ports of the hardware circuit corresponding to the top-level module are also connected to the hardware circuits corresponding to the first and second acceleration applications.
  • These connections enable the FPGA chip to receive an input value from the external bus using the hardware circuit corresponding to the top-level module, transmit the input value to the hardware circuits corresponding to the first and second acceleration applications for calculation, receive the calculation results returned by those hardware circuits, and return the calculation results to the external bus.
  • The top-level module can map one port to both the port of the first acceleration application and the port of the second acceleration application simultaneously.
  • For example, the top-level module includes a first port that maps to the port of the first acceleration application and the port of the second acceleration application.
  • The top-level module may also map different ports to the port of the first acceleration application and the port of the second acceleration application; for example, the top-level module includes a first port and a second port, where the first port is mapped to the port of the first acceleration application and the second port is mapped to the port of the second acceleration application.
  • Thereby, the ports of the top-level module can be connected to the ports of the first and second acceleration applications for signal transmission.
  • FIG. 5 shows an example of a top-level module in the Verilog language.
  • The top-level module 500 is named top, and the port list 501 of the top-level module 500 includes port aap1_in, port aap1_out, port aap2_in, and port aap2_out.
  • the top level module 500 includes a statement 502 for establishing an instance of the accelerated application aap1 (named aap1_instance) and a statement 503 for establishing an instance of the accelerated application aap2 (named aap2_instance).
  • ".aap1_in(aap1_in)" is used to establish the mapping of the port aap1_in of the top-level module 500 to the port aap1_in of the acceleration application aap1,
  • and ".aap1_out(aap1_out)" is used to establish the mapping of the port aap1_out of the top-level module 500 to the port aap1_out of the acceleration application aap1.
  • Similarly, ".aap2_in(aap2_in)" is used to establish the mapping of the port aap2_in of the top-level module 500 to the port aap2_in of the acceleration application aap2, and ".aap2_out(aap2_out)" is used to establish the mapping of the port aap2_out of the top-level module 500 to the port aap2_out of the acceleration application aap2.
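  • The description above can be sketched in Verilog as follows. This is a reconstruction from the text describing FIG. 5, not the figure itself; the port directions, port widths, and the internals of the modules aap1 and aap2 are assumptions not stated in the text:

```verilog
// Sketch of top-level module 500 ("top") reconstructed from the description
// of FIG. 5. Ports are assumed single-bit; aap1 and aap2 are the two
// acceleration applications, instantiated by statements 502 and 503.
module top (
    input  aap1_in,
    output aap1_out,
    input  aap2_in,
    output aap2_out
);
    // Statement 502: instance of acceleration application aap1, mapping
    // the top-level ports aap1_in/aap1_out to aap1's ports of the same name.
    aap1 aap1_instance (
        .aap1_in (aap1_in),
        .aap1_out(aap1_out)
    );

    // Statement 503: instance of acceleration application aap2, mapping
    // the top-level ports aap2_in/aap2_out to aap2's ports of the same name.
    aap2 aap2_instance (
        .aap2_in (aap2_in),
        .aap2_out(aap2_out)
    );
endmodule
```

With this separate-port mapping, signals for the two acceleration applications travel on independent ports and do not affect each other.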
  • Optionally, in the statements creating the instances of the first acceleration application and the second acceleration application, the top-level module may instead establish mappings from designated ports of an intermediate module to designated ports of the first and second acceleration applications,
  • and, in a statement creating an instance of the intermediate module, establish the mapping of designated ports of the top-level module to designated ports of the intermediate module, together with the mapping of designated ports of the first and second acceleration applications to designated ports of the intermediate module.
  • Thereby, the designated ports of the top-level module communicate with the designated ports of the first and second acceleration applications via the intermediate module, and the intermediate module can perform scheduling management on the signal transmission between the top-level module and the first and second acceleration applications.
  • For example, the intermediate module arbitrates between a first signal sent by the top-level module to the first acceleration application and a second signal sent to the second acceleration application, to determine whether to transmit the first signal or the second signal preferentially.
  • FIG. 6 shows another example of a top-level module in the Verilog language.
  • the top-level module 600 is named top, and the port list 601 of the top-level module 600 includes port top1_in, port top1_out, port top2_in, and port top2_out.
  • the top-level module 600 includes a statement 602 for establishing an instance of the acceleration application aap1 (named aap1_instance), a statement 603 for establishing an instance of the acceleration application aap2 (named aap2_instance), and a statement 604 for establishing an instance of the intermediate module aap_mid (named aap_mid_instance).
  • ".top2_in(top2_in)" establishes the mapping from port top2_in of the top-level module to port top2_in of the intermediate module aap_mid, and ".top2_out(top2_out)" establishes the mapping from port top2_out of the top-level module to port top2_out of the intermediate module aap_mid.
  • ".aap1_in(aap1_in)" establishes the mapping from port aap1_in of the acceleration application aap1 to port aap1_in of the intermediate module aap_mid, and ".aap1_out(aap1_out)" establishes the mapping from port aap1_out of the acceleration application aap1 to port aap1_out of the intermediate module aap_mid.
  • similarly, ".aap2_in(aap2_in)" establishes the mapping from port aap2_in of the acceleration application aap2 to port aap2_in of the intermediate module aap_mid, and ".aap2_out(aap2_out)" establishes the mapping from port aap2_out of the acceleration application aap2 to port aap2_out of the intermediate module aap_mid.
  • FIGS. 5 and 6 show the top-level module mapping to the ports of the first acceleration application and the ports of the second acceleration application through different ports.
  • the top-level module can also be mapped to the port of the first acceleration application and the port of the second acceleration application by using the same port, and details are not described herein again.
  • FIG. 7A to FIG. 7F are schematic diagrams showing the mapping relationship between the port of the top-level module and the port of the first acceleration application and the port of the second acceleration application in the first combined application according to the embodiment of the present application.
  • the first acceleration application 701 includes two ports
  • the second acceleration application 702 includes two ports.
  • the two ports of the hardware circuit corresponding to the first acceleration application 701 and the two ports of the hardware circuit corresponding to the second acceleration application 702 can be connected through the internal bus 711 of the FPGA, and the dotted arrows show the mapping relationships between the ports.
  • different ports of the top level module 703 are mapped to ports of the first acceleration application 701 and ports of the second acceleration application 702, respectively.
  • the input values received by the different ports of the top-level module 703 may be transmitted to the ports of the first acceleration application 701 and the second acceleration application 702, respectively, and the output values sent by the first acceleration application 701 and the second acceleration application 702 are transmitted to the different ports of the top-level module 703, respectively.
  • different ports of the top-level module 703 are mapped to different ports of the intermediate module 704, and different ports of the intermediate module 704 are mapped to the ports of the first acceleration application 701 and the ports of the second acceleration application 702, respectively.
  • the same port of the top level module 703 is mapped to the port of the first acceleration application 701 and the port of the second acceleration application 702.
  • the input value received by the port of the top-level module 703 can be transmitted simultaneously to the ports of the first acceleration application 701 and the second acceleration application 702; each application can examine the input value to determine whether it is addressed to itself, and if so, receive the input value, perform the calculation according to it, and send the calculated result to the port of the top-level module 703.
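The shared-port behavior can be sketched as a small software model. This is an illustration only: the addressing check and the placeholder operations are assumptions, since the text does not fix how an application decides whether an input is addressed to it.

```python
# Both acceleration applications observe every input on the shared port;
# each checks whether the input is addressed to itself before computing.
def make_app(name, operation):
    def handle(value, target):
        if target != name:          # input not addressed to this application
            return None
        return operation(value)    # compute and return the result upward
    return handle

aap1 = make_app("aap1", lambda v: v + 1)   # stand-in for e.g. AES encryption
aap2 = make_app("aap2", lambda v: v * 2)   # stand-in for e.g. DES encryption

# The same input reaches both applications; only the addressed one responds.
results = [app(10, "aap2") for app in (aap1, aap2)]
```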
  • the first acceleration application 701 and the second acceleration application 702 are logically related and follow the same port standard; for example, the first acceleration application 701 is used for advanced encryption standard (AES) encryption and the second acceleration application 702 is used for data encryption standard (DES) encryption, and both can use the same input and output ports.
  • the ports of the top level module 703 are mapped to the ports of the intermediate module 704, and the same ports of the intermediate module 704 are mapped to the ports of the first acceleration application 701 and the ports of the second acceleration application 702.
  • the input values received by the ports of the top level module 703 can be simultaneously transmitted to the ports of the first acceleration application 701 and the second acceleration application 702 via the ports of the intermediate module 704.
  • the processing of the input values by the first acceleration application 701 and the second acceleration application 702 can be referred to the example in FIG. 7C, and details are not described herein again.
  • two ports of the top-level module 703 are respectively mapped to one port of the first acceleration application 701 and one port of the second acceleration application 702, and one port of the top-level module 703 is simultaneously mapped to one port of the first acceleration application 701 and one port of the second acceleration application 702.
  • the processing of the input values by the first acceleration application 701 and the second acceleration application 702 can be referred to the example in FIG. 7A to FIG. 7D, and details are not described herein again.
  • a portion of the same ports of the top-level module is mapped via the intermediate module to the ports of the first acceleration application and the ports of the second acceleration application, and a portion of different ports is mapped via the intermediate module to the ports of the first acceleration application and the ports of the second acceleration application.
  • as shown in FIG. 7F, the three ports of the top-level module 703 are mapped to the three ports of the intermediate module 704; two ports of the intermediate module 704 are respectively mapped to one port of the first acceleration application 701 and one port of the second acceleration application 702, and another port of the intermediate module 704 is simultaneously mapped to one port of the first acceleration application 701 and one port of the second acceleration application 702.
  • the processing of the input values by the first acceleration application 701 and the second acceleration application 702 can be referred to the example in FIG. 7A to FIG. 7E, and details are not described herein again.
  • the acceleration processing device 200 can also combine three or more acceleration applications to obtain a combined application. For example, the top-level module can map to the ports of the first acceleration application, the second acceleration application, and the third acceleration application with the same port; or map to their ports with different ports; or map to their ports with some ports shared and some ports distinct.
  • the acceleration processing device burns the first combined application to the first acceleration resource.
  • the acceleration resource is an area on the FPGA chip (when the FPGA chip includes multiple regions) or the entire FPGA chip (when the FPGA chip is not partitioned).
  • the FPGA chip can be the FPGA chip of FIG. 2 or FIG. 3, such as FPGA 203, FPGA 204, FPGA 305, FPGA 308, or FPGA 309.
  • the first acceleration resource may be one of the above FPGA chips or an area on one of the above FPGA chips.
  • for the programming process in which the acceleration processing device 200 burns the first combined application to the first acceleration resource, reference may be made to the process of burning HDL code onto the FPGA chip described in FIG. 2 or FIG. 3, and details are not described herein again.
  • compared with the method of burning one acceleration application onto one acceleration resource, the acceleration processing method 400 combines the first acceleration application and the second acceleration application to obtain the first combined application and burns it to the first acceleration resource, which can improve the utilization of the acceleration resource.
  • the first acceleration resource may be used to execute the first acceleration application and the second acceleration application.
  • NFVI provides accelerated services to virtual network functions VNF101, VNF102, and VNF103 by providing APIs. Any of VNF 101, VNF 102, and VNF 103 may be implemented by processor 201 executing computer program instructions in memory 202 in acceleration processing device 200, or by other devices in the network.
  • the VNF 101 can send a call request with a different API name or parameter to invoke an accelerated application with different acceleration capabilities.
  • the following takes as an example the case where the VNF 101 needs to use the acceleration service of the first acceleration application and sends a call request to the NFVI, and the NFVI responds using the FPGA chip onto which the first combined application has been burned.
  • VNF 101 is implemented by processor 201 executing computer program instructions.
  • the acceleration processing device 200 obtains the call request in such a manner that the processor 201 executes the computer program instructions to implement the NFVI function and receives the call request sent by the VNF 101.
  • the processor 201 sends a trigger instruction to the first acceleration resource through the bus in FIG. 2, and the trigger instruction is transmitted to the pin of the first acceleration resource.
  • when the first acceleration resource is an entire FPGA chip, the pins of the first acceleration resource are the pins of the FPGA chip;
  • when the first acceleration resource is an area on the FPGA chip, the pins of the first acceleration resource are the pins serving that area on the FPGA chip.
  • the triggering instruction may include one or more input values, which may be used to trigger the first acceleration resource to execute the first acceleration application.
  • the designated ports of the top-level module of the first combined application burned to the first acceleration resource may be mapped to designated pins of the first acceleration resource, so that the input values in the trigger instruction arriving at the designated pins of the first acceleration resource are transmitted to the designated ports of the top-level module.
  • the first acceleration resource may execute the first acceleration application to perform calculation according to the input value in the trigger instruction and transmit the calculation result to the acceleration processing device 200.
  • the VNF 101 is implemented by other devices in the network.
  • the acceleration processing device 200 obtains the call request in such a manner that the processor 201 executes the computer program instructions to implement the NFVI function and receives, through the network interface 207 shown in FIG. 3, the call request sent by the other device, and then sends a trigger instruction to the acceleration device 301 or the acceleration device 302 through the network interface 207.
  • the acceleration device 301 receives the trigger instruction through the network interface 303 under the control of the processor 304 and transmits it to the pin of the first acceleration resource via the internal bus of the acceleration device 301.
  • the acceleration device 302 receives the trigger command through the network interface 306 under the control of the processor 307 and transmits it to the pin of the first acceleration resource via the internal bus of the acceleration device 302.
  • the first acceleration resource may execute the first acceleration application to perform calculation according to the input value in the trigger instruction and transmit the calculation result to the acceleration processing device 200.
  • the manner in which the VNF 101 sends a call request to the NFVI for the second acceleration application is processed similarly to the call request sent to the NFVI for the first acceleration application, and details are not described herein again.
  • the acceleration application may be obtained by one or more of the following methods: obtaining the acceleration application from local memory; converting code in a language other than HDL to generate the acceleration application; receiving the acceleration application from another device; and receiving code in a language other than HDL from another device and converting it to generate the acceleration application.
  • the acceleration application obtained by the acceleration processing device 200 from local storage may be an acceleration application that was previously acquired by the acceleration processing device 200 and saved in the memory 202.
  • the second acceleration application is an acceleration application previously obtained and saved in the memory 202
  • the second acceleration application may be an acceleration application that has been burned to the second acceleration resource.
  • the second acceleration resource can be an area on the FPGA chip or the entire FPGA chip.
  • the acceleration processing device 200 saves to the memory 202 after obtaining the second acceleration application, and burns the second acceleration application to the second acceleration resource.
  • the second acceleration resource is an area on the FPGA chip (when the FPGA chip includes multiple regions) or an FPGA chip (when the FPGA chip is not divided into regions).
  • the accelerated application can migrate between regions of the FPGA.
  • FPGA 800 includes an area 801 that is a first acceleration resource and an area 802 that is a second acceleration resource.
  • the acceleration processing apparatus 200 can transmit a trigger instruction including the input value to the area 802 and obtain the returned calculation result.
  • after the acceleration processing device 200 combines the second acceleration application 804 with the first acceleration application 803 to obtain the first combined application 805 and burns the first combined application 805 into the area 801, if a call request for the second acceleration application is received, a trigger instruction is sent only to the area 801 (that is, the first acceleration resource) to execute the second acceleration application, and no trigger instruction is sent to the area 802 any more; the second acceleration application 804 is thereby migrated from the area 802 to the area 801.
  • an accelerated application can be migrated between FPGAs.
  • FPGA 901 is the first acceleration resource
  • FPGA 902 is the second acceleration resource.
  • the second acceleration application 904 has been burned to the FPGA 902 by the acceleration processing device 200 or another device, and the acceleration processing device 200 can send a trigger instruction including the input value to the FPGA 902 and obtain the returned calculation result.
  • after the acceleration processing device 200 combines the second acceleration application 904 with the first acceleration application 903 to obtain the first combined application 905 and burns the first combined application 905 into the FPGA 901, if a call request for the second acceleration application is received, the trigger instruction is sent only to the FPGA 901 (that is, the first acceleration resource) to execute the second acceleration application, and no trigger instruction is sent to the FPGA 902 any more; the second acceleration application 904 is thereby migrated from the FPGA 902 to the FPGA 901.
  • when combining acceleration applications, the acceleration processing device 200 migrates an acceleration application that has already been burned to an acceleration resource onto a new acceleration resource, where it is combined with other acceleration applications; this helps obtain higher utilization of the existing acceleration resources and helps meet more demands with fewer FPGAs in an NFV system.
  • the second acceleration application may be an acceleration application that has been burned to the first acceleration resource by the acceleration processing device 200 or other device.
  • while the first combined application is being burned to the first acceleration resource, the acceleration processing method 400 allows the acceleration processing device 200 to execute the second acceleration application in place of the first acceleration resource.
  • during the burning, the content originally burned on the first acceleration resource is replaced (that is, reconfigured) by the first combined application, so that the first acceleration resource will afterwards be able to serve call requests for either the first acceleration application or the second acceleration application; at that point the first acceleration resource can resume serving call requests for the second acceleration application.
  • in the meantime, the acceleration processing device 200 may convert the second acceleration application into computer program instructions executable by the processor 201, and the processor 201 executes these computer program instructions stored in the memory 202 to carry out the second acceleration application.
  • in this way, the acceleration processing method 400 can promptly respond to call requests for the acceleration application that arrive during the burning process.
  • the acceleration resource to be burned can be selected according to the combined application's utilization of the acceleration resources, which helps improve the utilization of the acceleration resources. For example, when there are a first acceleration resource and a third acceleration resource, and it is determined that the first combined application's utilization of the first acceleration resource is higher than its utilization of the third acceleration resource, the first combined application is burned to the first acceleration resource.
  • Accelerated resources can include multiple kinds of hardware resources, such as registers, lookup tables (LUTs), random access memory (RAM), and input and output ports.
  • Figure 10 shows an example of a combined application's utilization of the various kinds of hardware resources in an acceleration resource. As shown in Figure 10, after being burned to the acceleration resource, the combined application will use 13.89% of the registers, 60.98% of the LUTs, 75.56% of the RAM, and 12% of the input and output ports.
  • the combined application's utilization of the acceleration resource may be determined based on the combined application's utilization of the various kinds of hardware resources in the acceleration resource.
  • for example, the combined application's utilization of the acceleration resource may be its utilization of the LUTs in the acceleration resource; when the combined application's utilization of the LUTs in the first acceleration resource is greater than its utilization of the LUTs in the second acceleration resource, it is determined that the combined application's utilization of the first acceleration resource is greater than its utilization of the second acceleration resource.
  • alternatively, the combined application's utilization of the acceleration resource may be the sum of its utilization of the LUTs and its utilization of the RAM in the acceleration resource; when this sum for the first acceleration resource is greater than that for the second acceleration resource, it is determined that the combined application's utilization of the first acceleration resource is greater than its utilization of the second acceleration resource.
  • the combined application's utilization of the acceleration resource can be calculated using the following formula: U = Σ_{i=1..n} x_i·(a_i/B_i), where:
  • U is the utilization of the acceleration resource by the combined application;
  • n is the number of kinds of hardware resources in the acceleration resource;
  • a_i is the quantity of hardware resources of the i-th kind in the acceleration resource that the combined application uses;
  • B_i is the total quantity of hardware resources of the i-th kind in the acceleration resource;
  • x_i is the weight coefficient of the i-th kind of hardware resource in the acceleration resource;
  • a_i/B_i is the combined application's utilization of the i-th kind of hardware resource.
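As an illustration, the weighted-sum calculation can be written out as follows. This is a sketch only: the totals and the equal weights are assumed example values, with the usage ratios taken from the FIG. 10 example.

```python
# U = sum over i of x_i * (a_i / B_i): weighted utilization of an
# acceleration resource by a combined application.
def utilization(used, total, weights):
    """used[i]: units of hardware class i consumed by the combined application;
    total[i]: units of class i available in the acceleration resource;
    weights[i]: weight coefficient x_i of class i."""
    return sum(x * a / b for x, a, b in zip(weights, used, total))

# Four classes: registers, LUTs, RAM, I/O ports, using the FIG. 10 ratios
# (13.89%, 60.98%, 75.56%, 12%) with assumed totals and equal weights.
used    = [1389, 6098, 7556, 12]
total   = [10000, 10000, 10000, 100]
weights = [0.25, 0.25, 0.25, 0.25]
u = utilization(used, total, weights)   # (0.1389 + 0.6098 + 0.7556 + 0.12) / 4
```

Unequal weights would let, say, LUT scarcity dominate the comparison between two candidate acceleration resources.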
  • when the acceleration resource is an area of the FPGA chip, since an FPGA chip using the PR technology includes some common hardware resources shared by all areas, the utilization of each kind of hardware resource can at this time be obtained from the actual usage quantity and the actual total quantity of that kind of hardware resource.
  • the actual usage quantity includes the quantity of that kind of hardware resource used within the acceleration resource plus the quantity used among the common hardware resources, and the actual total quantity includes the total quantity of that kind of hardware resource within the acceleration resource plus the total quantity among the common hardware resources. This makes the calculation of utilization more accurate.
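Under this adjustment, the per-class utilization combines the region's quantities with the shared common quantities; all numbers below are invented for illustration.

```python
# Per-class utilization on a PR-capable FPGA: both the numerator and the
# denominator include the region's share plus the shared common resources.
def actual_utilization(region_used, common_used, region_total, common_total):
    return (region_used + common_used) / (region_total + common_total)

# Example for LUTs: 500 of 1000 region LUTs used, 100 of 400 common LUTs used.
u_lut = actual_utilization(500, 100, 1000, 400)   # 600 / 1400
```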
  • the accelerated processing device 200 can execute computer program instructions to implement the functions of the NFVI in the NFV system of FIG.
  • the NFVI functions may include acceleration resource discovery (e.g., discovering new FPGA chips), acceleration resource registration (e.g., recording information on newly discovered FPGA chips), acceleration resource state collection (e.g., recording usage information of FPGA chips so as to learn which FPGAs or FPGA areas are used and which are idle), and acceleration resource configuration (e.g., FPGA chip burning).
  • the functions of the NFVI may include a combined-application management function for performing the acceleration processing method 400, which may be accomplished by a separate component in the NFVI or by multiple components jointly.
  • for example, a combining component may combine the first acceleration application and the second acceleration application to obtain the first combined application, and then invoke the configuration component that executes the acceleration resource configuration function to burn the first combined application onto the FPGA chip or an area of the FPGA chip.
  • the acceleration processing device 200 can obtain information of all the acceleration resources in the NFV system, and the information may include usage information for recording whether the acceleration resource has been used.
  • the acquisition of the information of the accelerated resource can be done by the NFVI.
  • before burning, the acceleration processing device 200 may first determine whether the unused acceleration resources in the NFV system are sufficient for burning the one or more new acceleration applications. If so, acceleration applications already burned to acceleration resources can be excluded when combining acceleration applications; if not, the acceleration applications already burned to acceleration resources can be included, so that the burned acceleration applications and the new acceleration applications are recombined, burning is performed according to the recombination, and acceleration-application migration is carried out.
  • the information of the acceleration resource saved in the NFV system can be updated, and the update can be completed by the NFVI.
  • the usage information of the area 801 is updated to be used, and the usage information of the area 802 is updated to unused.
  • the FPGA 901 is updated to be used, and the FPGA 902 is updated to be unused.
  • the acceleration resources thus updated to unused can be reused for burning acceleration applications or combined applications.
  • the combination and burning of acceleration applications and the migration of acceleration applications in the NFV system are transparent to the VNFs, so the utilization of the acceleration resources is improved without the VNFs being aware of it.
  • when multiple acceleration applications are obtained, they may be combined arbitrarily to obtain one or more combined applications, and the allocation scheme is determined according to the combined applications' utilization of the acceleration resources, the acceleration applications' utilization of the acceleration resources, and/or the number of acceleration resources used.
  • one allocation scheme is a correspondence between the plurality of acceleration resources and the plurality of acceleration applications obtained by the acceleration processing device 200; in an allocation scheme, each of the acceleration applications corresponds to one acceleration resource, while one acceleration resource may correspond to no acceleration application, to one acceleration application, or to multiple acceleration applications.
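One allocation scheme can be represented minimally as a mapping from acceleration applications to acceleration resources; the names below are illustrative, not from the patent.

```python
# Each acceleration application corresponds to exactly one acceleration
# resource; a resource may carry zero, one, or several applications.
scheme = {"aap1": "fpga_a", "aap2": "fpga_a", "aap3": "fpga_b"}

def apps_on(scheme, resource):
    """All acceleration applications this scheme places on one resource."""
    return sorted(app for app, res in scheme.items() if res == resource)
```

Here `fpga_a` carries two applications (whose combined application would be burned to it), `fpga_b` carries one, and any other resource carries none.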
  • the acceleration processing device can perform burning according to the allocation scheme: when one acceleration resource corresponds to multiple acceleration applications, the combined application obtained by combining those acceleration applications is burned to that acceleration resource; when one acceleration resource corresponds to one acceleration application, that acceleration application is burned to that acceleration resource.
  • the acceleration processing device 200 may determine a plurality of allocation schemes and select one allocation scheme (ie, an advantage allocation scheme) from the plurality of allocation schemes according to different selection strategies, and perform burning according to the superior allocation scheme.
  • before combining the multiple acceleration applications, the acceleration processing device 200 can obtain information on all acceleration resources in the NFV system, all idle acceleration resources (that is, unused acceleration resources), or all available acceleration resources, and match each acceleration resource against each acceleration application to determine which acceleration resources can meet the needs of which acceleration applications (that is, which acceleration resources can match which acceleration applications).
  • the matching condition may include the port rate, the number of ports, the number of RAMs, and the number of LUTs.
  • the acceleration processing device 200 can perform the combination of acceleration applications and the calculation of utilization based on the above matching, to reduce the amount of calculation.
  • the acceleration processing device 200 can establish groups based on the above matching, such that the acceleration applications within each group are only matched against the acceleration resources within that group, to reduce the amount of calculation.
  • the first group includes an acceleration resource 1011, an acceleration resource 1012, an acceleration application 1001, an acceleration application 1002, and an acceleration application 1003.
  • in the first group, the acceleration application 1001 and the acceleration application 1002 can match the acceleration resource 1011 and the acceleration resource 1012, while the acceleration application 1003 can only match the acceleration resource 1011 and cannot match the acceleration resource 1012.
  • the second group includes an acceleration resource 1013, an acceleration resource 1014, an acceleration application 1004, an acceleration application 1005, an acceleration application 1006, and an acceleration application 1007.
  • in the second group, the acceleration application 1004, the acceleration application 1005, and the acceleration application 1006 can match the acceleration resource 1013, and the acceleration application 1005, the acceleration application 1006, and the acceleration application 1007 can match the acceleration resource 1014.
  • the acceleration processing device 200 can perform the combination of acceleration applications and the calculation of utilization within each group, to reduce the amount of calculation. In the following, the acceleration processing device 200 performs the utilization calculation and determines the allocation scheme within the first group and the second group respectively, as an example; the acceleration processing device 200 may also perform the calculation over all the acceleration resources and all the acceleration applications.
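The grouping idea can be sketched as follows, mirroring the first-group/second-group example above. The match table and names are illustrative; real matching would test port rate, port count, RAM, and LUT capacity.

```python
# match[app] = set of acceleration resources that satisfy that application's
# needs (port rate, number of ports, RAM, LUTs).
match = {
    "aap1001": {"res1011", "res1012"},
    "aap1002": {"res1011", "res1012"},
    "aap1003": {"res1011"},
    "aap1004": {"res1013"},
    "aap1005": {"res1013", "res1014"},
    "aap1006": {"res1013", "res1014"},
    "aap1007": {"res1014"},
}

def build_groups(match):
    """Merge applications into groups whose matchable resources overlap, so
    combination and utilization calculations stay within each group."""
    groups = []
    for app, resources in match.items():
        merged = {"apps": {app}, "resources": set(resources)}
        remaining = []
        for group in groups:
            if group["resources"] & merged["resources"]:
                merged["apps"] |= group["apps"]
                merged["resources"] |= group["resources"]
            else:
                remaining.append(group)
        groups = remaining + [merged]
    return groups

groups = build_groups(match)   # two groups, as in the example above
```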
  • the acceleration processing device 200 may combine the obtained multiple acceleration applications arbitrarily and specify the correspondence between the multiple acceleration applications and the respective acceleration resources. To improve processing efficiency, allocation schemes that cannot match the acceleration resources can be removed (for example, a scheme in which multiple acceleration applications correspond to acceleration resource a but the combined application obtained by combining them cannot match acceleration resource a, or a scheme in which acceleration application b corresponds to acceleration resource c but acceleration application b cannot match acceleration resource c), and the advantage allocation scheme is then selected from the remaining allocation schemes.
  • the acceleration processing device 200 can set different selection strategies to select the advantage allocation scheme from among multiple allocation schemes; several examples are given below.
  • the advantage allocation scheme may be an allocation scheme that uses the least number of accelerated resources.
  • the accelerated resource used refers to an acceleration resource corresponding to at least one accelerated application in the allocation scheme.
  • the priorities of the multiple allocation schemes, from high to low, can be determined in ascending order of the number of acceleration resources used.
  • the allocation schemes using the same number of accelerated resources may have the same priority. Applying this selection strategy can save more acceleration resources to meet the subsequent demand for accelerated resources.
  • the advantage allocation scheme may be the allocation scheme with the largest sum of the utilization rates of the acceleration resources used.
  • when an acceleration resource corresponds to multiple acceleration applications, the utilization of the acceleration resource refers to its utilization by the combined application obtained by combining those acceleration applications.
  • when an acceleration resource corresponds to one acceleration application, the utilization of the acceleration resource refers to its utilization by that one acceleration application.
  • the priority of the plurality of allocation schemes from high to low may be determined in descending order of the sum of the utilization rates of the accelerated resources used.
  • the allocation scheme using the same sum of the utilization rates of the accelerated resources may have the same priority. Applying this selection strategy can achieve higher overall utilization of the accelerated resources.
  • the advantageous allocation scheme can be the allocation scheme in which the acceleration applications are most concentrated.
  • the most concentrated scheme may be determined as follows: for each allocation scheme, collect the utilization rates of all used acceleration resources into a set, remove the single smallest utilization rate from the set, and compute the sum of the remaining utilization rates; the scheme with the largest remaining sum is the scheme in which the acceleration applications are most concentrated.
  • the priority of the multiple allocation schemes, from high to low, may be determined in descending order of the sum of the remaining utilization rates.
  • allocation schemes with the same sum of remaining utilization rates may have the same priority.
  • when the acceleration processing method 400 is executed again, the acceleration resources burned previously — excluding the resource that had the lowest utilization rate when the allocation scheme was previously determined — may be excluded from matching. Because those previously burned acceleration resources have already achieved high utilization, excluding them when determining the new allocation scheme reduces the impact on burned resources or reduces the amount of computation, while a high overall utilization of the acceleration resources (including both the previously burned resources and the resources burned when the acceleration processing method 400 is executed again) can still be obtained.
  • the above ways of determining the priorities of the multiple allocation schemes may be combined arbitrarily, and the scheme with the highest priority is selected as the advantageous allocation scheme. For example: the allocation scheme using the fewest acceleration resources has the highest priority; among schemes using the same number of resources, the scheme with the larger sum of utilization rates has the higher priority; and among schemes with the same sum of utilization rates, the scheme in which the acceleration applications are more concentrated (i.e., after removing the smallest utilization value from each scheme's set of utilization rates, the scheme with the larger remaining sum — see the third example of determining priorities above) has the higher priority.
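The combined selection strategy above can be sketched in code. The scheme representation (a mapping from each used acceleration resource to the utilization rate achieved on it) and the composite sort key are illustrative assumptions, not the patent's implementation:

```python
# Sketch of the combined strategy: fewest used resources first, ties broken by
# larger utilization sum, further ties by the sum remaining after dropping the
# smallest utilization value (the "most concentrated" criterion).

def scheme_priority_key(scheme):
    utils = list(scheme.values())
    remaining = sum(utils) - min(utils) if utils else 0.0
    # Negate the maximized terms so that sorting ascending picks the best scheme.
    return (len(utils), -sum(utils), -remaining)

def pick_advantage_scheme(schemes):
    return min(schemes, key=scheme_priority_key)

# Three candidate schemes, using the utilization figures of FIG. 12:
schemes = [
    {"resource_1011": 0.80, "resource_1012": 0.20},  # combined app 1101 + app 1003
    {"resource_1011": 0.70, "resource_1012": 0.30},  # combined app 1102 + app 1002
    {"resource_1011": 0.60, "resource_1012": 0.40},  # combined app 1103 + app 1001
]
best = pick_advantage_scheme(schemes)
print(best)  # all sums are 100%, so the largest remaining sum (80%) wins
```

All three candidates use two resources and sum to 100%, so the tie is broken by the remaining sum after removing the smallest value, selecting the first scheme — matching the worked example below.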
  • taking the grouping of FIG. 11 as an example, the first group includes acceleration application 1001, acceleration application 1002, and acceleration application 1003, and the acceleration processing device 200 can combine any two or all three of them to obtain multiple combined applications.
  • if the combined application obtained by combining acceleration application 1001, acceleration application 1002, and acceleration application 1003 can match acceleration resource 1011, the number of acceleration resources used is 1, fewer than in any other allocation scheme, and that scheme is the advantageous allocation scheme.
  • assume instead that only combinations of two acceleration applications match; the utilization rates achieved by each two-application combined application and by each individual acceleration application are as shown in FIG. 12.
  • combined applications 1101, 1102, and 1103 in FIG. 12 are obtained by combining acceleration application 1001 with acceleration application 1002, acceleration application 1001 with acceleration application 1003, and acceleration application 1002 with acceleration application 1003, respectively.
  • the sum of the utilization rate of acceleration resource 1011 by combined application 1101 (80%) and the utilization rate of acceleration resource 1012 by acceleration application 1003 (20%) is 100%; the sum of the utilization rate of acceleration resource 1011 by combined application 1102 (70%) and the utilization rate of acceleration resource 1012 by acceleration application 1002 (30%) is 100%; the sum of the utilization rate of acceleration resource 1011 by combined application 1103 (60%) and the utilization rate of acceleration resource 1012 by acceleration application 1001 (40%) is 100%; the sum of the utilization rate of acceleration resource 1012 by combined application 1101 (70%) and the utilization rate of acceleration resource 1011 by acceleration application 1003 (25%) is 95%; the sum of the utilization rate of acceleration resource 1012 by combined application 1102 (60%) and the utilization rate of acceleration resource 1011 by acceleration application 1002 (35%) is 95%; and the sum of the utilization rate of acceleration resource 1012 by combined application 1103 (50%) and the utilization rate of acceleration resource 1011 by acceleration application 1001 (45%) is 95%.
  • the three schemes whose utilization sums are 100% can have the same priority.
  • when selecting, the acceleration processing device 200 may pick any one of these equal-priority schemes as the advantageous allocation scheme.
  • alternatively, from the utilization-rate sets of the three allocation schemes whose sums are 100%, the acceleration processing device 200 may remove the smallest value in each set — 20% for acceleration application 1003, 30% for acceleration application 1002, and 40% for acceleration application 1001 — leaving remaining sums of 80%, 70%, and 60%, respectively. The scheme with the largest remaining sum, 80%, is therefore the advantageous allocation scheme (i.e., the scheme in which the acceleration applications are most concentrated): in it, combined application 1101 corresponds to acceleration resource 1011, and acceleration application 1003 corresponds to acceleration resource 1012.
  • in this advantageous allocation scheme, one acceleration resource has the highest utilization rate and the other the lowest; that is, the acceleration applications are concentrated onto the acceleration resources other than the one with the lowest utilization rate.
  • when the acceleration processing method 400 is executed again, acceleration resource 1011 can be excluded from matching, since acceleration resource 1011 achieved a high utilization rate in the previous burning. Even with acceleration resource 1011 excluded — to reduce the impact on burned acceleration resources or to reduce the amount of computation when the allocation scheme is determined again — a high overall utilization of the acceleration resources (including acceleration resource 1011 and acceleration resource 1012) can still be obtained.
  • the second group includes acceleration application 1004, acceleration application 1005, acceleration application 1006, and acceleration application 1007, and the acceleration processing device 200 can combine any two, three, or four of them to obtain multiple combined applications. Assume that no combination of three or four acceleration applications can match acceleration resource 1013 or acceleration resource 1014, that the combination of any two of acceleration applications 1004, 1005, and 1006 can match acceleration resource 1013, and that the combination of any two of acceleration applications 1005, 1006, and 1007 can match acceleration resource 1014.
  • the acceleration processing device 200 may compute, for each feasible pairing, the sum of the utilization rates — for example, the utilization rate of acceleration resource 1013 by the combination of acceleration application 1004 and acceleration application 1005 plus the utilization rate of acceleration resource 1014 by the combination of acceleration application 1006 and acceleration application 1007 — and then determine the advantageous allocation scheme, e.g., the scheme in which the combination of acceleration applications 1004 and 1005 corresponds to acceleration resource 1013 and the combination of acceleration applications 1006 and 1007 corresponds to acceleration resource 1014.
  • similarly, the smallest utilization rate in each allocation scheme may be removed, the schemes may be prioritized from high to low in descending order of the remaining sums, and the scheme with the highest priority selected as the advantageous allocation scheme.
  • the acceleration processing device 200 can also try the allocation schemes one by one in order of priority from high to low, burning according to the selected scheme; if burning cannot be completed with the selected scheme (for example, simulation reveals that the acceleration resource cannot meet the timing constraints or the hardware resources required by the combined application), the next scheme is selected, until burning completes. Through this process, the acceleration processing device 200 automatically selects, among the allocation schemes that can complete burning, the one with the highest priority, thereby improving acceleration resource utilization.
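The try-in-priority-order process above can be sketched as follows; `BurnError` and `try_burn` are illustrative stand-ins for the real burning step and its simulation checks:

```python
# Sketch: attempt allocation schemes from highest to lowest priority until one
# can actually be burned; a failed timing or resource check moves on to the next.

class BurnError(Exception):
    pass

def burn_with_fallback(schemes_by_priority, try_burn):
    """schemes_by_priority: schemes sorted from highest to lowest priority.
    try_burn(scheme) raises BurnError when the scheme cannot be burned."""
    for scheme in schemes_by_priority:
        try:
            try_burn(scheme)
            return scheme            # highest-priority scheme that burned
        except BurnError:
            continue                 # e.g. timing constraints not met
    raise BurnError("no allocation scheme could be burned")

# Example: the first scheme fails its timing check, the second succeeds.
def try_burn(scheme):
    if not scheme["meets_timing"]:
        raise BurnError("timing constraints not met")

chosen = burn_with_fallback(
    [{"name": "A", "meets_timing": False}, {"name": "B", "meets_timing": True}],
    try_burn,
)
print(chosen["name"])  # B
```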
  • FIG. 13 is a schematic diagram of an acceleration processing device 200 applied in the system shown in FIG. 1 according to an embodiment of the present application.
  • the acceleration processing device 200 includes a combining unit 1201 and a burning unit 1202, which can be used to execute the acceleration processing method 400.
  • the combining unit 1201 is configured to combine the first acceleration application and the second acceleration application to obtain the first combined application. For details, refer to the description of S401 in the acceleration processing method 400, which is not repeated here.
  • the burning unit 1202 is configured to burn the first combined application onto the first acceleration resource. For details, refer to the description of S402 in the acceleration processing method 400, which is not repeated here.
  • the acceleration processing device 200 may further include a sending unit 1203.
  • when the second acceleration application is an acceleration application that has already been burned onto the second acceleration resource, after the burning unit 1202 burns the first combined application onto the first acceleration resource, the sending unit 1203 sends the instructions that trigger the second acceleration application to the first acceleration resource and no longer sends them to the second acceleration resource, thereby implementing migration of the acceleration application. For details, refer to the description in the acceleration processing method 400 of migrating an acceleration application between FPGAs or between regions of an FPGA.
  • the acceleration processing device 200 can also include a processing unit 1204.
  • when the second acceleration application is an acceleration application that has already been burned onto the first acceleration resource, the processing unit 1204 executes the second acceleration application in place of the first acceleration resource while the burning unit 1202 burns the first combined application onto the first acceleration resource. For details, refer to the description of the acceleration processing device 200 executing the second acceleration application in place of the first acceleration resource.
  • the acceleration processing device 200 may further include a determining unit 1205, so that when multiple acceleration resources are available for burning, the acceleration resource to burn can be selected according to the utilization rates achieved on them by the combined application. For example, after the determining unit 1205 determines that the utilization rate of the first acceleration resource by the first combined application is higher than its utilization rate of the third acceleration resource, the burning unit 1202 burns the first combined application onto the first acceleration resource.
  • the third acceleration resource can be an area on the FPGA chip or the entire FPGA chip.
  • the acceleration processing device 200 may further include an obtaining unit 1206 for obtaining multiple acceleration applications including the first acceleration application and the second acceleration application. For details, refer to the description of the ways of obtaining acceleration applications in the acceleration processing method 400.
  • the burning unit 1202 can burn the first combined application onto the first acceleration resource according to the advantageous allocation scheme among multiple allocation schemes.
  • Each of the plurality of allocation schemes is a correspondence between the plurality of acceleration resources and the plurality of acceleration applications. For details, refer to the description of determining the allocation scheme in the acceleration processing method 400.
  • the combining unit 1201, the burning unit 1202, the sending unit 1203, the processing unit 1204, the determining unit 1205, and the obtaining unit 1206 may be implemented by computer program instructions, which may be used to implement the NFVI in FIG. 1, so that the NFV system can execute the acceleration processing method 400 using the NFVI to increase the utilization of the acceleration resources.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented in software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer program instructions.
  • when the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are produced in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, the computer program instructions can be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium can be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), a semiconductor medium (e.g., a solid-state drive), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Advance Control (AREA)

Abstract

This application provides an acceleration processing method and device. In one acceleration processing method, an acceleration processing device combines a first acceleration application and a second acceleration application to obtain a first combined application, and burns the first combined application onto a first acceleration resource. The first combined application includes a top-level module, the first acceleration application, and the second acceleration application, and the top-level module includes a statement for invoking the first acceleration application and a statement for invoking the second acceleration application. The solution provided by this application helps improve the utilization of acceleration resources.

Description

Acceleration Processing Method and Device — Technical Field
This application relates to the field of communications technologies, and in particular to an acceleration processing method and device.
Background
A network function virtualization (NFV) system can use a field programmable gate array (FPGA) as a hardware accelerator, offloading functions executed in software to the FPGA to improve system performance. In the NFV system, the network function virtualization infrastructure (NFVI) can abstract the FPGA into a set of acceleration functions (or acceleration capabilities) and provide virtualized network functions (VNFs) or hosts with application programming interfaces (APIs) for invoking these acceleration functions.
An FPGA is a configurable integrated circuit. An NFV system needs different numbers or types of FPGAs to provide various acceleration functions. To improve FPGA utilization, partial reconfiguration (PR) technology can be introduced so that fewer FPGAs satisfy more demands. PR technology divides the interior of an FPGA into multiple regions and allows a region to be reconfigured to meet new demands without affecting the regions that are not reconfigured. However, the number of regions into which an FPGA can be divided is limited, and once the regions are divided, allocation can only be performed in units of regions; region sizes cannot be adjusted dynamically according to actual demand.
Summary of the Invention
This application provides an acceleration processing method and device, which help improve the utilization of acceleration resources.
According to a first aspect, an acceleration processing method is provided. In the method, an acceleration processing device combines a first acceleration application and a second acceleration application to obtain a first combined application, and burns the first combined application onto a first acceleration resource. The first acceleration application, the second acceleration application, and the first combined application may be HDL code, and the first acceleration resource may be an FPGA or a region of an FPGA. The first combined application includes a top-level module, the first acceleration application, and the second acceleration application; the top-level module includes a statement for invoking the first acceleration application and a statement for invoking the second acceleration application. Therefore, after the first combined application is burned onto the first acceleration resource, the first acceleration resource can execute invocation requests for both the first acceleration application and the second acceleration application. In this method, the acceleration processing device combines acceleration applications into a combined application and burns the combined application onto an acceleration resource, which improves the utilization of the acceleration resource compared with burning only a single acceleration application onto it.
Optionally, the top-level module includes a first port and a second port, mapped respectively to a port of the first acceleration application and a port of the second acceleration application. The top-level module can thus be connected to the first and second acceleration applications separately for signal transmission; the signals of the first and second ports do not affect each other, which is convenient to implement.
Optionally, the top-level module includes a first port that is mapped both to a port of the first acceleration application and to a port of the second acceleration application. One port of the top-level module can thus be connected simultaneously to the ports of both acceleration applications, and signals are transmitted in a bus-like manner. This port mapping helps keep the existing port connections unchanged when a new acceleration application is added, facilitating upgrades.
Optionally, before the acceleration processing device burns the first combined application onto the first acceleration resource, the second acceleration application has already been burned onto a second acceleration resource. After the acceleration processing device burns the first combined application onto the first acceleration resource, the second acceleration application is migrated from the second acceleration resource to the first acceleration resource: subsequently, trigger instructions for executing the second acceleration application are sent only to the first acceleration resource, and no longer to the second acceleration resource. Migrating an already-burned acceleration application to a new acceleration resource, where it is combined with other acceleration applications, helps achieve higher utilization of existing acceleration resources. In an NFV system, this helps satisfy more demands with fewer FPGAs.
Optionally, before the acceleration processing device burns the first combined application onto the first acceleration resource, the second acceleration application has already been burned onto the first acceleration resource. While the acceleration processing device is burning the first combined application onto the first acceleration resource, if an invocation request for the second acceleration application arrives, the acceleration processing device 200 can execute the second acceleration application in place of the first acceleration resource, so that invocation requests arriving during burning for acceleration applications on the resource being burned can be answered in time.
Optionally, before burning the first combined application onto the first acceleration resource, the acceleration processing device determines that the utilization rate of the first acceleration resource by the first combined application is higher than its utilization rate of a third acceleration resource. Thus, when multiple burnable acceleration resources exist, a selection can be made according to the utilization rates of the acceleration resources by the combined application, which helps improve acceleration resource utilization.
Optionally, before combining the first acceleration application and the second acceleration application, the acceleration processing device obtains multiple acceleration applications including the first and second acceleration applications, and burns the first combined application onto the first acceleration resource according to an advantageous allocation scheme among multiple allocation schemes. Each of the multiple allocation schemes is a correspondence between multiple acceleration resources and the multiple acceleration applications, where the multiple acceleration resources include the first acceleration resource, and the advantageous allocation scheme includes the correspondence between the first acceleration resource and the first and second acceleration applications. The acceleration processing device can select the advantageous allocation scheme from the multiple schemes according to different selection strategies and complete burning accordingly, providing diverse ways of improving acceleration resource utilization.
Optionally, the advantageous allocation scheme uses the fewest acceleration resources (an acceleration resource being "used" if it corresponds to at least one acceleration application). This conserves more acceleration resources to meet subsequent demand.
Optionally, the sum of the utilization rates of the acceleration resources used in the advantageous allocation scheme is the largest. This achieves a higher overall utilization of the acceleration resources (including all resources used in the scheme).
Optionally, the acceleration processing device executes computer program instructions to implement the acceleration processing method of the first aspect, and the computer program instructions may be used to implement NFVI functions. The function of executing the method of the first aspect can thus be added to the NFVI of an existing NFV system, extending the existing NFV system's functionality.
According to a second aspect, an acceleration processing device is provided, including units for executing the acceleration processing method provided in the first aspect.
According to a third aspect, an acceleration processing device is provided, including a memory and a processor, where the processor reads computer program instructions stored in the memory and executes the acceleration processing method provided in the first aspect.
Optionally, the acceleration processing device of the third aspect includes the first acceleration resource described in the method of the first aspect.
According to a fourth aspect, an acceleration processing system is provided, including the acceleration processing device of the second or third aspect and the first acceleration resource described in the method of the first aspect.
According to a fifth aspect, a computer storage medium comprising computer program instructions is provided; when the computer program instructions run on an acceleration processing device, the acceleration processing device executes the acceleration processing method of the first aspect.
According to a sixth aspect, a computer program product comprising computer program instructions is provided; when the computer program instructions run on an acceleration processing device, the acceleration processing device executes the acceleration processing method of the first aspect.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an NFV system architecture according to an embodiment of this application;
FIG. 2 is a schematic diagram of applying an acceleration processing device 200 in the NFV system shown in FIG. 1 according to an embodiment of this application;
FIG. 3 is another schematic diagram of applying an acceleration processing device 200 in the NFV system shown in FIG. 1 according to an embodiment of this application;
FIG. 4 is a flowchart of an acceleration processing method 400 according to an embodiment of this application;
FIG. 5 is an example of a top-level module described in the Verilog language according to an embodiment of this application;
FIG. 6 is another example of a top-level module described in the Verilog language according to an embodiment of this application;
FIG. 7A to FIG. 7F are schematic diagrams of mapping relationships established between the ports of the top-level module in the first combined application and the ports of the first and second acceleration applications according to embodiments of this application;
FIG. 8 is an example of migrating an acceleration application between regions of an FPGA according to an embodiment of this application;
FIG. 9 is an example of migrating an acceleration application between FPGAs according to an embodiment of this application;
FIG. 10 is an example of the utilization rates of each kind of hardware resource in an acceleration resource by a combined application according to an embodiment of this application;
FIG. 11 is an example of establishing groups based on matching between acceleration resources and acceleration applications according to an embodiment of this application;
FIG. 12 is an example of the utilization rates of acceleration resources by combined applications and acceleration applications according to an embodiment of this application;
FIG. 13 is a schematic diagram of an acceleration processing device 200 applied in the system shown in FIG. 1 according to an embodiment of this application.
Detailed Description
FIG. 1 is a schematic diagram of an NFV system architecture according to an embodiment of this application. In this NFV system, the NFVI abstracts computing hardware, storage hardware, and acceleration hardware into a set of computing capabilities, storage capabilities, and acceleration capabilities, and provides VNF101, VNF102, and VNF103 with APIs for invoking these capabilities, so as to provide various services such as computing, storage, and acceleration services.
FIG. 2 is a schematic diagram of an acceleration processing system in which an acceleration processing device 200 is applied in the NFV system shown in FIG. 1 according to an embodiment of this application. The acceleration processing system includes the acceleration processing device 200 and FPGA chips. The acceleration processing device 200 executes computer program instructions to implement the functions of the NFVI in the NFV system shown in FIG. 1. The acceleration processing device 200 includes a processor 201 and a memory 202, which may be connected via a bus or directly. The memory 202 stores the computer program instructions, and the processor 201 reads the computer program instructions stored in the memory 202 to perform the various operations of the acceleration processing device 200.
The acceleration processing device 200 may be connected to one or more FPGA chips directly or via a bus. An FPGA chip may be divided into multiple regions using PR technology, or left undivided. As shown in FIG. 2, FPGA203 is an FPGA chip that includes multiple regions (e.g., region 205 and region 206), and FPGA204 is an FPGA chip without region division. The FPGA chips may also be included in the acceleration processing device 200.
The acceleration processing device 200 can obtain from memory an application that needs hardware acceleration (hereinafter, an acceleration application). An acceleration application may be hardware description language (HDL) code describing the logic function that needs hardware acceleration, where the logic function is used to respond to invocations of an acceleration-capability API. The acceleration processing device 200 can burn the HDL code onto an FPGA chip and, upon receiving an invocation of the acceleration-capability API, use the FPGA chip to execute the required logic function.
The burning process may include: synthesizing the HDL code to generate a netlist; performing placement and routing based on the netlist with the aid of verification methods such as simulation; generating a binary file; and transferring the binary file to the FPGA chip. For an FPGA chip that includes multiple regions, the burning may mean burning the HDL code onto one region of the chip; for an undivided FPGA chip, the burning may mean burning the HDL code onto the entire chip. The acceleration processing device 200 can also obtain an acceleration application by converting code in a language other than HDL, such as C code (stored in advance or obtained from another device).
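The burning pipeline just described can be sketched as a sequence of stages. The `Toolchain` and `Target` classes below are placeholders standing in for a real vendor synthesis flow and for an FPGA chip or region; nothing here models an actual tool:

```python
# Sketch of the burning stages: synthesize HDL into a netlist, place and route,
# generate a binary file, and transfer the binary onto the chip (or PR region).

class Toolchain:
    # Placeholder stages; a real flow would invoke vendor tools.
    def synthesize(self, hdl):
        return f"netlist({hdl})"
    def place_and_route(self, netlist):
        return f"layout({netlist})"
    def generate_bitstream(self, layout):
        return f"bitstream({layout})"

class Target:
    """An FPGA chip, or one PR region of a chip."""
    def __init__(self):
        self.programmed = None
    def program(self, binary):
        self.programmed = binary

def burn(hdl_path, target, toolchain):
    netlist = toolchain.synthesize(hdl_path)       # synthesis -> netlist
    layout = toolchain.place_and_route(netlist)    # placement and routing
    binary = toolchain.generate_bitstream(layout)  # binary file
    target.program(binary)                         # transfer onto the chip/region
    return binary

region = Target()
burn("first_combined_app.v", region, Toolchain())
print(region.programmed)  # bitstream(layout(netlist(first_combined_app.v)))
```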
The acceleration processing device 200 may further include a network interface 207, through which it receives acceleration applications sent by other network devices and burns them onto FPGA chips. The acceleration processing device 200 may also receive, through the network interface 207, code in languages other than HDL sent by other devices, convert that code into acceleration applications, and burn the acceleration applications onto FPGA chips.
FIG. 3 is a schematic diagram of an acceleration processing system in which an acceleration processing device 200 is applied in the system shown in FIG. 1 according to an embodiment of this application. The acceleration processing system includes the acceleration processing device 200 and FPGA chips. The acceleration processing device 200 in FIG. 3 has the same structure as the acceleration processing device 200 in FIG. 2 that includes the network interface 207. The acceleration processing device 200 can connect to one or more acceleration devices through the network interface 207, where each acceleration device may include a network interface, a processor, and one or more FPGA chips; an FPGA chip may be a multi-region chip as shown in FIG. 2 or an undivided chip. As shown in FIG. 3, acceleration device 301 includes one FPGA chip, FPGA305, and acceleration device 302 includes two FPGA chips, FPGA308 and FPGA309. In the environment of FIG. 3, the acceleration processing device 200 can obtain acceleration applications locally, or receive acceleration applications from other devices via the network interface 207, or receive code in languages other than HDL from other devices via the network interface 207 and convert it into acceleration applications. Burning an acceleration application onto an FPGA chip of acceleration device 301 may proceed as follows: the acceleration processing device 200 generates a binary file from the acceleration application and sends it to acceleration device 301; acceleration device 301, under the control of processor 304, receives the binary file via network interface 303 and transfers it to FPGA305 (when FPGA305 includes multiple regions, the binary is burned into one region of FPGA305). The process of burning onto an FPGA chip of acceleration device 302 is similar to that for acceleration device 301 and is not repeated here.
Processors 201, 304, and 307 include, but are not limited to, one or more of a central processing unit (CPU), a network processor (NP), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The memory 202 may include volatile memory, such as random-access memory (RAM). The memory 202 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 202 may also include a combination of the above kinds of memory.
Optionally, the memory 202 may be integrated into the processor 201 as an internal component of the processor 201.
Network interfaces 207, 303, and 306 may be wired communication interfaces, wireless communication interfaces, or a combination thereof. A wired communication interface is, for example, an Ethernet interface, an asynchronous transfer mode (ATM) interface, or a packet over SONET/SDH (POS) interface based on the synchronous digital hierarchy (SDH)/synchronous optical networking (SONET). A wireless communication interface is, for example, a wireless local area network (WLAN) interface, a cellular network communication interface, or a combination thereof.
FIG. 4 is a flowchart of an acceleration processing method 400 according to an embodiment of this application. The acceleration processing method 400 can be executed by the acceleration processing device 200 in FIG. 2 and FIG. 3.
S401: The acceleration processing device combines a first acceleration application and a second acceleration application to obtain a first combined application.
The first combined application may be HDL code whose described logic functions include the logic functions described by the first and second acceleration applications, and which includes the code of the first acceleration application and the code of the second acceleration application.
The first combined application may include a top-level module, the first acceleration application, and the second acceleration application. The top-level module may include statements for invoking the first acceleration application and the second acceleration application, which may be statements instantiating the first acceleration application (i.e., creating an instance of it) and instantiating the second acceleration application (i.e., creating an instance of it). After the first combined application is burned onto an FPGA chip, the hardware circuit on the chip implementing the logic function described by the top-level module (i.e., the circuit corresponding to the top-level module) is connected to the hardware circuit implementing the logic function described by the first acceleration application (i.e., the circuit corresponding to the first acceleration application) and the hardware circuit implementing the logic function described by the second acceleration application (i.e., the circuit corresponding to the second acceleration application).
The top-level module, the first acceleration application, and the second acceleration application each include a port list containing one or more ports. The ports of the top-level module are the external communication ports of the first combined application: after the first combined application is burned onto the FPGA chip, the ports of the top-level module are mapped to designated pins of the chip (i.e., the ports of the circuit corresponding to the top-level module are connected to those pins), so that the circuit corresponding to the top-level module can communicate with the outside of the chip via those pins. In the statements invoking the first and second acceleration applications, the top-level module can map its ports to the ports of the first and second acceleration applications (i.e., connect the top-level module's ports to theirs), so that after burning, the ports of the circuit corresponding to the top-level module are also connected to the ports of the circuits corresponding to the two acceleration applications. The FPGA chip can therefore use the circuit corresponding to the top-level module to receive input values from an external bus, pass them to the circuits corresponding to the first and second acceleration applications for computation, receive the computation results those circuits return, and return the results to the external bus.
The top-level module may map one port simultaneously to a port of the first acceleration application and a port of the second acceleration application — for example, the top-level module includes a first port mapped to a port of the first acceleration application and to a port of the second acceleration application. The top-level module may also map different ports respectively to the ports of the first and second acceleration applications — for example, the top-level module includes a first port and a second port, the first port mapped to a port of the first acceleration application and the second port mapped to a port of the second acceleration application. Through these mappings, the ports of the top-level module can be connected to the ports of the first and second acceleration applications for signal transmission.
FIG. 5 gives an example of a top-level module in the Verilog language. The top-level module 500 is named top, and its port list 501 includes ports aap1_in, aap1_out, aap2_in, and aap2_out. The top-level module 500 includes a statement 502 creating an instance of acceleration application aap1 (named aap1_instance) and a statement 503 creating an instance of acceleration application aap2 (named aap2_instance). In statement 502, ".aap1_in(aap1_in)" maps port aap1_in of the top-level module 500 to port aap1_in of acceleration application aap1, and ".aap1_out(aap1_out)" maps port aap1_out of the top-level module 500 to port aap1_out of acceleration application aap1. In statement 503, ".aap2_in(aap2_in)" maps port aap2_in of the top-level module 500 to port aap2_in of acceleration application aap2, and ".aap2_out(aap2_out)" maps port aap2_out of the top-level module 500 to port aap2_out of acceleration application aap2.
In the statements creating the instances of the first and second acceleration applications, the top-level module may also map designated ports of an intermediate module to designated ports of the first and second acceleration applications, and, in the statement creating the instance of the intermediate module, map designated ports of the top-level module to designated ports of the intermediate module and map designated ports of the first and second acceleration applications to designated ports of the intermediate module. Designated ports of the top-level module then communicate with designated ports of the first and second acceleration applications through the intermediate module, which can schedule and manage the signal transfers between the top-level module and the two acceleration applications — for example, arbitrating between a first signal the top-level module sends to the first acceleration application and a second signal it sends to the second acceleration application, determining which signal is sent first.
FIG. 6 gives another example of a top-level module in the Verilog language. The top-level module 600 is named top, and its port list 601 includes ports top1_in, top1_out, top2_in, and top2_out. The top-level module 600 includes a statement 602 creating an instance of acceleration application aap1 (named aap1_instance), a statement 603 creating an instance of acceleration application aap2 (named aap2_instance), and a statement 604 creating an instance of the intermediate module aap_mid (named aap_mid_instance).
In statement 604, ".top1_in(top1_in)" maps port top1_in of the top-level module to port top1_in of intermediate module aap_mid; ".top1_out(top1_out)" maps port top1_out of the top-level module to port top1_out of aap_mid; ".top2_in(top2_in)" maps port top2_in of the top-level module to port top2_in of aap_mid; and ".top2_out(top2_out)" maps port top2_out of the top-level module to port top2_out of aap_mid.
Also in statement 604, ".aap1_in(aap1_in)" maps port aap1_in of acceleration application aap1 to port aap1_in of intermediate module aap_mid; ".aap1_out(aap1_out)" maps port aap1_out of aap1 to port aap1_out of aap_mid; ".aap2_in(aap2_in)" maps port aap2_in of acceleration application aap2 to port aap2_in of aap_mid; and ".aap2_out(aap2_out)" maps port aap2_out of aap2 to port aap2_out of aap_mid.
In statement 602, ".aap1_in(aap1_in)" maps port aap1_in of intermediate module aap_mid to port aap1_in of acceleration application aap1, and ".aap1_out(aap1_out)" maps port aap1_out of aap_mid to port aap1_out of aap1. In statement 603, ".aap2_in(aap2_in)" maps port aap2_in of aap_mid to port aap2_in of acceleration application aap2, and ".aap2_out(aap2_out)" maps port aap2_out of aap_mid to port aap2_out of aap2.
FIG. 5 and FIG. 6 give examples in which the top-level module maps different ports to the ports of the first acceleration application and the ports of the second acceleration application. Similarly, the top-level module may also map the same port to ports of both acceleration applications, which is not repeated here.
FIG. 7A to FIG. 7F are schematic diagrams of mapping relationships established between the ports of the top-level module in the first combined application and the ports of the first and second acceleration applications according to embodiments of this application. In FIG. 7A to FIG. 7F, the first acceleration application 701 includes two ports and the second acceleration application 702 includes two ports as an example. Inside the FPGA chip, the two ports of the hardware circuit corresponding to the first acceleration application 701 and the two ports of the hardware circuit corresponding to the second acceleration application 702 can be connected through the FPGA's internal bus 711; the dashed arrows show the mapping relationships between ports.
In one example, as shown in FIG. 7A, different ports of the top-level module 703 are mapped respectively to the ports of the first acceleration application 701 and the ports of the second acceleration application 702. Input values received at different ports of the top-level module 703 can be transferred separately to the ports of the first acceleration application 701 and the second acceleration application 702, and the output values sent by the two acceleration applications are transferred separately to different ports of the top-level module 703.
In another example, as shown in FIG. 7B, different ports of the top-level module 703 are mapped to different ports of the intermediate module 704, and different ports of the intermediate module 704 are mapped respectively to the ports of the first acceleration application 701 and the ports of the second acceleration application 702.
In another example, as shown in FIG. 7C, the same ports of the top-level module 703 are mapped to the ports of both the first acceleration application 701 and the second acceleration application 702. An input value received at a port of the top-level module 703 can be transferred simultaneously to the ports of both acceleration applications; the first acceleration application 701 and the second acceleration application 702 can each examine the input value to determine whether it is addressed to itself and, if so, receive the input value, compute on it, and send the computation result to the port of the top-level module 703. In this case the first and second acceleration applications are functionally related and follow the same port standard — for example, the first acceleration application 701 performs advanced encryption standard (AES) encryption and the second acceleration application 702 performs data encryption standard (DES) encryption, and the two can use the same input and output ports. This port mapping helps keep the existing port connections compatible when a new acceleration application is added.
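The shared-port dispatch of FIG. 7C can be sketched as follows; the tag field on each input value and the broadcast helper are illustrative assumptions about how an application decides whether a value is addressed to it:

```python
# Sketch: both acceleration applications see every value on the shared port,
# and each one checks a target tag to decide whether the value is its own.

class AccelApp:
    def __init__(self, name, func):
        self.name, self.func = name, func
    def on_input(self, value):
        if value["target"] != self.name:   # not addressed to this application
            return None
        return self.func(value["data"])    # compute and return the result

apps = [AccelApp("aes", lambda d: f"aes({d})"),
        AccelApp("des", lambda d: f"des({d})")]

def shared_port_broadcast(value):
    # The top-level port delivers the value to every application; exactly the
    # addressed one produces a result.
    results = [r for app in apps if (r := app.on_input(value)) is not None]
    return results[0]

print(shared_port_broadcast({"target": "des", "data": "x"}))  # des(x)
```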
In another example, as shown in FIG. 7D, a port of the top-level module 703 is mapped to a port of the intermediate module 704, and the same port of the intermediate module 704 is mapped to ports of both the first acceleration application 701 and the second acceleration application 702. An input value received at the port of the top-level module 703 can be transferred through the port of the intermediate module 704 to the ports of both acceleration applications simultaneously. The handling of the input value by the first and second acceleration applications is as described for the example of FIG. 7C and is not repeated here.
In another example, as shown in FIG. 7E, two ports of the top-level module 703 are mapped respectively to one port of the first acceleration application 701 and one port of the second acceleration application 702, and one port of the top-level module 703 is mapped simultaneously to one port of the first acceleration application 701 and one port of the second acceleration application 702. The handling of input values by the two acceleration applications is as described for the examples of FIG. 7A to FIG. 7D and is not repeated here.
In another example, some shared ports of the top-level module are mapped through the intermediate module to ports of both the first and second acceleration applications, while some different ports are mapped through the intermediate module to ports of the two acceleration applications respectively. As shown in FIG. 7F, three ports of the top-level module 703 are mapped to three ports of the intermediate module 704; two further ports of the intermediate module 704 are mapped respectively to one port of the first acceleration application 701 and one port of the second acceleration application 702, and one further port of the intermediate module 704 is mapped simultaneously to one port of the first acceleration application 701 and one port of the second acceleration application 702. The handling of input values by the two acceleration applications is as described for the examples of FIG. 7A to FIG. 7E and is not repeated here.
The acceleration processing device 200 can likewise combine three or more acceleration applications into a combined application. For example, the top-level module can map the same port to ports of the first, second, and third acceleration applications; or map different ports to their ports; or map some shared ports to their ports while mapping some different ports to their ports. Combining three or more acceleration applications can follow the manner of combining two acceleration applications described above and is not repeated here.
S402: The acceleration processing device burns the first combined application onto the first acceleration resource.
An acceleration resource is a region of an FPGA chip (when the chip includes multiple regions) or an entire FPGA chip (when the chip is not divided into regions). The FPGA chip may be one of the FPGA chips in FIG. 2 or FIG. 3, for example FPGA203, FPGA204, FPGA305, FPGA308, or FPGA309. The first acceleration resource may be one of these FPGA chips, or a region of one of them. The process by which the acceleration processing device 200 burns the first combined application onto the first acceleration resource can follow the process of burning HDL code onto an FPGA chip described for FIG. 2 or FIG. 3 and is not repeated here.
The acceleration processing method 400 combines the first and second acceleration applications to obtain the first combined application and burns it onto the first acceleration resource, which improves acceleration resource utilization compared with burning one acceleration application onto one acceleration resource.
In the acceleration processing method 400, after the acceleration processing device 200 burns the first combined application onto the first acceleration resource, the first acceleration resource can execute both the first acceleration application and the second acceleration application. In the NFV system architecture shown in FIG. 1, the NFVI provides acceleration services to the virtual network functions VNF101, VNF102, and VNF103 through APIs. Any of VNF101, VNF102, and VNF103 may be implemented in the acceleration processing device 200 by the processor 201 executing the computer program instructions in the memory 202, or may be implemented by other devices in the network. VNF101 can send invocation requests with different API names or parameters to invoke acceleration applications with different acceleration capabilities. The following gives examples in which VNF101 needs an acceleration service and sends the NFVI an invocation request for the first acceleration application, and the NFVI responds using the FPGA chip onto which the first combined application has been burned.
In one example, VNF101 is implemented by the processor 201 executing computer program instructions. The acceleration processing device 200 obtains the invocation request as follows: the processor 201 executes computer program instructions implementing the NFVI to receive the invocation request sent by VNF101. After the acceleration processing device 200 obtains the request, the processor 201 sends a trigger instruction to the first acceleration resource through the bus in FIG. 2, and the trigger instruction is transferred to the pins of the first acceleration resource. When the first acceleration resource is an FPGA chip, its pins are the pins of that chip; when the first acceleration resource is a region of an FPGA chip, its pins are the pins of the chip used for that region. The trigger instruction may include one or more input values and may be used to trigger the first acceleration resource to execute the first acceleration application. Designated ports of the top-level module of the first combined application burned onto the first acceleration resource can be mapped to designated pins of the first acceleration resource, and the input values in the trigger instruction delivered to those pins are transferred to the designated ports of the top-level module. In response to the trigger instruction, the first acceleration resource can execute the first acceleration application to compute on the input values in the trigger instruction and send the computation result to the acceleration processing device 200.
In another example, VNF101 is implemented by another device in the network. The acceleration processing device 200 obtains the invocation request as follows: the processor 201 executes computer program instructions implementing NFVI functions, receives the invocation request sent by the other device through the network interface 207 shown in FIG. 3, and sends a trigger instruction to acceleration device 301 or acceleration device 302 through the network interface 207. Acceleration device 301, under the control of processor 304, receives the trigger instruction through network interface 303 and transfers it over acceleration device 301's internal bus to the pins of the first acceleration resource; or acceleration device 302, under the control of processor 307, receives the trigger instruction through network interface 306 and transfers it over acceleration device 302's internal bus to the pins of the first acceleration resource. In response to the trigger instruction, the first acceleration resource can execute the first acceleration application to compute on the input values in the trigger instruction and send the computation result to the acceleration processing device 200.
The handling of an invocation request that VNF101 sends the NFVI for the second acceleration application is similar to the handling of an invocation request for the first acceleration application and is not repeated here.
In the acceleration processing method 400, before combining acceleration applications, the acceleration processing device 200 can obtain them in one or more of the following ways: obtaining an acceleration application from local memory; converting code in a language other than HDL to generate an acceleration application; receiving an acceleration application from another device; and receiving code in a language other than HDL from another device and converting it to generate an acceleration application.
In the acceleration processing method 400, an acceleration application obtained from local memory may be one that the acceleration processing device 200 obtained previously and saved in the memory 202. When the second acceleration application is such an application, it may be an acceleration application that has already been burned onto a second acceleration resource. The second acceleration resource may be a region of an FPGA chip or an entire FPGA chip. After obtaining the second acceleration application, the acceleration processing device 200 saves it in the memory 202 and burns it onto the second acceleration resource, which is a region of an FPGA chip (when the chip includes multiple regions) or an FPGA chip (when the chip is not divided into regions).
In one example, an acceleration application can be migrated between regions of an FPGA. As shown in FIG. 8, FPGA 800 includes region 801 and region 802, where region 801 is the first acceleration resource and region 802 is the second acceleration resource. Before the first combined application 805 is burned onto region 801, the second acceleration application 804 has already been burned onto region 802, and the acceleration processing device 200 can send trigger instructions including input values to region 802 and obtain the returned computation results. After the acceleration processing device 200 combines the second acceleration application 804 with the first acceleration application 803 to obtain the first combined application 805 and burns the first combined application 805 onto region 801, upon receiving an invocation request for the second acceleration application it sends trigger instructions only to region 801, triggering region 801 (i.e., the first acceleration resource) to execute the second acceleration application, and no longer sends trigger instructions to region 802; the second acceleration application 804 is thereby migrated from region 802 to region 801.
In another example, an acceleration application can be migrated between FPGAs. As shown in FIG. 9, FPGA901 is the first acceleration resource and FPGA902 is the second acceleration resource. Before the first combined application 905 is burned onto FPGA901, the second acceleration application 904 has already been burned onto FPGA902 by the acceleration processing device 200 or another device, and the acceleration processing device 200 can send trigger instructions including input values to FPGA902 and obtain the returned computation results. After the acceleration processing device 200 combines the second acceleration application 904 with the first acceleration application 903 to obtain the first combined application 905 and burns the first combined application 905 onto FPGA901, upon receiving an invocation request for the second acceleration application it sends trigger instructions only to FPGA901, triggering FPGA901 (i.e., the first acceleration resource) to execute the second acceleration application, and no longer sends trigger instructions to FPGA902; the second acceleration application 904 is thereby migrated from FPGA902 to FPGA901. When combining acceleration applications, the acceleration processing device 200 migrates acceleration applications already burned onto acceleration resources to a new acceleration resource, where they are combined with other acceleration applications; this helps achieve higher utilization of the existing acceleration resources. In an NFV system, this helps satisfy more demands with fewer FPGAs.
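The migration bookkeeping in these examples can be sketched as follows; the routing table and usage flags are illustrative assumptions about how the NFVI might record which resource carries each application:

```python
# Sketch: trigger instructions for an acceleration application are routed to
# whichever acceleration resource currently carries it; migration updates that
# routing and the resources' usage information.

routing = {"aap_804": "region_802"}          # application -> resource carrying it
usage = {"region_801": False, "region_802": True}

def migrate(app, new_resource):
    old = routing[app]
    routing[app] = new_resource              # triggers now go to the new resource
    usage[new_resource] = True
    if old not in routing.values():          # old resource no longer carries any app
        usage[old] = False                   # marked unused, available for reuse

migrate("aap_804", "region_801")             # after burning combined application 805
print(routing["aap_804"], usage["region_802"])  # region_801 False
```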
In the acceleration processing method 400, before the acceleration processing device 200 burns the first combined application onto the first acceleration resource, the second acceleration application may be an acceleration application already burned onto the first acceleration resource by the acceleration processing device 200 or another device. While the first combined application is being burned onto the first acceleration resource, if an invocation request for the second acceleration application arrives, the acceleration processing device 200 can execute the second acceleration application in place of the first acceleration resource. After burning completes, the content previously burned on the first acceleration resource is replaced (i.e., reconfigured) by the first combined application, so that the first acceleration resource can execute invocation requests for both the first and second acceleration applications, and execution of invocation requests for the second acceleration application can revert to the first acceleration resource. To execute the second acceleration application itself, the acceleration processing device 200 can have the processor 201 execute computer program instructions stored in the memory 202 that convert the second acceleration application into computer program instructions executable by the processor 201. The acceleration processing method 400 can thus respond in time to invocation requests for acceleration applications that arrive during burning.
In the acceleration processing method 400, when multiple burnable acceleration resources exist, a selection can be made according to the utilization rates of the acceleration resources by the combined application, which helps improve acceleration resource utilization. For example, when a first acceleration resource and a third acceleration resource both exist, if it is determined that the utilization rate of the first acceleration resource by the first combined application is higher than its utilization rate of the third acceleration resource, the first combined application is burned onto the first acceleration resource.
An acceleration resource (i.e., an FPGA or a region of an FPGA) can include multiple kinds of hardware resources, such as registers, lookup tables (LUTs), random access memory (RAM), and input/output ports. FIG. 10 gives an example of the utilization rates of each kind of hardware resource in an acceleration resource by a combined application. As shown in FIG. 10, after being burned onto the acceleration resource, the combined application will use 13.89% of the total registers, 60.98% of the LUTs, 75.56% of the RAM, and 12% of the input/output ports. The utilization rate of an acceleration resource by a combined application can be determined from the utilization rates of the various kinds of hardware resources in that acceleration resource. For example, the utilization rate of an acceleration resource by a combined application may be the LUT utilization rate: when the LUT utilization rate of the first acceleration resource by the combined application is greater than its LUT utilization rate of the second acceleration resource, the utilization rate of the first acceleration resource by the combined application is determined to be greater than that of the second acceleration resource. As another example, the utilization rate of an acceleration resource by a combined application may be the sum of the LUT utilization rate and the RAM utilization rate: when the sum of the LUT utilization rate and the RAM utilization rate of the first acceleration resource is greater than the corresponding sum for the second acceleration resource, the utilization rate of the first acceleration resource is determined to be greater. As yet another example, the utilization rate of an acceleration resource by a combined application can be computed with the following formula:
U = x1·(A1/B1) + x2·(A2/B2) + … + xn·(An/Bn)
U is the utilization rate of the acceleration resource by the combined application; n is the number of kinds of hardware resources in the acceleration resource; Ai is the amount of the i-th kind of hardware resource in the acceleration resource used by the combined application; Bi is the total amount of the i-th kind of hardware resource in the acceleration resource; and xi is the weight coefficient of the i-th kind of hardware resource. Ai/Bi is the utilization rate of the i-th kind of hardware resource in the acceleration resource by the combined application.
The same formula can also compute the utilization rate of an acceleration resource by a single acceleration application (when only one acceleration application is burned onto one acceleration resource). In that case, U is the utilization rate of the acceleration resource by the acceleration application; n is the number of kinds of hardware resources in the acceleration resource; Ai is the amount of the i-th kind of hardware resource used by the acceleration application; Bi is the total amount of the i-th kind of hardware resource in the acceleration resource; and xi is the weight coefficient of the i-th kind of hardware resource. Ai/Bi is the utilization rate of the i-th kind of hardware resource by the acceleration application.
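A minimal sketch of the weighted-sum formula above; the equal weights of 0.25 per kind and the reuse of the FIG. 10 utilization figures are purely illustrative:

```python
# U = sum over kinds i of x_i * (A_i / B_i), where A_i is the amount of the
# i-th hardware-resource kind used, B_i its total, and x_i its weight.

def utilization(used, totals, weights):
    return sum(w * a / b for a, b, w in zip(used, totals, weights))

# Example with the FIG. 10 figures (registers, LUT, RAM, I/O ports), expressed
# here as already-normalized ratios (so B_i = 1.0), and equal weights of 0.25:
rates = [0.1389, 0.6098, 0.7556, 0.12]
u = utilization(rates, [1.0] * 4, [0.25] * 4)
print(round(u, 4))  # 0.4061
```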
In the acceleration processing method 400, when the acceleration resource is a region of an FPGA chip, an FPGA chip using PR technology includes some common hardware resources shared by all regions. In this case, the utilization rate of each kind of hardware resource can also be obtained by dividing the actual used amount of that kind by its actual total. The actual used amount includes the amount of that kind used in the acceleration resource plus the amount of that kind used among the common hardware resources; the actual total includes the total of that kind in the acceleration resource plus the total of that kind among the common hardware resources. This makes the utilization calculation more accurate.
In the acceleration processing method 400, the acceleration processing device 200 can execute computer program instructions to implement the functions of the NFVI in the NFV system shown in FIG. 1. The NFVI functions may include acceleration resource discovery (for example, discovering a new FPGA chip), acceleration resource registration (for example, recording information about a newly discovered FPGA chip), acceleration resource state collection (for example, recording usage information of FPGA chips, from which it can be determined which FPGAs or FPGA regions are in use and which are idle), and acceleration resource configuration (for example, burning an FPGA chip). The NFVI functions may also include a combined-application management function for performing the acceleration processing method 400; this function can be carried out by a single component in the NFVI or by multiple components cooperating. For example, when multiple components cooperate, a combining component may combine the first acceleration application and the second acceleration application to obtain the first combined application, and may call a configuration component that performs the acceleration resource configuration function to burn the first combined application to an FPGA chip or a region of an FPGA chip. The above NFVI functions can be deployed in the NFVI of an existing NFV system, which facilitates deployment alongside other NFV systems. Adding the capability to perform the acceleration processing method 400 to the NFVI of an existing NFV system extends that system's functionality, for example by adding the combined-application management function. In the acceleration processing method 400, the acceleration processing device 200 can obtain information about all acceleration resources in the NFV system, including usage information recording whether each acceleration resource is in use; this information can be obtained by the NFVI.
In one example, after the acceleration processing device 200 obtains one or more new acceleration applications (that is, acceleration applications not yet stored in the memory 202), it can first determine whether the unused acceleration resources in the NFV system are sufficient to burn the one or more new acceleration applications. If so, acceleration applications already burned to acceleration resources can be excluded when combining acceleration applications; if not, they can be included, so that the already-burned acceleration applications and the new acceleration applications are recombined with the help of acceleration application migration, and the burning is performed on the basis of the recombination.
After migrating an acceleration application to a new acceleration resource, the acceleration processing device 200 can update the acceleration resource information kept in the NFV system; the update can be performed by the NFVI. For example, after the second acceleration application 804 is migrated from region 802 to region 801 as shown in FIG. 8, the usage information of region 801 is updated to "in use" and that of region 802 to "unused". Likewise, after the second acceleration application 904 is migrated from FPGA 902 to FPGA 901 as shown in FIG. 9, FPGA 901 is updated to "in use" and FPGA 902 to "unused". An acceleration resource updated to "unused" can then be reused for burning an acceleration application, or a combined application formed by combining acceleration applications.
In the NFV system, the combination and burning of acceleration applications and the migration of acceleration applications are transparent to the VNFs, so acceleration resource utilization can be improved without the VNFs being aware of it.
In the acceleration processing method 400, after obtaining multiple acceleration applications, the acceleration processing device 200 can combine them in arbitrary ways to obtain one or more combined applications, and determine an allocation scheme according to the utilization of acceleration resources by combined applications, the utilization of acceleration resources by individual acceleration applications, and/or the number of acceleration resources used. An allocation scheme is a correspondence between multiple acceleration resources and the multiple acceleration applications obtained by the acceleration processing device 200. In an allocation scheme, every one of the multiple acceleration applications corresponds to exactly one acceleration resource, while an acceleration resource may correspond to no acceleration application, to one acceleration application, or to multiple acceleration applications. The acceleration processing device can burn according to the allocation scheme: when an acceleration resource corresponds to multiple acceleration applications, the combined application formed from those applications is burned to that resource; when it corresponds to a single acceleration application, that application is burned to it. The acceleration processing device 200 can determine multiple allocation schemes and, according to different selection policies, select one of them (the preferred allocation scheme) and burn according to that preferred allocation scheme.
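The notion of an allocation scheme above — every application mapped to exactly one resource, a resource carrying zero or more applications — can be enumerated exhaustively for small groups. This brute-force sketch is illustrative only; a practical implementation would prune candidates using the matching and grouping described in the surrounding text:

```python
from itertools import product

def enumerate_schemes(apps, resources):
    """Yield every allocation scheme as a dict mapping each acceleration
    application to the acceleration resource it is assigned to. Under one
    scheme a resource may carry zero, one, or several applications (several
    applications on one resource are burned as one combined application)."""
    for assignment in product(resources, repeat=len(apps)):
        yield dict(zip(apps, assignment))
```

For `n` applications and `m` resources this yields `m**n` candidate schemes, which is why the grouping and matching steps matter for keeping the computation tractable.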
Before combining the multiple acceleration applications, the acceleration processing device 200 can obtain information about all acceleration resources, all idle acceleration resources (that is, unused acceleration resources), or all usable acceleration resources in the NFV system, and match every acceleration resource against every acceleration application to determine which acceleration resources can satisfy which acceleration applications' requirements (that is, which acceleration resources can match which acceleration applications). The matching conditions may include whether the port rate, the number of ports, the amount of RAM, and the number of LUTs are sufficient, among others. The acceleration processing device 200 can then perform the combination of acceleration applications and the utilization computation on the basis of this matching, to reduce the amount of computation.
The acceleration processing device 200 can form groups based on the above matching, such that the acceleration applications in each group can only match the acceleration resources in that group, which reduces the amount of computation. For example, in the grouping based on the matching between acceleration resources and acceleration applications shown in FIG. 11, the first group includes acceleration resources 1011 and 1012 and acceleration applications 1001, 1002, and 1003. Within the first group, acceleration applications 1001 and 1002 can match acceleration resources 1011 and 1012, while acceleration application 1003 can match only acceleration resource 1011, not acceleration resource 1012. The second group includes acceleration resources 1013 and 1014 and acceleration applications 1004, 1005, 1006, and 1007. Within the second group, acceleration applications 1004, 1005, and 1006 can match acceleration resource 1013, and acceleration applications 1005, 1006, and 1007 can match acceleration resource 1014. The acceleration processing device 200 can combine acceleration applications and compute utilizations within each group, reducing the amount of computation. The following description takes as an example the acceleration processing device 200 computing utilizations and determining allocation schemes within the first group and the second group separately; of course, the acceleration processing device 200 may also skip grouping and compute over all acceleration resources and all acceleration applications.
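The grouping step can be viewed as finding the connected components of the application-resource match graph. A small sketch follows (function names invented; the match table in the test reproduces the FIG. 11 example):

```python
def build_groups(matches):
    """Partition applications and resources into independent groups — the
    connected components of the bipartite match graph — so that combination
    and utilization computations can run per group.

    matches: dict mapping each application to the set of resources that
    satisfy its requirements. Returns a list of (apps, resources) pairs.
    """
    remaining = dict(matches)
    groups = []
    while remaining:
        # Seed a new group with an arbitrary application and its resources.
        app, res = next(iter(remaining.items()))
        apps, resources = {app}, set(res)
        del remaining[app]
        # Grow the group to a fixed point: pull in every application that
        # shares at least one resource with the group.
        changed = True
        while changed:
            changed = False
            for a, rs in list(remaining.items()):
                if rs & resources:
                    apps.add(a)
                    resources |= rs
                    del remaining[a]
                    changed = True
        groups.append((apps, resources))
    return groups
```

Applied to the FIG. 11 matching, this yields exactly the two groups described above.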
When determining allocation schemes, the acceleration processing device 200 can combine the obtained acceleration applications arbitrarily and assign correspondences between the acceleration applications and the acceleration resources. To improve processing efficiency, allocation schemes that fail to match acceleration resources can first be removed (for example, a scheme in which multiple acceleration applications correspond to acceleration resource a but the combined application formed from them cannot match resource a, or a scheme in which acceleration application b corresponds to acceleration resource c but application b cannot match resource c), and the preferred allocation scheme is then selected from the remaining schemes.
The acceleration processing device 200 can apply different selection policies to choose the preferred allocation scheme from the multiple allocation schemes; several examples follow.
(1) The preferred allocation scheme may be the allocation scheme that uses the fewest acceleration resources. A used acceleration resource is an acceleration resource that corresponds to at least one acceleration application in the scheme. The priorities of the multiple allocation schemes, from high to low, can be determined in increasing order of the number of used acceleration resources, so that a scheme using fewer resources ranks higher. Allocation schemes using the same number of acceleration resources may have the same priority. Applying this selection policy saves more acceleration resources for subsequent demand.
(2) The preferred allocation scheme may be the allocation scheme in which the sum of the utilizations of the used acceleration resources is largest. When an acceleration resource corresponds to multiple acceleration applications in a scheme, the utilization of that resource means the utilization of the resource by the combined application formed from those applications; when it corresponds to a single acceleration application, its utilization means that application's utilization of the resource. The priorities of the multiple allocation schemes, from high to low, can be determined in decreasing order of the sum of the utilizations of the used acceleration resources. Schemes with equal utilization sums may have the same priority. Applying this selection policy yields a higher overall utilization of the acceleration resources.
(3) The preferred allocation scheme may be the allocation scheme in which the acceleration applications are most concentrated. One way to determine this scheme is: for each allocation scheme, collect the utilizations of all used acceleration resources into a set, remove the smallest utilization from the set, and compute the sum of the remaining utilizations; the scheme with the largest remaining sum is the one in which the acceleration applications are most concentrated. The priorities of the multiple allocation schemes, from high to low, can be determined in decreasing order of the remaining sum; schemes with equal remaining sums may have the same priority. With this selection policy, when the step of determining an allocation scheme in the acceleration processing method 400 is performed again later, some of the previously burned acceleration resources can be excluded (not including the resource that had the smallest utilization when the scheme was previously determined). Because those previously burned acceleration resources have already achieved high utilization, a high utilization of the overall acceleration resources (including the previously burned resources and the resources burned when the method 400 is performed again) can still be obtained even though those resources are excluded when determining the scheme in order to reduce the impact on already-burned resources or to reduce the amount of computation.
The above ways of determining the priorities of multiple allocation schemes can be combined arbitrarily, so as to select the scheme with the highest priority as the preferred allocation scheme. For example: the scheme using the fewest acceleration resources has the highest priority; when two schemes use the same number of acceleration resources, the scheme with the larger sum of utilizations of the used resources has the higher priority; and when two schemes have equal utilization sums, the scheme in which the acceleration applications are more concentrated (that is, after each scheme removes the smallest single utilization from its set of used-resource utilizations, the scheme with the larger remaining sum; see the third example of determining priorities above) has the higher priority.
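The combined ordering just described can be expressed as a single sort key per scheme. In the sketch below, each scheme is represented by the list of utilizations of its used resources, in integer percent to avoid floating-point ties; the function names are invented for this illustration:

```python
def scheme_priority(used_utilizations):
    """Sort key for one allocation scheme, given the utilization (percent)
    of every used resource, i.e. every resource carrying at least one
    application. A lower key means a higher priority:
    (1) fewer used resources, then (2) larger utilization sum, then
    (3) more concentrated (larger sum after dropping the smallest value)."""
    total = sum(used_utilizations)
    concentrated = total - min(used_utilizations)
    return (len(used_utilizations), -total, -concentrated)

def pick_preferred(schemes):
    """Return the index of the preferred allocation scheme."""
    return min(range(len(schemes)), key=lambda i: scheme_priority(schemes[i]))
```

Sorting a list of schemes by `scheme_priority` orders them from highest to lowest priority under the combined rule.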
Referring to the matching relationships of the first group shown in FIG. 11, the first group includes acceleration applications 1001, 1002, and 1003, and the acceleration processing device 200 can combine any two or all three of them to obtain multiple combined applications. When the combined application formed from acceleration applications 1001, 1002, and 1003 can match acceleration resource 1011, the number of acceleration resources used is 1, fewer than in any other allocation scheme, so that scheme is the preferred allocation scheme.
Suppose that the combined application formed from acceleration applications 1001, 1002, and 1003 cannot match acceleration resources 1011 and 1012, that a combined application formed from any two of the acceleration applications can match acceleration resources 1011 and 1012, and that the computed utilizations of the acceleration resources by the two-application combined applications and by the individual acceleration applications are as shown in FIG. 12. In FIG. 12, combined applications 1101, 1102, and 1103 are obtained by combining acceleration applications 1001 and 1002, acceleration applications 1001 and 1003, and acceleration applications 1002 and 1003, respectively.
The utilization of acceleration resource 1011 by combined application 1101 (80%) plus the utilization of acceleration resource 1012 by acceleration application 1003 (20%) is 100%; the utilization of resource 1011 by combined application 1102 (70%) plus the utilization of resource 1012 by application 1002 (30%) is 100%; the utilization of resource 1011 by combined application 1103 (60%) plus the utilization of resource 1012 by application 1001 (40%) is 100%; the utilization of resource 1012 by combined application 1101 (70%) plus the utilization of resource 1011 by application 1003 (25%) is 95%; the utilization of resource 1012 by combined application 1102 (60%) plus the utilization of resource 1011 by application 1002 (35%) is 95%; and the utilization of resource 1012 by combined application 1103 (50%) plus the utilization of resource 1011 by application 1001 (45%) is 95%. There are thus three allocation schemes whose utilization sum is 100% (greater than the other three schemes at 95%). Those three 100% schemes may have the same priority, and when selecting among schemes of the same priority the acceleration processing device 200 may pick any one of them as the preferred allocation scheme.
Alternatively, the acceleration processing device 200 may remove, from the utilization set of each of the three allocation schemes whose sums are 100%, the smallest utilization in that set — namely the 20% corresponding to acceleration application 1003, the 30% corresponding to application 1002, and the 40% corresponding to application 1001 — leaving remaining sums of 80%, 70%, and 60%, respectively. The scheme with the largest remaining sum, 80%, is therefore the preferred allocation scheme (that is, the scheme in which the acceleration applications are most concentrated); in this preferred scheme, combined application 1101 corresponds to acceleration resource 1011 and acceleration application 1003 corresponds to acceleration resource 1012. Given that all three 100% schemes use two acceleration resources, in this preferred scheme one resource has the highest utilization and the other the lowest; that is, the acceleration applications are concentrated on the resources other than the one with the lowest utilization. After burning according to this preferred scheme, acceleration resource 1011 can be excluded from the matching the next time acceleration applications are combined and an allocation scheme is determined. Because resource 1011 already achieved high utilization in the previous burning, a high utilization of the overall acceleration resources (including resources 1011 and 1012) can still be obtained even when resource 1011 is excluded to reduce the impact on already-burned resources or to reduce the amount of computation.
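The tie-break just worked through can be checked numerically (percent values from FIG. 12; the scheme labels are merely descriptive strings for this sketch):

```python
# Each candidate scheme sums to 100%; drop the smallest utilization in each
# and compare what remains to find the most concentrated scheme.
schemes = {
    "1101 on 1011 + 1003 on 1012": [80, 20],
    "1102 on 1011 + 1002 on 1012": [70, 30],
    "1103 on 1011 + 1001 on 1012": [60, 40],
}
remaining = {name: sum(u) - min(u) for name, u in schemes.items()}
preferred = max(remaining, key=remaining.get)  # most concentrated scheme
```

The remaining sums are 80, 70, and 60, so the scheme placing combined application 1101 on resource 1011 is selected, matching the text.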
Referring to the matching relationships of the second group shown in FIG. 11, the second group includes acceleration applications 1004, 1005, 1006, and 1007, and the acceleration processing device 200 can combine any two, three, or four of them to obtain multiple combined applications. Suppose that combined applications formed from any three or four of the acceleration applications cannot match acceleration resources 1013 and 1014, that a combined application formed from any two of applications 1004, 1005, and 1006 can match acceleration resource 1013, and that a combined application formed from any two of applications 1005, 1006, and 1007 can match acceleration resource 1014. The acceleration processing device 200 can compute the utilization sums for the combined applications that can match an acceleration resource. For example, if the utilization of resource 1013 by the combination of applications 1004 and 1005 plus the utilization of resource 1014 by the combination of applications 1006 and 1007 is determined to be the largest, the preferred allocation scheme is determined to be: the combined application of applications 1004 and 1005 corresponds to resource 1013, and the combined application of applications 1006 and 1007 corresponds to resource 1014. Likewise, when multiple allocation schemes have the same utilization sum, the smallest utilization in each scheme can be removed, the priorities of the schemes determined, from high to low, in decreasing order of the remaining sums, and the scheme with the highest priority selected as the preferred allocation scheme.
The acceleration processing device 200 may also try the multiple allocation schemes one by one, in descending order of priority, burning according to the selected scheme; when burning cannot be completed according to that scheme (for example, simulation detects that the acceleration resource cannot satisfy the timing constraints or hardware resources the combined application requires), the next scheme is selected, and so on until burning completes. Through this process, the acceleration processing device 200 automatically selects, among the allocation schemes with which burning can be completed, the one with the highest priority, improving acceleration resource utilization.
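Trying schemes in priority order until one actually burns can be sketched as a simple loop; `try_burn` here stands in for the simulate-and-burn step and is an invented placeholder, not an API from the patent:

```python
def burn_with_fallback(schemes_by_priority, try_burn):
    """Attempt each allocation scheme in descending priority order.

    try_burn(scheme) returns True on success, or False when burning cannot
    complete with that scheme (for example, simulation shows the resource
    cannot meet the combined application's timing constraints or does not
    have the hardware resources it needs)."""
    for scheme in schemes_by_priority:
        if try_burn(scheme):
            return scheme
    raise RuntimeError("no allocation scheme could be burned")
```

The first scheme for which `try_burn` succeeds is, by construction, the highest-priority scheme with which burning can be completed.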
FIG. 13 is a schematic diagram of an acceleration processing device 200, applied in the system shown in FIG. 1, according to an embodiment of this application. The acceleration processing device 200 includes a combining unit 1201 and a burning unit 1202, and can be used to perform the acceleration processing method 400.
The combining unit 1201 is configured to combine the first acceleration application and the second acceleration application to obtain the first combined application; for details, refer to the description of S401 in the acceleration processing method 400, which is not repeated here.
The burning unit 1202 is configured to burn the first combined application to the first acceleration resource; for details, refer to the description of S402 in the acceleration processing method 400, which is not repeated here.
The acceleration processing device 200 may further include a sending unit 1203. When the second acceleration application is an acceleration application that has already been burned to the second acceleration resource, after the burning unit 1202 burns the first combined application to the first acceleration resource, the sending unit 1203 sends the instruction for triggering the second acceleration application to the first acceleration resource and no longer sends that instruction to the second acceleration resource, thereby migrating the acceleration application; for details, refer to the description in the acceleration processing method 400 of migrating acceleration applications between FPGAs or between FPGA regions.
The acceleration processing device 200 may further include a processing unit 1204. When the second acceleration application is an acceleration application that has already been burned to the first acceleration resource, the processing unit 1204 executes the second acceleration application while the burning unit 1202 is burning the first combined application to the first acceleration resource; for details, refer to the description in the acceleration processing method 400 of the acceleration processing device 200 executing the second acceleration application in place of the first acceleration resource.
The acceleration processing device 200 may further include a determining unit 1205, so that when multiple acceleration resources are available for burning, the acceleration resource to burn can be selected according to the combined application's utilization of the acceleration resources. For example, after the determining unit 1205 determines that the first combined application's utilization of the first acceleration resource is higher than its utilization of the third acceleration resource, the burning unit 1202 burns the first combined application to the first acceleration resource. The third acceleration resource may be a region of an FPGA chip or an entire FPGA chip.
The acceleration processing device 200 further includes an obtaining unit 1206, configured to obtain multiple acceleration applications including the first acceleration application and the second acceleration application; for details, refer to the description in the acceleration processing method 400 of how acceleration applications are obtained.
The burning unit 1202 can burn the first combined application to the first acceleration resource according to the preferred allocation scheme among multiple allocation schemes. Each of the multiple allocation schemes is a correspondence between multiple acceleration resources and the multiple acceleration applications; for details, refer to the description of determining allocation schemes in the acceleration processing method 400.
Some or all of the combining unit 1201, the burning unit 1202, the sending unit 1203, the processing unit 1204, the determining unit 1205, and the obtaining unit 1206 may be implemented by computer program instructions, and those computer program instructions may be used to implement the functions of the NFVI in the NFV system shown in FIG. 1. The NFV system can thereby use the NFVI to perform the acceleration processing method 400 and improve acceleration resource utilization.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer program instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid-state drive).
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (19)

  1. An acceleration processing method, comprising:
    combining, by an acceleration processing device, a first acceleration application and a second acceleration application to obtain a first combined application, wherein the first combined application comprises a top-level module, the first acceleration application, and the second acceleration application, and the top-level module comprises a statement for calling the first acceleration application and a statement for calling the second acceleration application; and
    burning, by the acceleration processing device, the first combined application to a first acceleration resource.
  2. The method according to claim 1, wherein before the acceleration processing device burns the first combined application to the first acceleration resource, the second acceleration application has already been burned to a second acceleration resource; and
    the method further comprises:
    after the acceleration processing device burns the first combined application to the first acceleration resource, sending an instruction for triggering the first acceleration resource to execute the second acceleration application only to the first acceleration resource.
  3. The method according to claim 1, wherein before the acceleration processing device burns the first combined application to the first acceleration resource, the second acceleration application has already been burned to the first acceleration resource; and
    the method further comprises:
    executing, by the acceleration processing device, the second acceleration application while the acceleration processing device is burning the first combined application to the first acceleration resource.
  4. The method according to any one of claims 1 to 3, further comprising:
    determining, by the acceleration processing device before burning the first combined application to the first acceleration resource, that the utilization of the first acceleration resource by the first combined application is higher than the utilization of a third acceleration resource by the first combined application.
  5. The method according to any one of claims 1 to 4, wherein before the acceleration processing device combines the first acceleration application and the second acceleration application, the method further comprises: obtaining, by the acceleration processing device, multiple acceleration applications, the multiple acceleration applications comprising the first acceleration application and the second acceleration application;
    the burning, by the acceleration processing device, of the first combined application to the first acceleration resource comprises: burning, by the acceleration processing device, the first combined application to the first acceleration resource according to a preferred allocation scheme among multiple allocation schemes; and
    each of the multiple allocation schemes is a correspondence between multiple acceleration resources and the multiple acceleration applications, the multiple acceleration resources comprise the first acceleration resource, and the preferred allocation scheme comprises the correspondence between the first acceleration resource and the first acceleration application and the second acceleration application.
  6. The method according to claim 5, wherein among the multiple allocation schemes, the preferred allocation scheme has the fewest acceleration resources that correspond to at least one acceleration application.
  7. The method according to claim 5 or 6, wherein among the multiple allocation schemes, the preferred allocation scheme has the largest sum of utilizations of the acceleration resources that correspond to at least one acceleration application.
  8. An acceleration processing device, comprising:
    a combining unit, configured to combine a first acceleration application and a second acceleration application to obtain a first combined application, wherein the first combined application comprises a top-level module, the first acceleration application, and the second acceleration application, and the top-level module comprises a statement for calling the first acceleration application and a statement for calling the second acceleration application; and
    a burning unit, configured to burn the first combined application to a first acceleration resource.
  9. The acceleration processing device according to claim 8, wherein before the burning unit burns the first combined application to the first acceleration resource, the second acceleration application has already been burned to a second acceleration resource; and
    the acceleration processing device further comprises a sending unit, configured to send, after the burning unit burns the first combined application to the first acceleration resource, an instruction for triggering the first acceleration resource to execute the second acceleration application only to the first acceleration resource.
  10. The acceleration processing device according to claim 8, wherein before the burning unit burns the first combined application to the first acceleration resource, the second acceleration application has already been burned to the first acceleration resource; and
    the acceleration processing device further comprises a processing unit, configured to execute the second acceleration application while the burning unit is burning the first combined application to the first acceleration resource.
  11. The acceleration processing device according to any one of claims 8 to 10, further comprising a determining unit, configured to determine, before the burning unit burns the first combined application to the first acceleration resource, that the utilization of the first acceleration resource by the first combined application is higher than the utilization of a third acceleration resource by the first combined application.
  12. The acceleration processing device according to any one of claims 8 to 11, further comprising an obtaining unit, configured to obtain, before the combining unit combines the first acceleration application and the second acceleration application, multiple acceleration applications, the multiple acceleration applications comprising the first acceleration application and the second acceleration application;
    wherein the burning unit burning the first combined application to the first acceleration resource comprises: burning, by the burning unit, the first combined application to the first acceleration resource according to a preferred allocation scheme among multiple allocation schemes; and
    each of the multiple allocation schemes is a correspondence between multiple acceleration resources and the multiple acceleration applications, the multiple acceleration resources comprise the first acceleration resource, and the preferred allocation scheme comprises the correspondence between the first acceleration resource and the first acceleration application and the second acceleration application.
  13. The acceleration processing device according to claim 12, wherein among the multiple allocation schemes, the preferred allocation scheme has the fewest acceleration resources that correspond to at least one acceleration application.
  14. The acceleration processing device according to claim 12 or 13, wherein among the multiple allocation schemes, the preferred allocation scheme has the largest sum of utilizations of the acceleration resources that correspond to at least one acceleration application.
  15. An acceleration processing device, comprising:
    a memory, configured to store computer program instructions; and
    a processor, configured to read the computer program instructions and perform the method according to any one of claims 1 to 7.
  16. The acceleration processing device according to claim 15, further comprising the first acceleration resource.
  17. An acceleration processing system, comprising the acceleration processing device according to any one of claims 8 to 15 and the first acceleration resource.
  18. A computer storage medium, comprising computer program instructions which, when run on an acceleration processing device, cause the acceleration processing device to perform the method according to any one of claims 1 to 7.
  19. A computer program product, comprising computer program instructions which, when run on an acceleration processing device, cause the acceleration processing device to perform the method according to any one of claims 1 to 7.
PCT/CN2017/098481 2017-08-22 2017-08-22 Acceleration processing method and device WO2019036901A1 (zh)


Publications (1)

Publication Number Publication Date
WO2019036901A1 2019-02-28



Also Published As

Publication number Publication date
CN109729731B (zh) 2021-02-09
CN109729731A (zh) 2019-05-07
US11461148B2 (en) 2022-10-04
EP3663912A4 (en) 2020-08-12
US20200192723A1 (en) 2020-06-18
EP3663912A1 (en) 2020-06-10

