WO2022036852A1 - A method for enabling the nGraph framework to support FPGA back-end devices

Publication number
WO2022036852A1
Authority
WO
WIPO (PCT)
Prior art keywords
fpga
ngraph
framework
kernel
opencl
Prior art date
Application number
PCT/CN2020/123809
Inventor
曹芳
郭振华
王丽
高开
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)
Priority to US18/012,924 (US11762721B2)
Publication of WO2022036852A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/547 Remote procedure calls [RPC]; Web services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/34 Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/451 Code distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks

Definitions

  • The present application relates to the technical field of super-heterogeneous acceleration of deep learning model training, and in particular to a method for enabling the nGraph framework to support FPGA back-end devices; it also relates to an apparatus and a device for enabling the nGraph framework to support FPGA back-end devices, and to an nGraph framework that supports FPGA back-end devices.
  • DNN: Deep Neural Network
  • For image and video classification, for example, networks can reach hundreds of layers, with a total of 10 million to 20 million parameters.
  • This growth makes efficient model training even more important.
  • deep learning frameworks such as Tensorflow and pytorch
  • hardware accelerators such as GPU, FPGA, and ASIC chips
  • the working principles, development and optimization methods of different deep learning frameworks and different hardware acceleration devices are very different.
  • The nGraph framework is a deep neural network model compiler for a variety of devices and frameworks. It can greatly reduce the complexity of optimizing deep learning performance across frameworks and hardware platforms, and it extends the applicability and portability of deep learning models.
  • the front-end deep learning frameworks that the nGraph framework already supports, or is in the process of supporting, include Tensorflow, MXNet, PaddlePaddle, etc.
  • the back-end hardware acceleration devices that have been supported or are being developed include CPU, NNP, and various GPUs.
  • In addition to acceleration devices such as CPUs, NNPs, and various GPUs, FPGAs have become one of the best choices for improving server performance and reducing power consumption in data centers, owing to their low power consumption, programmability, and high parallelism.
  • The FPGA heterogeneous computing platform adopts a high-level synthesis programming model and uses the OpenCL language to implement and optimize deep learning neural networks, completing the efficient porting and deployment of neural network algorithms on the FPGA platform. By making full use of the board's hardware pipeline design and task-level parallelism, it can greatly improve the computing performance of deep learning neural network algorithms.
  • At present, however, the nGraph framework does not support FPGA back-end devices. Given the low power consumption, programmability, and high parallelism of FPGAs, enabling the nGraph framework to support FPGA back-end devices would be of great help in further improving the training performance of deep learning neural networks.
  • The purpose of this application is to provide a method for enabling the nGraph framework to support FPGA back-end devices, so that the training or inference of deep learning neural network computation graphs built by users on the nGraph framework can be deployed to FPGA back-end devices for acceleration.
  • Another object of the present application is to provide an apparatus and a device for enabling the nGraph framework to support FPGA back-end devices, and an nGraph framework that supports FPGA back-end devices, all of which have the above technical effects.
  • the present application provides a method for enabling the nGraph framework to support FPGA back-end devices, including:
  • an FPGA compilation and execution module for registering and scheduling and executing the OP kernel is created.
  • the integration of the OpenCL standard API library into the nGraph framework includes:
  • creating in the nGraph framework, based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and acquiring the FPGA back-end device includes:
  • an FPGA back-end device acquisition sub-module for acquiring the FPGA back-end device is created based on the OpenCL standard API library.
  • creating in the nGraph framework, based on the OpenCL standard API library, an FPGA cache space processing module for opening up FPGA cache space and reading and writing the FPGA cache includes:
  • a read FPGA cache sub-module for reading data processing results from the FPGA cache and returning them to the host is created in the nGraph framework based on the OpenCL standard API library.
  • the present application also provides a device for implementing the nGraph framework to support FPGA back-end equipment, including:
  • the first creation unit is used to create an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment and obtaining the FPGA back-end device based on the OpenCL standard API library in the nGraph framework;
  • the second creation unit is used to create an FPGA cache space processing module for opening up FPGA cache space and reading and writing FPGA cache based on the OpenCL standard API library in the nGraph framework;
  • the third creation unit is used to create the OP kernel implementation module for creating the OP kernel and compiling the OP kernel based on the OpenCL standard API library in the nGraph framework;
  • the fourth creation unit is used to create an FPGA compilation and execution module for registering and scheduling the execution of the OP kernel based on the OpenCL standard API library in the nGraph framework.
  • the present application also provides an nGraph framework that supports FPGA back-end devices, including:
  • the FPGA back-end device creation module is used to register the FPGA back-end device, initialize the OpenCL environment, and obtain the FPGA back-end device;
  • FPGA cache space processing module used to open up FPGA cache space and read and write FPGA cache
  • the OP kernel implementation module is used to create the OP kernel and compile the OP kernel
  • the FPGA compilation and execution module is used to register and schedule the execution of the OP kernel.
  • the FPGA back-end device creation module includes:
  • the FPGA back-end device registration sub-module is used to register the FPGA back-end device
  • the OpenCL environment initialization submodule is used to initialize the OpenCL environment
  • the FPGA back-end device acquisition sub-module is used to acquire the FPGA back-end device.
  • the present application also provides a device that implements the nGraph framework to support FPGA back-end devices, including:
  • the processor is configured to implement the steps of the above-mentioned method for implementing the nGraph framework to support the FPGA back-end device when executing the computer program.
  • The method provided by the present application for enabling the nGraph framework to support FPGA back-end devices includes: integrating the OpenCL standard API library into the nGraph framework; creating in the nGraph framework, based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device; creating in the nGraph framework, based on the OpenCL standard API library, an FPGA cache space processing module for opening up FPGA cache space and reading and writing the FPGA cache; creating in the nGraph framework, based on the OpenCL standard API library, an OP kernel implementation module for creating and compiling OP kernels; and creating in the nGraph framework, based on the OpenCL standard API library, an FPGA compilation and execution module for registering and scheduling the execution of OP kernels.
  • The method thus integrates the OpenCL standard API library into the nGraph framework and, based on that library, creates the FPGA back-end device creation module, the FPGA cache space processing module, the OP kernel implementation module, and the FPGA compilation and execution module. Once the OpenCL standard API library has been integrated and these modules have been created, the nGraph framework can support the FPGA back-end device.
  • During programming and development, the user only needs to designate the back-end device as FPGA when creating the back-end device. The corresponding operations are then performed through the above modules, and the FPGA back-end device accelerates the training or inference of the deep learning neural network built by the user.
  • FIG. 1 is a schematic flowchart of a method for enabling the nGraph framework to support FPGA back-end devices provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the functional modules that enable the nGraph framework to support an FPGA back-end device provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the association between an nGraph client development program and an FPGA back-end device provided by an embodiment of the present application.
  • The core of this application is to provide a method for enabling the nGraph framework to support FPGA back-end devices, so that the training or inference of deep learning neural network computation graphs built by users on the nGraph framework can be deployed to FPGA back-end devices for acceleration.
  • Another core of the present application is to provide an apparatus and device for implementing an nGraph framework to support FPGA back-end devices, and an nGraph framework for supporting FPGA back-end devices, which also have the above technical effects.
  • FIG. 1 is a schematic flowchart of a method for implementing an nGraph framework to support an FPGA back-end device provided by an embodiment of the application. Referring to FIG. 1, the method includes:
  • OpenCL: Open Computing Language
  • API: Application Programming Interface
  • This step aims to integrate the OpenCL standard API library into the nGraph framework for the subsequent development of FPGA back-end devices.
  • an OpenCL standard API library integration module can be created and the OpenCL standard API library can be integrated into the nGraph framework through the OpenCL standard API library integration module.
  • the integration of the OpenCL standard API library into the nGraph framework may include: adding the OpenCL standard API library to the source code of the nGraph framework; modifying the cmake compilation file of the nGraph framework to compile the OpenCL standard API library into a dynamic link library in the nGraph framework.
  • The OpenCL standard API library is first added to the source code of the nGraph framework. Since the library is used for developing and using FPGA back-end devices, it is placed in the same directory as the FPGA back-end device code within the nGraph source tree. After the library has been added, the cmake compilation file of the nGraph framework is modified so that the OpenCL standard API library is compiled into a dynamic link library in the nGraph framework. In this way, the OpenCL standard API library is integrated with the nGraph framework and can be used by other modules in the framework.
  • S102 Create an FPGA back-end device creation module based on the OpenCL standard API library in the nGraph framework for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device;
  • this step aims to create an FPGA back-end device creation module in the nGraph framework, and the FPGA back-end device creation module is used to register the FPGA back-end device in the nGraph framework, initialize the OpenCL environment, and obtain the FPGA back-end device.
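  • creating the FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device includes: creating in the nGraph framework an FPGA back-end device registration sub-module for registering the FPGA back-end device; creating, based on the OpenCL standard API library, an OpenCL environment initialization sub-module for initializing the OpenCL environment; and creating, based on the OpenCL standard API library, an FPGA back-end device acquisition sub-module for acquiring the FPGA back-end device.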
  • the FPGA back-end device creation module in this embodiment includes an FPGA back-end device registration sub-module, an OpenCL environment initialization sub-module, and an FPGA back-end device acquisition sub-module.
  • the FPGA backend device registration submodule is based on the function BackendManager::register_backend() provided by the nGraph framework, and registers the FPGA backend device in the nGraph framework by constructing the FPGA backend device global registration function ngraph_register_fpga_backend().
  • the function implementation of the OpenCL environment initialization submodule depends on the OpenCL standard API library, which completes the initialization of the OpenCL environment by calling the OpenCL standard API library functions.
  • the function implementation of the FPGA back-end device acquisition sub-module also depends on the OpenCL standard API library, which completes the acquisition of the FPGA back-end device by calling the OpenCL standard API library functions for subsequent use.
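The registration sub-module described above follows a name-to-factory registry pattern. The following is a minimal Python sketch of that pattern only; nGraph itself is written in C++, and the classes here are hypothetical stand-ins that borrow the names `BackendManager`, `register_backend`, and `ngraph_register_fpga_backend` from the text. Placing `create` on `BackendManager` rather than on a `Backend` class is a simplification made for the sketch.

```python
from typing import Callable, Dict


class Backend:
    """Stand-in for a back-end device object (hypothetical, not nGraph's class)."""

    def __init__(self, name: str) -> None:
        self.name = name


class BackendManager:
    """Toy registry that maps a device name to a factory function."""

    _registry: Dict[str, Callable[[], Backend]] = {}

    @classmethod
    def register_backend(cls, name: str, factory: Callable[[], Backend]) -> None:
        # Later lookups by name will call this factory.
        cls._registry[name] = factory

    @classmethod
    def create(cls, name: str) -> Backend:
        if name not in cls._registry:
            raise KeyError(f"backend {name!r} is not registered")
        return cls._registry[name]()


def ngraph_register_fpga_backend() -> None:
    """Global registration hook: makes "FPGA" creatable by name."""
    BackendManager.register_backend("FPGA", lambda: Backend("FPGA"))


ngraph_register_fpga_backend()
backend = BackendManager.create("FPGA")
print(backend.name)
```

In a real back end the factory would also initialize the OpenCL environment and acquire the device; those calls are omitted here because the text does not name them.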
  • this step aims to create an FPGA cache space processing module in the nGraph framework.
  • the FPGA cache space processing module is used to open up (allocate) the FPGA cache space and to read and write the FPGA cache, that is, to write data into the FPGA cache and read data from the FPGA cache.
  • creating an FPGA cache space processing module based on the OpenCL standard API library in the nGraph framework for opening up the FPGA cache space and reading and writing the FPGA cache includes: creating, based on the OpenCL standard API library in the nGraph framework, an FPGA cache space development sub-module for opening up FPGA cache space for the data to be processed; creating, based on the OpenCL standard API library in the nGraph framework, a write FPGA cache sub-module for writing the data to be processed from the host side to the FPGA cache space; and creating, based on the OpenCL standard API library in the nGraph framework, a read FPGA cache sub-module for reading data processing results from the FPGA cache and returning them to the host.
  • the FPGA cache space processing module in this embodiment includes an FPGA cache space development sub-module, a write FPGA cache sub-module, and a read FPGA cache sub-module.
  • the FPGA cache space development sub-module is mainly used to open up FPGA cache space, and is also used to create the FPGA Tensor, calculate the space required for the Tensor data to be processed, and lay out the Tensor data.
  • the process of opening up the FPGA buffer space depends on the OpenCL standard API library, which completes the operation of opening up the FPGA buffer space by calling the OpenCL standard API library functions.
  • the write FPGA cache submodule writes the data to be processed from the host to the cache of the FPGA back-end device by calling the OpenCL standard API library function, so that the FPGA back-end device can calculate the data.
  • the read FPGA cache submodule reads the calculation result from the cache of the FPGA back-end device by calling the OpenCL standard API library, and sends the calculation result back to the host.
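The three cache sub-modules can be illustrated with a small host-side simulation. This is a sketch under stated assumptions: `tensor_nbytes` and `SimulatedFpgaBuffer` are hypothetical names, the element type is assumed to be 32-bit float, and a `bytearray` stands in for device memory. In an OpenCL host program these roles are typically played by `clCreateBuffer`, `clEnqueueWriteBuffer`, and `clEnqueueReadBuffer`; the text does not say which calls the modules use.

```python
import struct
from math import prod
from typing import List, Sequence

FLOAT32_BYTES = 4  # assumed element size


def tensor_nbytes(shape: Sequence[int], element_bytes: int = FLOAT32_BYTES) -> int:
    """Space required for a dense tensor with the given shape."""
    return prod(shape) * element_bytes


class SimulatedFpgaBuffer:
    """A bytearray standing in for a block of FPGA cache space."""

    def __init__(self, nbytes: int) -> None:
        self._mem = bytearray(nbytes)  # "open up" the space

    def write(self, values: Sequence[float]) -> None:
        """Host -> device copy of little-endian float32 values."""
        payload = struct.pack(f"<{len(values)}f", *values)
        if len(payload) > len(self._mem):
            raise ValueError("data does not fit in the allocated buffer")
        self._mem[: len(payload)] = payload

    def read(self, count: int) -> List[float]:
        """Device -> host copy of the first `count` float32 values."""
        return list(struct.unpack_from(f"<{count}f", self._mem, 0))


shape = (2, 3)
buf = SimulatedFpgaBuffer(tensor_nbytes(shape))
buf.write([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(tensor_nbytes(shape), buf.read(6))
```

The size calculation is kept separate from allocation because the text assigns both tasks to the same sub-module but describes them as distinct steps.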
  • S104 Create an OP kernel implementation module for creating an OP kernel and compiling an OP kernel based on the OpenCL standard API library in the nGraph framework;
  • this step aims to create an OP kernel implementation module in the nGraph framework, and the OP kernel implementation module is mainly used to create the OP kernel and compile the OP kernel.
  • OP represents the computing node in the computing graph, and the kernel of OP on the FPGA back-end device is called the OP kernel.
  • creating an OP kernel implementation module for creating an OP kernel and compiling an OP kernel based on the OpenCL standard API library in the nGraph framework includes: creating in the nGraph framework an OP kernel creation sub-module for creating FPGA-supported OP kernels; and creating in the nGraph framework an OP kernel compilation sub-module for compiling the OP kernel and obtaining the compiled aocx file.
  • the OP kernel implementation module includes the OP kernel creation submodule and the OP kernel compilation submodule.
  • the OP kernel creation sub-module uses the OpenCL high-level programming language to write the forward and backward computation kernels of each OP supported by the FPGA, and applies parallel optimization to those kernels.
  • the OP kernel compilation submodule uses aoc to compile each created OP kernel, obtains the aocx file, and places the aocx file in the newly added FPGA directory in the nGraph framework for subsequent use.
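The compilation step maps each kernel source file to one aocx file. The sketch below only builds the command lines and does not run them. It assumes the basic `aoc <source>.cl -o <output>.aocx` form of the offline compiler; the helper name `aoc_command`, the `kernels/` and `fpga/` paths, and the OP names are hypothetical, and board selection and other options are deliberately left out.

```python
from pathlib import PurePosixPath
from typing import List


def aoc_command(kernel_source: str, output_dir: str) -> List[str]:
    """Build (but do not run) an aoc command line for one OP kernel."""
    src = PurePosixPath(kernel_source)
    # One .aocx per kernel source, named after the source file.
    out = PurePosixPath(output_dir) / (src.stem + ".aocx")
    return ["aoc", str(src), "-o", str(out)]


for op in ("add", "relu"):  # hypothetical OP names
    print(" ".join(aoc_command(f"kernels/{op}.cl", "fpga")))
```

Returning the command as a list keeps it suitable for passing to a process launcher without shell quoting.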
  • S105 Create an FPGA compilation and execution module for registering and scheduling the execution of the OP kernel based on the OpenCL standard API library in the nGraph framework.
  • this step aims to create an FPGA compilation and execution module in the nGraph framework, and the FPGA compilation and execution module is used to register and schedule the execution of the OP kernel.
  • creating the FPGA compilation and execution module for registering and scheduling the execution of the OP kernel based on the OpenCL standard API library in the nGraph framework includes: creating in the nGraph framework an OP kernel registration sub-module for registering the OP kernel; creating in the nGraph framework a computational graph optimization sub-module for optimizing computational graphs; and creating in the nGraph framework an OP kernel scheduling and execution sub-module for determining the execution order of each OP in the computational graph and starting the OP kernels for computation according to that order.
  • the FPGA compilation and execution module includes an OP kernel registration sub-module, a computational graph optimization sub-module, and an OP kernel scheduling and execution sub-module.
  • the OP kernel registration sub-module completes the registration of the OP kernel by defining the FPGA kernel registration list and the FPGA kernel registration function, so that the FPGA back-end device can subsequently identify and call the registered kernels.
  • the computation graph optimization sub-module optimizes the computation graph created by the user by reusing the pass optimization code of the graph in the nGraph framework, and adding an optimization pass for the FPGA back-end device to improve the training performance.
  • the OP kernel scheduling and execution sub-module is used to determine the execution order of each OP in the calculation graph, find the OP to be calculated from the registration list, and start the OP kernel for calculation according to the execution order of each OP.
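Registration and scheduling can be sketched together as a registration list plus a topological ordering of the graph. This is an illustration only: `KERNEL_REGISTRY`, `register_kernel`, and `run_graph` are hypothetical names, plain Python functions stand in for FPGA kernels, and the standard-library `graphlib.TopologicalSorter` is a swapped-in way to derive an execution order, not the ordering algorithm the framework uses.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Registration list: OP type -> kernel (plain functions stand in for FPGA kernels).
KERNEL_REGISTRY: Dict[str, Callable[..., float]] = {}


def register_kernel(op_type: str):
    """Registration function: adds a kernel to the registration list."""
    def decorator(fn: Callable[..., float]) -> Callable[..., float]:
        KERNEL_REGISTRY[op_type] = fn
        return fn
    return decorator


@register_kernel("Add")
def add_kernel(a: float, b: float) -> float:
    return a + b


@register_kernel("Multiply")
def multiply_kernel(a: float, b: float) -> float:
    return a * b


def run_graph(nodes: Dict[str, Tuple[str, List[str]]], inputs: Dict[str, float]):
    """nodes maps an OP name to (OP type, input names); returns (order, values)."""
    # Only inputs produced by other OPs are ordering dependencies.
    deps = {name: [i for i in ins if i in nodes] for name, (_, ins) in nodes.items()}
    order = list(TopologicalSorter(deps).static_order())
    values = dict(inputs)
    for name in order:
        op_type, ins = nodes[name]
        kernel = KERNEL_REGISTRY[op_type]  # look the OP up in the registration list
        values[name] = kernel(*(values[i] for i in ins))  # "start" the kernel
    return order, values


graph = {"out": ("Multiply", ["sum", "z"]), "sum": ("Add", ["x", "y"])}
order, values = run_graph(graph, {"x": 1.0, "y": 2.0, "z": 4.0})
print(order, values["out"])
```

The graph is declared with `out` before `sum` to show that the execution order comes from the dependencies, not from declaration order.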
  • the nGraph framework can support the FPGA back-end device.
  • nGraph client users can program and develop according to their original programming habits. They only need to specify the back-end device as "FPGA" when creating the back-end device, and can then use the FPGA back-end device to accelerate training or inference of the deep learning neural network they have built. Specifically, when calling the Backend::create() function to create a back end, the user only needs to call Backend::create("FPGA") to indicate that the back-end device to be used is an FPGA device.
  • The subsequent process automatically calls the modules created in the above steps to perform the corresponding operations, ultimately deploying the training or inference of the deep learning neural network computation graph built by the user on the nGraph framework to the FPGA back-end device for acceleration.
  • On the basis of the constructed function computation graph, the nGraph client user program uses Backend::create("FPGA") to create the FPGA back end; the FPGA back-end device creation module then performs the operations of registering the FPGA back end, initializing the OpenCL environment, and obtaining the FPGA device.
  • the FPGA buffer space processing module correspondingly performs the operations of opening up the FPGA storage space, writing the FPGA buffer, and reading the FPGA buffer.
  • the FPGA compilation and execution module performs the OP kernel registration, computational graph optimization, and OP kernel scheduling operations.
  • the FPGA compilation and execution module performs the operations of finding the OP kernel in the registration list and starting the kernel. Further, the OP kernel implementation module compiles the OP kernel to obtain the aocx executable file.
  • The method for enabling the nGraph framework to support the FPGA back-end device integrates the OpenCL standard API library into the nGraph framework and, based on that library, creates the FPGA back-end device creation module, the FPGA cache space processing module, the OP kernel implementation module, and the FPGA compilation and execution module.
  • the nGraph framework can thereby be enabled to support FPGA back-end devices.
  • Users only need to designate the back-end device as FPGA when creating the back-end device. The corresponding operations are then performed through the above modules, and the FPGA back-end device accelerates the training or inference of the deep learning neural network built by the user.
  • the present application also provides an apparatus for implementing an nGraph framework to support an FPGA back-end device, and the apparatus described below may refer to the method described above in correspondence with each other.
  • the device includes:
  • the first creation unit is used to create, based on the OpenCL standard API library in the nGraph framework, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device;
  • the second creation unit is used to create an FPGA cache space processing module for opening up the FPGA cache space and reading and writing the FPGA cache based on the OpenCL standard API library in the nGraph framework;
  • the third creation unit is used to create the OP kernel implementation module for creating the OP kernel and compiling the OP kernel based on the OpenCL standard API library in the nGraph framework;
  • the fourth creation unit is used to create an FPGA compilation and execution module for registering and scheduling the execution of the OP kernel based on the OpenCL standard API library in the nGraph framework.
  • the integration unit includes:
  • the modification unit is used to modify the cmake compilation file of the nGraph framework, and compile the OpenCL standard API library into a dynamic link library in the nGraph framework.
  • the first creation unit includes:
  • the registration submodule creation unit is used to create the FPGA backend device registration submodule for registering the FPGA backend device in the nGraph framework;
  • the initialization submodule creation unit is used to create an OpenCL environment initialization submodule for initializing the OpenCL environment based on the OpenCL standard API library in the nGraph framework;
  • the acquisition submodule creation unit is used to create an FPGA backend device acquisition submodule for acquiring the FPGA backend device based on the OpenCL standard API library in the nGraph framework.
  • the second creation unit includes:
  • the buffer space development sub-module creation unit is used to create an FPGA cache space development sub-module for opening up the FPGA buffer space based on the OpenCL standard API library in the nGraph framework;
  • the write cache submodule creation unit is used to create a write FPGA cache submodule for writing the data to be processed from the host side to the FPGA cache based on the OpenCL standard API library in the nGraph framework;
  • the read cache submodule creation unit is used to create a read FPGA cache submodule based on the OpenCL standard API library in the nGraph framework for reading data processing results from the FPGA cache and returning the data processing results to the host.
  • the third creation unit includes:
  • the kernel creation sub-module creation unit is used to create an OP kernel creation sub-module for creating an FPGA-supported OP kernel based on the OpenCL standard API library in the nGraph framework;
  • the kernel compilation submodule creation unit is used to create an OP kernel compilation submodule for compiling the OP kernel and obtaining the compiled aocx file based on the OpenCL standard API library in the nGraph framework.
  • the fourth creation unit includes:
  • the kernel registration submodule creation unit is used to create the OP kernel registration submodule for registering the OP kernel in the nGraph framework;
  • a computational graph optimization sub-module creation unit which is used to create a computational graph optimization sub-module for optimizing computational graphs in the nGraph framework;
  • the kernel scheduling and execution sub-module creation unit is used to create an OP kernel scheduling and execution sub-module for determining the execution order of each OP in the calculation graph in the nGraph framework, and starting the OP kernel for calculation according to the execution order.
  • This application also provides an nGraph framework that supports FPGA back-end devices, including:
  • the FPGA back-end device creation module is used to register the FPGA back-end device, initialize the OpenCL environment, and obtain the FPGA back-end device;
  • FPGA cache space processing module used to open up FPGA cache space and read and write FPGA cache
  • OP kernel implementation module used to create OP kernel and compile OP kernel
  • the FPGA compilation and execution module is used to register and schedule the execution of the OP kernel.
  • the FPGA back-end device creation module includes:
  • the FPGA back-end device registration sub-module is used to register the FPGA back-end device
  • the OpenCL environment initialization submodule is used to initialize the OpenCL environment
  • the FPGA back-end device acquisition sub-module is used to acquire the FPGA back-end device.
  • the FPGA cache space processing module includes:
  • the FPGA cache space development sub-module is used to open up the FPGA cache space
  • the write FPGA cache sub-module is used to write the data to be processed from the host side to the FPGA cache;
  • the read FPGA cache sub-module is used to read the data processing result from the FPGA cache and send the data processing result back to the host.
  • the OP kernel implementation module includes:
  • the OP kernel creation sub-module is used to create an OP kernel that supports the FPGA back end;
  • the OP kernel compilation sub-module is used to compile the OP kernel and obtain the compiled aocx file.
  • the FPGA compilation and execution module includes:
  • the OP kernel registration submodule is used to register the OP kernel
  • the OP kernel scheduling and execution sub-module is used to determine the execution order of each OP in the computation graph, and start the OP kernel for computation according to the execution order.
  • This application also provides a device for enabling the nGraph framework to support FPGA back-end devices. The device includes a memory and a processor.
  • The memory is used to store a computer program;
  • the processor is used to execute the computer program to implement the following steps:
  • Integrate the OpenCL standard API library into the nGraph framework; create, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device; create, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache; create, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating and compiling OP kernels; and create, in the nGraph framework and based on the OpenCL standard API library, an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.
  • The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • The software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.


Abstract

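A method and related apparatus for enabling the nGraph framework to support FPGA back-end devices. The method includes: integrating the OpenCL standard API library into the nGraph framework; creating in the nGraph framework an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device; creating in the nGraph framework an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache; creating in the nGraph framework an OP kernel implementation module for creating and compiling OP kernels; and creating in the nGraph framework an FPGA compilation and execution module for registering the OP kernels and scheduling their execution. The method enables the nGraph framework to support FPGA back-end devices.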

Description

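Method for enabling the nGraph framework to support FPGA back-end devices
This application claims priority to Chinese patent application No. 202010844796.4, filed with the China Patent Office on August 20, 2020 and entitled "Method for enabling the nGraph framework to support FPGA back-end devices", the entire contents of which are incorporated herein by reference.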
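Technical Field
This application relates to the technical field of hyper-heterogeneous acceleration of deep learning model training, and in particular to a method for enabling the nGraph framework to support FPGA back-end devices. It also relates to an apparatus and a device for enabling the nGraph framework to support FPGA back-end devices, and to an nGraph framework that supports FPGA back-end devices.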
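Background
DNNs (Deep Neural Networks) are now widely used, in applications including image and video classification, speech recognition, and language translation. As deep neural networks are developed and used ever more widely, however, model sizes keep growing: a model may have hundreds of layers and a total of 10 to 20 million parameters. This growth makes efficient model training more important. Deep learning frameworks such as TensorFlow and PyTorch, together with hardware accelerators such as GPUs, FPGAs, and ASIC chips, have contributed greatly to improving neural network training performance. However, the working principles, development methods, and optimization methods differ widely between deep learning frameworks and between hardware acceleration devices. A developer who wants to switch deep learning frameworks during development, or to deploy a deep learning model to a more advanced device, must spend considerable time and effort on migration and optimization. To address this, Intel introduced the nGraph framework, a deep neural network model compiler targeting a variety of devices and frameworks. It greatly reduces the complexity of optimizing deep learning performance across frameworks and hardware platforms, and extends the applicability and portability of deep learning models. At present, the front-end deep learning frameworks that nGraph supports or is developing support for include TensorFlow, MXNet, and PaddlePaddle, and the back-end hardware acceleration devices that it supports or is developing support for include CPUs, NNPs, and various GPUs.
Beyond acceleration devices such as CPUs, NNPs, and GPUs, the FPGA has become one of the best choices for raising data-center server performance and reducing power consumption, owing to its low power consumption, programmability, and high parallelism. An FPGA heterogeneous computing platform adopts a high-level synthesis programming model and uses the OpenCL language to study and optimize deep learning neural networks, so that neural network algorithms can be ported and deployed efficiently on the FPGA platform. By making full use of the board's hardware pipeline design and task-level parallelism, it can substantially improve the computing performance of deep learning neural network algorithms. At present, however, the nGraph framework does not support FPGA back-end devices. Given the FPGA's low power consumption, programmability, and high parallelism, enabling the nGraph framework to support FPGA back-end devices would greatly help to further improve the training performance of deep learning neural networks.
How to enable the nGraph framework to support FPGA back-end devices has therefore become a technical problem that those skilled in the art urgently need to solve.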
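Summary of the Invention
The purpose of this application is to provide a method for enabling the nGraph framework to support FPGA back-end devices, so that the training or inference of a deep learning neural network computation graph built by a user on the nGraph framework can be deployed to an FPGA back-end device for acceleration. Another purpose of this application is to provide an apparatus and a device for enabling the nGraph framework to support FPGA back-end devices, and an nGraph framework that supports FPGA back-end devices, all of which have the same technical effect.
To solve the above technical problem, this application provides a method for enabling the nGraph framework to support FPGA back-end devices, including:
integrating the OpenCL standard API library into the nGraph framework;
creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device;
creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache;
creating, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating OP kernels and compiling the OP kernels;
creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.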
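Optionally, integrating the OpenCL standard API library into the nGraph framework includes:
adding the OpenCL standard API library to the source code of the nGraph framework;
modifying the cmake build files of the nGraph framework so that the OpenCL standard API library is compiled as a dynamic link library within the nGraph framework.
Optionally, creating, in the nGraph framework and based on the OpenCL standard API library, the FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device includes:
creating in the nGraph framework an FPGA back-end device registration sub-module for registering the FPGA back-end device;
creating, in the nGraph framework and based on the OpenCL standard API library, an OpenCL environment initialization sub-module for initializing the OpenCL environment;
creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device acquisition sub-module for obtaining the FPGA back-end device.
Optionally, creating, in the nGraph framework and based on the OpenCL standard API library, the FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache includes:
creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space allocation sub-module for allocating FPGA cache space;
creating, in the nGraph framework and based on the OpenCL standard API library, a write FPGA cache sub-module for writing the data to be processed from the host side into the FPGA cache;
creating, in the nGraph framework and based on the OpenCL standard API library, a read FPGA cache sub-module for reading the data processing result from the FPGA cache and sending it back to the host side.
Optionally, creating, in the nGraph framework and based on the OpenCL standard API library, the OP kernel implementation module for creating OP kernels and compiling the OP kernels includes:
creating in the nGraph framework an OP kernel creation sub-module for creating OP kernels supported by the FPGA back-end;
creating in the nGraph framework an OP kernel compilation sub-module for compiling the OP kernels and obtaining the resulting aocx file.
Optionally, creating, in the nGraph framework and based on the OpenCL standard API library, the FPGA compilation and execution module for registering the OP kernels and scheduling their execution includes:
creating in the nGraph framework an OP kernel registration sub-module for registering the OP kernels;
creating in the nGraph framework a computation graph optimization sub-module for optimizing the computation graph;
creating in the nGraph framework an OP kernel scheduling and execution sub-module for determining the execution order of each OP in the computation graph and launching the OP kernels for computation according to that order.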
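To solve the above technical problem, this application also provides an apparatus for enabling the nGraph framework to support FPGA back-end devices, including:
an integration unit, used to integrate the OpenCL standard API library into the nGraph framework;
a first creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device;
a second creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache;
a third creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating OP kernels and compiling the OP kernels;
a fourth creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.
To solve the above technical problem, this application also provides an nGraph framework that supports FPGA back-end devices, including:
the OpenCL standard API library;
an FPGA back-end device creation module, used to register the FPGA back-end device, initialize the OpenCL environment, and obtain the FPGA back-end device;
an FPGA cache space processing module, used to allocate FPGA cache space and to read and write the FPGA cache;
an OP kernel implementation module, used to create OP kernels and compile the OP kernels;
an FPGA compilation and execution module, used to register the OP kernels and schedule their execution.
Optionally, the FPGA back-end device creation module includes:
an FPGA back-end device registration sub-module, used to register the FPGA back-end device;
an OpenCL environment initialization sub-module, used to initialize the OpenCL environment;
an FPGA back-end device acquisition sub-module, used to obtain the FPGA back-end device.
To solve the above technical problem, this application also provides a device for enabling the nGraph framework to support FPGA back-end devices, including:
a memory, used to store a computer program;
a processor, used to implement, when executing the computer program, the steps of the method described above for enabling the nGraph framework to support FPGA back-end devices.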
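The method provided by this application for enabling the nGraph framework to support FPGA back-end devices includes: integrating the OpenCL standard API library into the nGraph framework; creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device; creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache; creating, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating OP kernels and compiling the OP kernels; and creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.
The method provided by this application thus integrates the OpenCL standard API library into the nGraph framework and, based on that library, creates the FPGA back-end device creation module, the FPGA cache space processing module, the OP kernel implementation module, and the FPGA compilation and execution module. Once the library has been integrated and these modules have been created, the nGraph framework supports FPGA back-end devices. During development, the user only needs to specify FPGA as the back-end device when creating the back-end; the modules above then perform the corresponding operations, and the FPGA back-end device accelerates the training or inference of the deep learning neural network built by the user.
The apparatus and the device provided by this application for enabling the nGraph framework to support FPGA back-end devices have the same technical effect.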
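Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for enabling the nGraph framework to support FPGA back-end devices according to an embodiment of this application;
FIG. 2 is a schematic diagram of the functional modules for enabling the nGraph framework to support FPGA back-end devices according to an embodiment of this application;
FIG. 3 is a schematic diagram of the relationship between an nGraph client program and the FPGA back-end device according to an embodiment of this application.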
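Detailed Description
The core of this application is to provide a method for enabling the nGraph framework to support FPGA back-end devices, so that the training or inference of a deep learning neural network computation graph built by a user on the nGraph framework can be deployed to an FPGA back-end device for acceleration. Another core of this application is to provide an apparatus and a device for enabling the nGraph framework to support FPGA back-end devices, and an nGraph framework that supports FPGA back-end devices, all of which have the same technical effect.
To make the purposes, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are obviously some, but not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, fall within the scope of protection of this application.
Please refer to FIG. 1, which is a schematic flowchart of a method for enabling the nGraph framework to support FPGA back-end devices according to an embodiment of this application. As shown in FIG. 1, the method includes:
S101: Integrate the OpenCL standard API library into the nGraph framework;
Specifically, OpenCL (Open Computing Language) is a standard API (Application Programming Interface) and programming language for parallel computing on heterogeneous devices. Compared with traditional FPGA algorithm development and HLS development, developing for FPGA back-end devices with OpenCL-based high-level synthesis programming software greatly simplifies the FPGA development flow and shortens the development cycle. This step integrates the OpenCL standard API library into the nGraph framework for use in the subsequent development of the FPGA back-end device. As shown in FIG. 2, an OpenCL standard API library integration module can be created, and the OpenCL standard API library is integrated into the nGraph framework through that module.
Integrating the OpenCL standard API library into the nGraph framework may include: adding the OpenCL standard API library to the source code of the nGraph framework; and modifying the cmake build files of the nGraph framework so that the OpenCL standard API library is compiled as a dynamic link library within the nGraph framework.
Specifically, the OpenCL standard API library is first added to the source code of the nGraph framework. Because the library is used for developing the FPGA back-end device, it is added to the same directory in the nGraph source tree as the FPGA back-end device. After the library has been added, the cmake build files of the nGraph framework are modified so that the OpenCL standard API library is compiled as a dynamic link library within the nGraph framework. The OpenCL standard API library is then an integral part of the nGraph framework and can be used by the other modules in the framework.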
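S102: Create, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device;
Specifically, this step creates the FPGA back-end device creation module in the nGraph framework. The module is used to register the FPGA back-end device in the nGraph framework, initialize the OpenCL environment, and obtain the FPGA back-end device.
Creating this module includes: creating in the nGraph framework an FPGA back-end device registration sub-module for registering the FPGA back-end device; creating, in the nGraph framework and based on the OpenCL standard API library, an OpenCL environment initialization sub-module for initializing the OpenCL environment; and creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device acquisition sub-module for obtaining the FPGA back-end device.
Specifically, in this embodiment the FPGA back-end device creation module includes the FPGA back-end device registration sub-module, the OpenCL environment initialization sub-module, and the FPGA back-end device acquisition sub-module.
The FPGA back-end device registration sub-module builds on the function BackendManager::register_backend() provided by the nGraph framework: it constructs a global FPGA back-end device registration function, ngraph_register_fpga_backend(), which registers the FPGA back-end device with the nGraph framework.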
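The registration mechanism described above can be sketched as follows. This is a minimal Python model, not nGraph's C++ implementation: `BackendManager`, `register_backend`, and `ngraph_register_fpga_backend` are named in the text, but their signatures here, the `FPGABackend` class, and the `create_backend` helper are assumptions made purely for illustration.

```python
from typing import Callable, Dict


class FPGABackend:
    """Hypothetical stand-in for the FPGA back-end object."""

    name = "FPGA"


class BackendManager:
    """Toy registry mapping a back-end name to a constructor."""

    _registry: Dict[str, Callable[[], object]] = {}

    @classmethod
    def register_backend(cls, name: str, constructor: Callable[[], object]) -> None:
        # Store the constructor so the back-end can be created on demand.
        cls._registry[name] = constructor

    @classmethod
    def create_backend(cls, name: str) -> object:
        # Look up the constructor registered under this name (assumed helper).
        if name not in cls._registry:
            raise KeyError(f"backend {name!r} is not registered")
        return cls._registry[name]()


def ngraph_register_fpga_backend() -> None:
    """Global registration function for the FPGA back-end (sketch only)."""
    BackendManager.register_backend("FPGA", FPGABackend)
```

The point of the design is that the framework core never names the FPGA back-end directly: it only knows the string key, so adding a new device amounts to adding one registration function.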
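The OpenCL environment initialization sub-module depends on the OpenCL standard API library: it initializes the OpenCL environment by calling OpenCL standard API library functions.
The FPGA back-end device acquisition sub-module likewise depends on the OpenCL standard API library: it obtains the FPGA back-end device for later use by calling OpenCL standard API library functions.
S103: Create, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache;
Specifically, this step creates the FPGA cache space processing module in the nGraph framework. The module is used to allocate FPGA cache space and to read and write the FPGA cache, that is, to write data into the FPGA cache and to read data from it.
Creating this module includes: creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space allocation sub-module for allocating FPGA cache space for the data to be processed; creating, in the nGraph framework and based on the OpenCL standard API library, a write FPGA cache sub-module for writing the data to be processed from the host side into the FPGA cache space; and creating, in the nGraph framework and based on the OpenCL standard API library, a read FPGA cache sub-module for reading the data processing result from the FPGA cache and sending it back to the host side.
Specifically, in this embodiment the FPGA cache space processing module includes the FPGA cache space allocation sub-module, the write FPGA cache sub-module, and the read FPGA cache sub-module.
The FPGA cache space allocation sub-module is mainly used to allocate FPGA cache space. It is also used to create FPGA Tensors, to calculate the amount of space required by the Tensor data to be processed, and to lay out the Tensor data. Allocating FPGA cache space depends on the OpenCL standard API library and is done by calling OpenCL standard API library functions.
The write FPGA cache sub-module calls OpenCL standard API library functions to write the data to be processed from the host side into the cache of the FPGA back-end device, so that the FPGA back-end device can compute on the data.
The read FPGA cache sub-module calls the OpenCL standard API library to read the computation result from the cache of the FPGA back-end device and send it back to the host side.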
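The three cache sub-modules can be illustrated with a small host-only model. It is a sketch under stated assumptions: the class `FakeFPGABuffer`, the function `tensor_nbytes`, and the choice of 4-byte float32 elements are all hypothetical, and a host-side `bytearray` stands in for the device memory. The text does not name the OpenCL calls involved; buffer-creation and enqueue read/write functions such as `clCreateBuffer`, `clEnqueueWriteBuffer`, and `clEnqueueReadBuffer` are the usual candidates, but that mapping is an assumption.

```python
import struct
from math import prod
from typing import List, Sequence


def tensor_nbytes(shape: Sequence[int], element_size: int = 4) -> int:
    """Bytes needed for a dense tensor (element_size=4 assumes float32)."""
    return prod(shape) * element_size


class FakeFPGABuffer:
    """Host-side bytearray standing in for allocated FPGA cache space."""

    def __init__(self, shape: Sequence[int]) -> None:
        self.shape = tuple(shape)
        # "Allocate" the cache space; a real back-end would ask OpenCL for it.
        self._mem = bytearray(tensor_nbytes(self.shape))

    def write(self, values: Sequence[float]) -> None:
        # Host -> device copy of the data to be processed.
        data = struct.pack(f"<{len(values)}f", *values)
        if len(data) != len(self._mem):
            raise ValueError("data size does not match the allocated buffer")
        self._mem[:] = data

    def read(self) -> List[float]:
        # Device -> host copy of the computation result.
        count = len(self._mem) // 4
        return list(struct.unpack(f"<{count}f", bytes(self._mem)))
```

Computing the byte size from the shape before allocating mirrors the order given in the text: the space required by the Tensor data is calculated first, and only then is the cache space allocated.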
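S104: Create, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating and compiling OP kernels;
Specifically, this step creates the OP kernel implementation module in the nGraph framework. The module is mainly used to create and compile OP kernels. An OP is a computation node in the computation graph, and the kernel of an OP on the FPGA back-end device is called an OP kernel.
Creating this module includes: creating in the nGraph framework an OP kernel creation sub-module for creating the OP kernels supported by the FPGA; and creating in the nGraph framework an OP kernel compilation sub-module for compiling the OP kernels and obtaining the resulting aocx file.
Specifically, in this embodiment the OP kernel implementation module includes the OP kernel creation sub-module and the OP kernel compilation sub-module. The OP kernel creation sub-module uses the OpenCL high-level programming language to write the forward-computation and backward-computation kernels of each OP supported by the FPGA, and optimizes the kernels for parallelism. The OP kernel compilation sub-module uses aoc to compile each created OP kernel and obtain an aocx file, and places the aocx file in the FPGA directory newly added to the nGraph framework for later use.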
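The compilation step can be sketched as a helper that only builds the command line for the offline compiler, without running it. The function name `aoc_command`, the directory layout, and the board name are hypothetical; the `-board=` and `-o` options follow the common usage of Intel's `aoc` tool, but the exact flags the authors used are not stated in the text.

```python
from pathlib import Path
from typing import List, Optional


def aoc_command(kernel_source: str, output_dir: str,
                board: Optional[str] = None) -> List[str]:
    """Build (but do not run) an aoc command line for one kernel source file."""
    source = Path(kernel_source)
    # The compiled image keeps the kernel's base name with an .aocx suffix.
    target = Path(output_dir) / (source.stem + ".aocx")
    command = ["aoc"]
    if board is not None:
        command.append(f"-board={board}")
    command += [source.as_posix(), "-o", target.as_posix()]
    return command
```

The returned list could be passed to `subprocess.run` on a machine where the FPGA toolchain is installed; keeping the command construction separate makes it checkable without the toolchain.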
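S105: Create, in the nGraph framework and based on the OpenCL standard API library, an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.
Specifically, this step creates the FPGA compilation and execution module in the nGraph framework. The module is used to register the OP kernels and schedule their execution.
Creating this module includes: creating in the nGraph framework an OP kernel registration sub-module for registering the OP kernels; creating in the nGraph framework a computation graph optimization sub-module for optimizing the computation graph; and creating in the nGraph framework an OP kernel scheduling and execution sub-module for determining the execution order of each OP in the computation graph and launching the OP kernels for computation according to that order.
Specifically, in this embodiment the FPGA compilation and execution module includes the OP kernel registration sub-module, the computation graph optimization sub-module, and the OP kernel scheduling and execution sub-module.
The OP kernel registration sub-module registers the OP kernels by defining an FPGA kernel registration list and an FPGA kernel registration function, so that the FPGA back-end device can later identify and call them.
The computation graph optimization sub-module reuses part of the graph pass optimization code in the nGraph framework and adds optimization passes specific to the FPGA back-end device, in order to optimize the computation graph created on the client side and improve training performance.
The OP kernel scheduling and execution sub-module determines the execution order of each OP in the computation graph, looks up the OP to be computed in the registration list, and launches the OP kernels for computation according to the execution order of the OPs.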
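Registration and scheduling can be modelled together in a few lines. This is an illustrative sketch, not the patented implementation: the text does not say how the execution order is determined, so a topological sort (Kahn's algorithm) is used here as one plausible choice, and `KERNEL_REGISTRY`, `register_kernel`, `execution_order`, `run`, and the two toy kernels are hypothetical names.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Registration list: OP type -> kernel callable.
KERNEL_REGISTRY: Dict[str, Callable[..., float]] = {}


def register_kernel(op_type: str):
    """Registration function: a decorator that adds a kernel to the list."""
    def decorator(fn):
        KERNEL_REGISTRY[op_type] = fn
        return fn
    return decorator


@register_kernel("Add")
def add_kernel(a, b):
    return a + b


@register_kernel("Multiply")
def multiply_kernel(a, b):
    return a * b


# A node is (op_type, names of its inputs); the graph maps node name -> node.
Graph = Dict[str, Tuple[str, List[str]]]


def execution_order(graph: Graph) -> List[str]:
    """Kahn's algorithm: an OP runs only after every OP it depends on."""
    pending = {name: sum(1 for i in inputs if i in graph)
               for name, (_, inputs) in graph.items()}
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for name, (_, inputs) in graph.items():
            if node in inputs:
                pending[name] -= inputs.count(node)
                if pending[name] == 0:
                    ready.append(name)
    if len(order) != len(graph):
        raise ValueError("computation graph contains a cycle")
    return order


def run(graph: Graph, feeds: Dict[str, float]) -> Dict[str, float]:
    """Launch each OP's kernel in dependency order."""
    values = dict(feeds)
    for name in execution_order(graph):
        op_type, inputs = graph[name]
        kernel = KERNEL_REGISTRY[op_type]  # look the OP up in the registration list
        values[name] = kernel(*(values[i] for i in inputs))
    return values
```

For example, with `graph = {"out": ("Multiply", ["sum", "c"]), "sum": ("Add", ["a", "b"])}`, the node `sum` is scheduled before `out` even though it appears second in the dictionary.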
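Once the above steps have been completed, the nGraph framework supports FPGA back-end devices. As shown in FIG. 3, an nGraph client user can keep programming as before and only needs to specify "FPGA" as the back-end device when creating the back-end in order to use the FPGA back-end device to accelerate training or inference of the deep learning neural network they have built. Specifically, when calling Backend::create() to create the backend, the user calls Backend::create("FPGA") to indicate that the back-end device to be used is the FPGA device. After FPGA has been specified as the back-end device, the subsequent flow automatically calls the modules created in the steps above to perform the corresponding operations, so that the training or inference of the deep learning neural network computation graph built by the user on the nGraph framework is deployed to the FPGA back-end device for acceleration.
As shown in FIG. 3, the nGraph client user program builds the function computation graph and then uses Backend::create("FPGA") to create the FPGA back-end, whereupon the FPGA back-end device creation module registers the FPGA back-end, initializes the OpenCL environment, and obtains the FPGA device. When the nGraph client user program executes the create_tensor(), write(), and read() functions, the FPGA cache space processing module correspondingly allocates FPGA storage space, writes the FPGA buffer, and reads the FPGA buffer. When the nGraph client user program executes the compile() function, the FPGA compilation and execution module performs OP kernel registration, computation graph optimization, and OP kernel scheduling; when the program executes the call() function, the FPGA compilation and execution module finds the OP kernel in the registration list and launches the kernel. Further, the OP kernel implementation module provides the aocx executable file obtained by compiling the OP kernels.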
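The correspondence between client-side calls and back-end modules described for FIG. 3 can be written down as a lookup table. The table contents come from the description; the names `CLIENT_CALL_MAP` and `trace` are hypothetical, and the sketch only records which module would act, without performing any FPGA work.

```python
from typing import Dict, List, Tuple

CACHE = "FPGA cache space processing module"
EXEC = "FPGA compilation and execution module"

# Client-side call -> (module that handles it, operations it performs).
CLIENT_CALL_MAP: Dict[str, Tuple[str, List[str]]] = {
    'Backend::create("FPGA")': (
        "FPGA back-end device creation module",
        ["register FPGA back-end", "initialize OpenCL environment",
         "obtain FPGA device"],
    ),
    "create_tensor()": (CACHE, ["allocate FPGA storage space"]),
    "write()": (CACHE, ["write FPGA buffer"]),
    "read()": (CACHE, ["read FPGA buffer"]),
    "compile()": (EXEC, ["register OP kernels", "optimize computation graph",
                         "schedule OP kernels"]),
    "call()": (EXEC, ["find OP kernel in registration list", "launch kernel"]),
}


def trace(calls: List[str]) -> List[str]:
    """List, in order, the back-end operations triggered by the client calls."""
    steps = []
    for call in calls:
        module, operations = CLIENT_CALL_MAP[call]
        steps.extend(f"{module}: {op}" for op in operations)
    return steps
```

Calling `trace(["create_tensor()", "write()"])` yields the two cache-module operations in the order the client issued them, which is a compact way to check a client program against the figure.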
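In summary, the method provided by this application for enabling the nGraph framework to support FPGA back-end devices integrates the OpenCL standard API library into the nGraph framework and, based on that library, creates the FPGA back-end device creation module, the FPGA cache space processing module, the OP kernel implementation module, and the FPGA compilation and execution module. Once the library has been integrated and these modules have been created, the nGraph framework supports FPGA back-end devices. During development, the user only needs to specify FPGA as the back-end device when creating the back-end; the modules above then perform the corresponding operations, and the FPGA back-end device accelerates the training or inference of the deep learning neural network built by the user.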
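This application also provides an apparatus for enabling the nGraph framework to support FPGA back-end devices. The apparatus described below and the method described above may be read with reference to each other. The apparatus includes:
an integration unit, used to integrate the OpenCL standard API library into the nGraph framework;
a first creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device;
a second creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache;
a third creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating and compiling OP kernels;
a fourth creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.
On the basis of the above embodiment, optionally, the integration unit includes:
an adding unit, used to add the OpenCL standard API library to the source code of the nGraph framework;
a modification unit, used to modify the cmake build files of the nGraph framework so that the OpenCL standard API library is compiled as a dynamic link library within the nGraph framework.
On the basis of the above embodiment, optionally, the first creation unit includes:
a registration sub-module creation unit, used to create in the nGraph framework an FPGA back-end device registration sub-module for registering the FPGA back-end device;
an initialization sub-module creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an OpenCL environment initialization sub-module for initializing the OpenCL environment;
an acquisition sub-module creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device acquisition sub-module for obtaining the FPGA back-end device.
On the basis of the above embodiment, optionally, the second creation unit includes:
a cache space allocation sub-module creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space allocation sub-module for allocating FPGA cache space;
a write cache sub-module creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, a write FPGA cache sub-module for writing the data to be processed from the host side into the FPGA cache;
a read cache sub-module creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, a read FPGA cache sub-module for reading the data processing result from the FPGA cache and sending it back to the host side.
On the basis of the above embodiment, optionally, the third creation unit includes:
a kernel creation sub-module creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an OP kernel creation sub-module for creating the OP kernels supported by the FPGA;
a kernel compilation sub-module creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an OP kernel compilation sub-module for compiling the OP kernels and obtaining the resulting aocx file.
On the basis of the above embodiment, optionally, the fourth creation unit includes:
a kernel registration sub-module creation unit, used to create in the nGraph framework an OP kernel registration sub-module for registering the OP kernels;
a computation graph optimization sub-module creation unit, used to create in the nGraph framework a computation graph optimization sub-module for optimizing the computation graph;
a kernel scheduling and execution sub-module creation unit, used to create in the nGraph framework an OP kernel scheduling and execution sub-module for determining the execution order of each OP in the computation graph and launching the OP kernels for computation according to that order.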
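This application also provides an nGraph framework that supports FPGA back-end devices, including:
the OpenCL standard API library;
an FPGA back-end device creation module, used to register the FPGA back-end device, initialize the OpenCL environment, and obtain the FPGA back-end device;
an FPGA cache space processing module, used to allocate FPGA cache space and to read and write the FPGA cache;
an OP kernel implementation module, used to create and compile OP kernels;
an FPGA compilation and execution module, used to register the OP kernels and schedule their execution.
On the basis of the above embodiment, optionally, the FPGA back-end device creation module includes:
an FPGA back-end device registration sub-module, used to register the FPGA back-end device;
an OpenCL environment initialization sub-module, used to initialize the OpenCL environment;
an FPGA back-end device acquisition sub-module, used to obtain the FPGA back-end device.
On the basis of the above embodiment, optionally, the FPGA cache space processing module includes:
an FPGA cache space allocation sub-module, used to allocate FPGA cache space;
a write FPGA cache sub-module, used to write the data to be processed from the host side into the FPGA cache;
a read FPGA cache sub-module, used to read the data processing result from the FPGA cache and send it back to the host side.
On the basis of the above embodiment, optionally, the OP kernel implementation module includes:
an OP kernel creation sub-module, used to create OP kernels supported by the FPGA back-end;
an OP kernel compilation sub-module, used to compile the OP kernels and obtain the resulting aocx file.
On the basis of the above embodiment, optionally, the FPGA compilation and execution module includes:
an OP kernel registration sub-module, used to register the OP kernels;
a computation graph optimization sub-module, used to optimize the computation graph;
an OP kernel scheduling and execution sub-module, used to determine the execution order of each OP in the computation graph and to launch the OP kernels for computation according to that order.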
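This application also provides a device for enabling the nGraph framework to support FPGA back-end devices. The device includes a memory and a processor. The memory is used to store a computer program; the processor is used to execute the computer program to implement the following steps:
integrating the OpenCL standard API library into the nGraph framework; creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device; creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache; creating, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating and compiling OP kernels; and creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.
For a description of the device provided by this application, refer to the method embodiment above; it is not repeated here.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be read with reference to each other. Because the apparatus, device, and computer-readable storage medium disclosed in the embodiments correspond to the method disclosed in the embodiments, they are described relatively briefly; for the relevant details, see the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each particular application, but such implementations should not be regarded as going beyond the scope of this application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The technical solutions provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help in understanding the method of this application and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to this application without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of this application.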

Claims (10)

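  1. A method for enabling the nGraph framework to support FPGA back-end devices, characterized by including:
    integrating the OpenCL standard API library into the nGraph framework;
    creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device;
    creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache;
    creating, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating OP kernels and compiling the OP kernels;
    creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.
  2. The method according to claim 1, characterized in that integrating the OpenCL standard API library into the nGraph framework includes:
    adding the OpenCL standard API library to the source code of the nGraph framework;
    modifying the cmake build files of the nGraph framework so that the OpenCL standard API library is compiled as a dynamic link library within the nGraph framework.
  3. The method according to claim 2, characterized in that creating, in the nGraph framework and based on the OpenCL standard API library, the FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device includes:
    creating in the nGraph framework an FPGA back-end device registration sub-module for registering the FPGA back-end device;
    creating, in the nGraph framework and based on the OpenCL standard API library, an OpenCL environment initialization sub-module for initializing the OpenCL environment;
    creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device acquisition sub-module for obtaining the FPGA back-end device.
  4. The method according to claim 3, characterized in that creating, in the nGraph framework and based on the OpenCL standard API library, the FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache includes:
    creating, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space allocation sub-module for allocating FPGA cache space;
    creating, in the nGraph framework and based on the OpenCL standard API library, a write FPGA cache sub-module for writing the data to be processed from the host side into the FPGA cache;
    creating, in the nGraph framework and based on the OpenCL standard API library, a read FPGA cache sub-module for reading the data processing result from the FPGA cache and sending it back to the host side.
  5. The method according to claim 4, characterized in that creating, in the nGraph framework and based on the OpenCL standard API library, the OP kernel implementation module for creating OP kernels and compiling the OP kernels includes:
    creating in the nGraph framework an OP kernel creation sub-module for creating OP kernels supported by the FPGA back-end;
    creating in the nGraph framework an OP kernel compilation sub-module for compiling the OP kernels and obtaining the resulting aocx file.
  6. The method according to claim 5, characterized in that creating, in the nGraph framework and based on the OpenCL standard API library, the FPGA compilation and execution module for registering the OP kernels and scheduling their execution includes:
    creating in the nGraph framework an OP kernel registration sub-module for registering the OP kernels;
    creating in the nGraph framework a computation graph optimization sub-module for optimizing the computation graph;
    creating in the nGraph framework an OP kernel scheduling and execution sub-module for determining the execution order of each OP in the computation graph and launching the OP kernels for computation according to that order.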
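  7. An apparatus for enabling the nGraph framework to support FPGA back-end devices, characterized by including:
    an integration unit, used to integrate the OpenCL standard API library into the nGraph framework;
    a first creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA back-end device creation module for registering the FPGA back-end device, initializing the OpenCL environment, and obtaining the FPGA back-end device;
    a second creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an FPGA cache space processing module for allocating FPGA cache space and reading and writing the FPGA cache;
    a third creation unit, used to create, in the nGraph framework and based on the OpenCL standard API library, an OP kernel implementation module for creating OP kernels and compiling the OP kernels;
    a fourth creation unit, used to create in the nGraph framework an FPGA compilation and execution module for registering the OP kernels and scheduling their execution.
  8. An nGraph framework that supports FPGA back-end devices, characterized by including:
    the OpenCL standard API library;
    an FPGA back-end device creation module, used to register the FPGA back-end device, initialize the OpenCL environment, and obtain the FPGA back-end device;
    an FPGA cache space processing module, used to allocate FPGA cache space and to read and write the FPGA cache;
    an OP kernel implementation module, used to create OP kernels and compile the OP kernels;
    an FPGA compilation and execution module, used to register the OP kernels and schedule their execution.
  9. The nGraph framework according to claim 8, characterized in that the FPGA back-end device creation module includes:
    an FPGA back-end device registration sub-module, used to register the FPGA back-end device;
    an OpenCL environment initialization sub-module, used to initialize the OpenCL environment;
    an FPGA back-end device acquisition sub-module, used to obtain the FPGA back-end device.
  10. A device for enabling the nGraph framework to support FPGA back-end devices, characterized by including:
    a memory, used to store a computer program;
    a processor, used to implement, when executing the computer program, the steps of the method for enabling the nGraph framework to support FPGA back-end devices according to any one of claims 1 to 6.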
PCT/CN2020/123809 2020-08-20 2020-10-27 Method for enabling the nGraph framework to support FPGA back-end devices WO2022036852A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/012,924 US11762721B2 (en) 2020-08-20 2020-10-27 Method for realizing nGraph framework supporting FPGA rear-end device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010844796.4 2020-08-20
CN202010844796.4A CN112001494A (zh) 2020-08-20 2020-08-20 Method for enabling the nGraph framework to support FPGA back-end devices

Publications (1)

Publication Number Publication Date
WO2022036852A1 true WO2022036852A1 (zh) 2022-02-24

Family

ID=73473958

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123809 WO2022036852A1 (zh) 2020-08-20 2020-10-27 Method for enabling the nGraph framework to support FPGA back-end devices

Country Status (3)

Country Link
US (1) US11762721B2 (zh)
CN (1) CN112001494A (zh)
WO (1) WO2022036852A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113988287A (zh) * 2021-09-30 2022-01-28 Inspur Electronic Information Industry Co., Ltd. Method for implementing distributed neural network training based on the nGraph framework

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190324810A1 (en) * 2018-04-20 2019-10-24 EMC IP Holding Company LLC Method, device and computer readable medium for scheduling dedicated processing resource
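CN110781126A (zh) * 2019-09-20 2020-02-11 Suzhou Inspur Intelligent Technology Co., Ltd. FPGA heterogeneous acceleration implementation method, system, terminal and storage medium for TensorFlow
WO2020087072A1 (en) * 2018-10-26 2020-04-30 Tensil AI Company Method and apparatus for compiling computation graphs into an integrated circuit
CN111124656A (zh) * 2018-10-31 2020-05-08 EMC IP Holding Company LLC Method, apparatus and computer program product for assigning tasks to dedicated computing resources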

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
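CN103992340B (zh) 2014-05-28 2016-06-22 Guizhou University Monosubstituted hexamethyl cucurbit[6]uril-rare earth adduct, and synthesis method and application thereof
US9983857B2 (en) * 2015-06-16 2018-05-29 Architecture Technology Corporation Dynamic computational acceleration using a heterogeneous hardware infrastructure
CN106528171B (zh) * 2016-11-24 2019-09-24 Zhengzhou Yunhai Information Technology Co., Ltd. Interface design method, apparatus and system between subsystems of a heterogeneous computing platform
CN109447256A (zh) * 2018-09-12 2019-03-08 Shanghai Jiao Tong University Design method for FPGA-based acceleration of the TensorFlow system
CN111490946B (zh) * 2019-01-28 2023-08-11 Alibaba Group Holding Limited FPGA connection implementation method and apparatus based on the OpenCL framework
US10761821B1 (en) * 2019-03-27 2020-09-01 Sap Se Object oriented programming model for graphics processing units (GPUS)
CN110399234A (zh) * 2019-07-10 2019-11-01 Suzhou Inspur Intelligent Technology Co., Ltd. Task acceleration processing method, apparatus, device and readable storage medium
CN110929883A (zh) * 2019-11-22 2020-03-27 Suzhou Inspur Intelligent Technology Co., Ltd. Method and apparatus for supporting FPGA training in TensorFlow
CN111198843B (zh) * 2019-12-19 2023-03-28 Xi'an Jiaotong University File system write acceleration method based on application processor on-chip bus control
CN111459871A (zh) * 2020-04-01 2020-07-28 Jinan Inspur Hi-Tech Investment and Development Co., Ltd. Blockchain acceleration system and method based on FPGA heterogeneous computing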
US20220365833A1 (en) * 2021-05-13 2022-11-17 Nvidia Corporation Application programming interface to compress data


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HEART OF THE MACHINE PRO: "Intel Open Source nGraph Compiler: Easy Model Deployment from Multiple Frameworks to Multiple Devices Heart of the Machine Pro Github address", 21 March 2018 (2018-03-21), pages 1 - 5, XP055902493, Retrieved from the Internet <URL:https://baijiahao.baidu.com/s?id=1595539624466877556&wfr=spider&for=pc&searchword=ngraph> [retrieved on 20220317] *

Also Published As

Publication number Publication date
CN112001494A (zh) 2020-11-27
US20230267024A1 (en) 2023-08-24
US11762721B2 (en) 2023-09-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20950056

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20950056

Country of ref document: EP

Kind code of ref document: A1
