WO2022001128A1 - Method and device for reading memory data of an FPGA board, and medium - Google Patents

Method and device for reading memory data of an FPGA board, and medium

Info

Publication number
WO2022001128A1
WO2022001128A1 PCT/CN2021/076885 CN2021076885W WO2022001128A1 WO 2022001128 A1 WO2022001128 A1 WO 2022001128A1 CN 2021076885 W CN2021076885 W CN 2021076885W WO 2022001128 A1 WO2022001128 A1 WO 2022001128A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
ddr
space
memory
fpga board
Prior art date
Application number
PCT/CN2021/076885
Other languages
English (en)
French (fr)
Inventor
樊嘉恒
王彦伟
阚宏伟
郝锐
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司 filed Critical 浪潮电子信息产业股份有限公司
Priority to US18/012,921 priority Critical patent/US11687242B1/en
Publication of WO2022001128A1 publication Critical patent/WO2022001128A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device

Definitions

  • The present application relates to the technical field of data access, and in particular to a method and device for reading memory data of an FPGA board, and a computer-readable storage medium.
  • An FPGA is a semi-custom circuit within the family of application-specific integrated circuits. It is a programmable logic array that includes programmable input/output units, configurable logic blocks, digital clock management modules, embedded block RAM, routing resources, embedded dedicated hard cores, and low-level embedded functional units. Because an FPGA offers abundant routing resources, reprogrammability, high integration, and low investment cost, it has been widely used in digital circuit design.
  • A heterogeneous computing platform composed of FPGA boards and processors such as CPUs (Central Processing Units) and GPUs (Graphics Processing Units) can greatly improve data processing efficiency and performance, especially for complex data processing workloads, and is therefore widely used across many industries.
  • OpenCL (Open Computing Language) is a framework for writing programs for heterogeneous platforms; it consists of a language, based on C99, for writing Kernels (kernel functions) and a set of APIs (Application Programming Interfaces) for defining and controlling the platform.
  • Kernels are functions that run on OpenCL devices.
  • OpenCL provides parallel computing mechanisms based on task partitioning and data partitioning.
  • OpenCL is divided into two parts: one part is the host-side program, and the other part is the Kernel program on the FPGA side.
  • In the related art, the host-side program proceeds as follows: memory is requested on the FPGA board, the host-side computation data is copied into it over DMA, the buffer address is passed to the Kernel, the Kernel program is executed on the FPGA side, and the results are read back from the board's DDR (Double Data Rate) memory.
  • As a result, all of the data is stored in the DDR0 memory of the FPGA board.
  • When the FPGA board reads the data, it does so through only one of its own DDR controllers while the other DDR memories and DDR controllers sit idle, so the data reading speed of the FPGA board is low and resource utilization is poor.
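  • The prior-art host flow above corresponds to the following OpenCL calls, which are given in full later in the description ("1G" is the description's shorthand for the full data size, not valid C):

    /* 1) Request one buffer on the FPGA board; all of it is backed by DDR0. */
    buff = clCreateBuffer(context, CL_MEM_READ_ONLY, 1G, NULL, &status);
    /* 2) DMA-copy the host-side computation data into that buffer. */
    status = clEnqueueWriteBuffer(queue, buff0, CL_FALSE, 0, 1G, data, 0, NULL, &write_event[0]);
    /* 3) Pass the buffer address to the Kernel and run the FPGA-side program. */
    status = clSetKernelArg(Kernel, buff, sizeof(struct buff), buf);
    /* 4) Read the result back from the board's DDR memory. */
    clEnqueueReadBuffer(queue, output_buf, CL_FALSE, 0, 1G, output[i], 1, &Kernel_event, &finish_event);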
  • The present application provides a method and device for reading memory data of an FPGA board, and a computer-readable storage medium, which can effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
  • One aspect of the embodiments of the present invention provides a method for reading memory data of an FPGA board, which is applied to an FPGA board, including:
  • when a hardware information acquisition request is received from the host, sending the number of controllers and the total number of DDR memories to the host;
  • when a data space application request is received from the host, slicing the data to be calculated based on the data space application request; the data space application request carries the dedicated application space capacity of each DDR and the data to be calculated, and the total number of slices of the data to be calculated is not greater than the total number of DDR memories;
  • transferring each slice of data to the corresponding DDR space, and reading and computing the data in parallel according to the storage location of the slice data in each DDR.
  • Optionally, slicing the data to be calculated based on the data space application request includes: reading the dedicated application space capacity of each DDR from the data space application request; judging whether the dedicated application space capacities of the DDRs are all the same; if so, dividing the data to be calculated equally into n parts, where n is the total number of DDR memories; if not, then for the dedicated application space capacity of each DDR, cutting the data to be calculated into data slices whose size equals the dedicated application space capacity of the current DDR, and setting identification information for each data slice to indicate that the data in the data slice is stored in the memory space of the current DDR.
  • Optionally, transferring each slice of data to the corresponding DDR space is:
  • transferring each slice of data to the corresponding DDR space through direct memory access (DMA).
  • Optionally, transferring each slice of data to the corresponding DDR space, and reading and computing the data in parallel according to the storage location of the slice data in each DDR, includes: transferring each slice of data to the corresponding DDR space, so that each DDR space passes the address of the structure holding the source data to the Kernel;
  • calling the Kernel to read the corresponding data in parallel, according to the data storage address on each DDR, and perform the computation.
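  • The description does not show the Kernel source itself; a minimal OpenCL C sketch, under the assumption that the host binds each input argument to a buffer allocated in a different DDR bank (via the vendor-specific allocation shown later), might look like this:

    // Single-work-item style kernel: the four inputs sit behind four different
    // DDR controllers, so the loads in each iteration can be serviced in parallel.
    __kernel void compute(__global const int *restrict in0,   /* slice stored in DDR0 */
                          __global const int *restrict in1,   /* slice stored in DDR1 */
                          __global const int *restrict in2,   /* slice stored in DDR2 */
                          __global const int *restrict in3,   /* slice stored in DDR3 */
                          __global int *restrict out,
                          const unsigned n)
    {
        for (unsigned i = 0; i < n; ++i)
            out[i] = in0[i] + in1[i] + in2[i] + in3[i];       /* placeholder computation */
    }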
  • An embodiment of the present invention also provides a device for reading memory data of an FPGA board, applied to an FPGA board, including:
  • a data feedback module, configured to send the number of controllers and the total number of DDR memories to the host when a hardware information acquisition request is received from the host;
  • a data slicing module, configured to slice the data to be calculated based on the data space application request when a data space application request is received from the host;
  • the data space application request carries the dedicated application space capacity of each DDR and the data to be calculated, and the total number of slices of the data to be calculated is not greater than the total number of DDR memories;
  • a data storage module, configured to transfer each slice of data to the corresponding DDR space;
  • a data reading module, configured to read and compute the data in parallel according to the storage location of the slice data in each DDR.
  • Optionally, the data slicing module includes:
  • an information reading sub-module, configured to read the dedicated application space capacity of each DDR from the data space application request;
  • a judgment sub-module, configured to judge whether the dedicated application space capacities of the DDRs are all the same;
  • an equal-division slicing sub-module, configured to divide the data to be calculated equally into n parts if the dedicated application space capacities of the DDRs are all the same, where n is the total number of DDR memories;
  • a matching slicing sub-module, configured to, if the dedicated application space capacities of the DDRs differ, cut the data to be calculated, for the dedicated application space capacity of each DDR, into data slices whose size equals the dedicated application space capacity of the current DDR, and set identification information for each data slice to indicate that the data in the data slice is stored in the memory space of the current DDR.
  • Optionally, the data reading module includes:
  • an address feedback sub-module, configured to transfer each slice of data to the corresponding DDR space, so that each DDR space passes the address of the structure holding the source data to the Kernel;
  • a parallel data reading sub-module, configured to call the Kernel to read the corresponding data in parallel, according to the data storage address on each DDR, and perform the computation.
  • Another aspect of the embodiments of the present invention provides a method for reading memory data of an FPGA board, applied to a host, including: obtaining the number of controllers and the total number of DDR memories of the FPGA board; determining the dedicated application space capacity of each DDR based on the total number of DDR memories and the number of controllers; and calling an OpenCL data request function to send a data space application request to the FPGA board, the request carrying the dedicated application space capacity of each DDR and the data to be calculated, so that the FPGA board slices the data to be calculated and stores the slices in the corresponding DDR spaces.
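  • A host-side sketch of this request, following the buffer-creation calls shown in the example later in the description (the trailing ddrN argument is the description's vendor-specific way of naming the target DDR bank; sizes assume the 1 GB / 4-bank example):

    size_t cap = data_size / ddr_total;   /* e.g. 1 GB / 4 = 256 MB per DDR */
    buff0 = clCreateBuffer(context, CL_MEM_READ_ONLY, cap, NULL, &status, ddr0);
    buff1 = clCreateBuffer(context, CL_MEM_READ_ONLY, cap, NULL, &status, ddr1);
    buff2 = clCreateBuffer(context, CL_MEM_READ_ONLY, cap, NULL, &status, ddr2);
    buff3 = clCreateBuffer(context, CL_MEM_READ_ONLY, cap, NULL, &status, ddr3);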
  • An embodiment of the present invention also provides a device for reading memory data of an FPGA board, including a processor, where the processor is configured to implement the steps of the method for reading memory data of an FPGA board described in any of the preceding items when executing a computer program stored in a memory.
  • Embodiments of the present invention finally provide a computer-readable storage medium on which a program for reading memory data of an FPGA board is stored;
  • when the program for reading memory data of an FPGA board is executed by a processor, the steps of the method for reading memory data of an FPGA board described in any of the preceding items are implemented.
  • The advantage of the technical solution provided in this application is that it changes the way OpenCL requests memory on the FPGA board: the computation data is first sliced and then copied to the corresponding DDR memory spaces of the FPGA board, making full use of the board's support for multiple DDR controllers and its own parallel processing.
  • The data to be calculated can be read through multiple DDR controllers simultaneously, which effectively improves data reading efficiency, makes maximum use of existing software and hardware resources, and improves resource utilization, thereby improving overall operating efficiency and reducing system data processing delay.
  • In addition, the embodiments of the present invention also provide a corresponding implementation device and a computer-readable storage medium for the method for reading memory data of an FPGA board, which further makes the method more practical.
  • The device and the computer-readable storage medium have corresponding advantages.
  • FIG. 1 is a schematic diagram of data reading in an exemplary prior-art application scenario, provided by an embodiment of the present invention;
  • FIG. 2 is a schematic flowchart of a method for reading memory data of an FPGA board provided by an embodiment of the present invention;
  • FIG. 3 is an interaction diagram of a method for reading memory data of an FPGA board provided by an embodiment of the present invention;
  • FIG. 4 is a structural diagram of a specific implementation of a device for reading memory data of an FPGA board provided by an embodiment of the present invention;
  • FIG. 5 is a structural diagram of another specific implementation of a device for reading memory data of an FPGA board provided by an embodiment of the present invention.
  • Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for reading memory data of an FPGA board provided by an embodiment of the present invention.
  • The execution subject of this embodiment of the present invention is the FPGA board, and the method may include the following content:
  • S201: When a hardware information acquisition request is received from the host, send the number of controllers and the total number of DDR memories to the host.
  • It can be understood that, in a heterogeneous computing platform formed by the host and the FPGA board, the host is responsible for scheduling, the FPGA board is responsible for the computational data processing, and the host needs to send the data to be calculated to the FPGA board.
  • After the FPGA board finishes processing the data to be calculated, the results are fed back to the host. Since this application improves data reading efficiency by changing how the data to be calculated is stored on the FPGA board, before sending the data to be calculated to the FPGA board, the host needs to obtain the hardware information of the FPGA board that forms the heterogeneous computing platform with it.
  • The hardware information includes the total number of DDR memories on the FPGA board and the total number of controllers. Generally speaking, one controller controls reading data from one DDR memory.
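  • The description does not tie this exchange to a particular API; a host-side sketch with a hypothetical board-support query (fpga_query_hw_info is an assumed helper, not a real library call) that then derives the per-DDR capacity:

    struct fpga_hw_info { unsigned ddr_total; unsigned ctrl_total; };
    extern int fpga_query_hw_info(struct fpga_hw_info *info);   /* hypothetical BSP call */

    struct fpga_hw_info hw;
    fpga_query_hw_info(&hw);                        /* e.g. ddr_total = 4, ctrl_total = 4 */
    size_t per_ddr_cap = data_size / hw.ddr_total;  /* 1 GB / 4 = 256 MB in the later example */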
  • S202: When a data space application request is received from the host, perform data slicing on the data to be calculated based on the data space application request.
  • In the present application, after receiving the hardware information fed back by the FPGA board, the host needs to apply to the FPGA board for storage space to hold the data to be calculated.
  • The related art stores the data to be calculated in a single storage space, so reading the data is controlled by one controller while the other memory spaces go unused and their controllers sit idle, which wastes resources severely.
  • Based on the number of controllers, the total number of DDR memories, and the space occupied by the data to be calculated, the host of the present application calculates the space to be requested from the FPGA board for storing the data to be calculated separately across multiple DDR memories, that is, the dedicated application space capacity of each DDR; the dedicated application space capacity of each DDR is used to store a corresponding part of the data to be calculated.
  • When the host sends the data space application request to the FPGA board, it carries the dedicated application space capacity of each DDR and the data to be calculated in the request, so that the FPGA board knows exactly which data is to be processed and how it is to be stored.
  • In this step, the data to be calculated may be divided equally across all DDRs on the FPGA board, or it may be stored in only some of them; this application places no limitation on this.
  • Correspondingly, when the FPGA board slices the data to be calculated, the total number of slices is not greater than the total number of DDR memories.
  • S203: Transfer each slice of data to the corresponding DDR space. After the data to be calculated has been divided in S202 into a plurality of sub-data, also called data slices, each slice can be transferred to the corresponding DDR space through direct memory access (DMA).
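  • A sketch of this step for the equal-capacity case, using the standard clEnqueueWriteBuffer call (offsetting into the host buffer per slice is an assumption; the example later in the description passes the source pointer without showing the offsets):

    size_t slice_size = data_size / ddr_total;
    cl_event write_event[4];
    for (int i = 0; i < ddr_total; ++i) {
        /* buff[i] was created in DDR bank i; each write is an independent DMA transfer. */
        status = clEnqueueWriteBuffer(queue, buff[i], CL_FALSE, 0, slice_size,
                                      (const char *)data + (size_t)i * slice_size,
                                      0, NULL, &write_event[i]);
    }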
  • S204: Read and compute the data in parallel according to the storage location of the slice data in each DDR.
  • After the data slices are stored into the corresponding DDR memory spaces in S203, the FPGA board needs to know their exact storage locations in order to read them. Each DDR space can pass the address of the structure holding the source data, that is, the storage location of its data slice, to the Kernel, and the FPGA board can then obtain the storage location information of every data slice of the data to be calculated from the Kernel. Optionally, the data storage location of each data slice may carry the identification information of the data to be calculated and the identification information of the corresponding DDR.
  • Since the hardware of the FPGA board supports multiple DDR controllers and the FPGA board is capable of processing data in parallel, the data in the DDR memories can be read through multiple DDR controllers at the same time: the Kernel is called to read the corresponding data in parallel through the corresponding controllers according to the data storage address on each DDR, and the data to be calculated is processed according to the computation requirements.
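  • The description passes a single structure address to the Kernel with clSetKernelArg; a standard-OpenCL equivalent (an assumption, since plain OpenCL has no struct-of-buffers argument) sets each per-DDR buffer as its own kernel argument and launches the Kernel once the DMA writes have completed:

    for (cl_uint i = 0; i < 4; ++i)
        clSetKernelArg(kernel, i, sizeof(cl_mem), &buff[i]);   /* one buffer per DDR bank */
    clSetKernelArg(kernel, 4, sizeof(cl_mem), &out_buf);       /* out_buf and n are illustrative */
    clSetKernelArg(kernel, 5, sizeof(unsigned), &n);
    /* Wait for all four slice transfers, then run the kernel. */
    clEnqueueTask(queue, kernel, 4, write_event, &kernel_event);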
  • In the technical solution provided by this embodiment of the present invention, the way OpenCL requests memory on the FPGA board is changed: the computation data is first sliced and then copied to the corresponding DDR memory spaces of the FPGA board, making full use of the board's support for multiple DDR controllers and its own parallel processing. The data to be calculated is read through multiple DDR controllers simultaneously, which effectively improves data reading efficiency, makes maximum use of existing software and hardware resources, and improves resource utilization, thereby improving overall operating efficiency and reducing system data processing delay.
  • The above embodiment does not limit how the data slicing of step S202 is performed; this embodiment provides one data slicing method, which may include the following steps:
  • The FPGA board reads the dedicated application space capacity of each DDR from the data space application request and judges whether these capacities are all the same. If they are, the data to be calculated is divided equally into n parts, where n is the total number of DDR memories; if they are not, then for the dedicated application space capacity of each DDR, the data to be calculated is cut into data slices whose size equals the dedicated application space capacity of the current DDR, and identification information is set for each data slice.
  • In this embodiment of the present invention, the identification information indicates that the data in a data slice is stored in the memory space of the current DDR; based on a slice's identification information, it can be determined in which DDR memory space that slice is stored.
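  • A C sketch of the slicing decision just described (all names are illustrative; cap[i] is the dedicated application space capacity requested for DDR i, and ddr_id plays the role of the identification information):

    struct slice { size_t offset; size_t size; int ddr_id; };

    int slice_data(const size_t *cap, int ddr_total, size_t data_size, struct slice *out)
    {
        int all_same = 1;
        for (int i = 1; i < ddr_total; ++i)
            if (cap[i] != cap[0]) { all_same = 0; break; }

        size_t off = 0;
        int n = 0;
        if (all_same) {                              /* equal capacities: n equal parts */
            size_t part = data_size / ddr_total;
            for (; n < ddr_total; ++n, off += part)
                out[n] = (struct slice){ off, part, n };
        } else {                                     /* unequal: one slice per capacity value */
            for (; n < ddr_total && off < data_size; ++n) {
                size_t sz = cap[n] < data_size - off ? cap[n] : data_size - off;
                out[n] = (struct slice){ off, sz, n };
                off += sz;
            }
        }
        return n;                                    /* never more than ddr_total slices */
    }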
  • The data to be calculated can be divided equally across all DDRs, or, based on its data processing logic, it can be divided into fewer slices than the total number of DDRs, so that not all DDR memory spaces are occupied and the occupied capacity of each DDR memory space differs. For example, if step B and step C of the data to be calculated share the same set of parameter values, the computation data for steps B and C can be placed in the same data slice; if the result of step A is the input of step D, the computation data for steps A and D can be placed in the same data slice.
  • In another implementation, to keep the data stored in the DDR memories balanced, when calculating the dedicated application space capacity of each DDR, the DDRs can be sorted by available space from largest to smallest: a DDR with more available space is given a larger dedicated application space capacity, and a DDR with less available space is given a smaller one, so that storage across the DDR memory spaces stays balanced.
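  • One way to realize this balancing, sizing each DDR's dedicated application space capacity in proportion to its free space (a sketch under that assumption; the description only states that larger free space should receive a larger request):

    void plan_capacities(const size_t *free_space, int ddr_total,
                         size_t data_size, size_t *cap_out)
    {
        size_t total_free = 0, assigned = 0;
        for (int i = 0; i < ddr_total; ++i)
            total_free += free_space[i];
        for (int i = 0; i < ddr_total; ++i) {
            cap_out[i] = (i == ddr_total - 1)
                       ? data_size - assigned   /* give the remainder to the last bank */
                       : (size_t)((double)free_space[i] / total_free * data_size);
            assigned += cap_out[i];
        }
    }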
  • In addition, the present application also provides a host-based method for reading memory data of an FPGA board; referring to FIG. 3, it may include: obtaining the number of controllers and the total number of DDR memories of the FPGA board; determining the dedicated application space capacity of each DDR based on the total number of DDR memories and the number of controllers; and calling an OpenCL data request function to send a data space application request to the FPGA board.
  • The data space application request carries the dedicated application space capacity of each DDR and the data to be calculated, so that the FPGA board slices the data to be calculated and stores the slices in the corresponding DDR spaces.
  • It can be seen from the above that the embodiments of the present invention effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
  • To make the technical solution of the present application clearer to those skilled in the art, the present application also provides an illustrative example. Suppose the data to be calculated occupies 1 GB of space and the FPGA board has 4 DDR memories and 4 controllers. The process of reading the FPGA board memory data may then include the following: since there are 4 DDR memories, the computation data is divided into 4 equal parts and a memory structure holding four buffers (buff0 to buff3) is created; because 1 GB / 4 = 256 MB, an independent 256 MB memory space is requested on each DDR of the FPGA board by calling the OpenCL function for requesting data space:
  • buff0 = clCreateBuffer(context, CL_MEM_READ_ONLY, 256M, NULL, &status, ddr0);
  • buff1 = clCreateBuffer(context, CL_MEM_READ_ONLY, 256M, NULL, &status, ddr1);
  • buff2 = clCreateBuffer(context, CL_MEM_READ_ONLY, 256M, NULL, &status, ddr2);
  • buff3 = clCreateBuffer(context, CL_MEM_READ_ONLY, 256M, NULL, &status, ddr3).
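  • For completeness, the remaining steps of this example as given in the description: the 1 GB of host data is split into four 256 MB slices and written to the four buffers, and the address of the structure holding the buffers is passed to the Kernel, which then reads the four DDR banks in parallel and performs the computation:

    status = clEnqueueWriteBuffer(queue, buff0, CL_FALSE, 0, 256M, data, 0, NULL, &write_event[0]);
    status = clEnqueueWriteBuffer(queue, buff1, CL_FALSE, 0, 256M, data, 0, NULL, &write_event[0]);
    status = clEnqueueWriteBuffer(queue, buff2, CL_FALSE, 0, 256M, data, 0, NULL, &write_event[0]);
    status = clEnqueueWriteBuffer(queue, buff3, CL_FALSE, 0, 256M, data, 0, NULL, &write_event[0]);
    status = clSetKernelArg(Kernel[i], buff, sizeof(struct buff), buf);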
  • It can be seen from the above that this embodiment of the present invention changes the way OpenCL requests FPGA memory: the computation data is first sliced and then copied to each memory of the FPGA board respectively.
  • Compared with the prior art shown in FIG. 1, the efficiency of reading the data can thereby be increased by a factor of 4.
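  • The example stops at the kernel-argument step. Assuming the Kernel writes its results into one output buffer per DDR bank (an assumption; the description only details the write path), the host can drain the results with the same per-bank pattern:

    cl_event read_event[4];
    for (int i = 0; i < 4; ++i)
        clEnqueueReadBuffer(queue, out_buf[i], CL_FALSE, 0, 256 * 1024 * 1024,
                            (char *)result + (size_t)i * 256 * 1024 * 1024,
                            1, &kernel_event, &read_event[i]);
    clWaitForEvents(4, read_event);   /* all four banks read back in parallel */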
  • An embodiment of the present invention also provides a corresponding device for the method for reading memory data of an FPGA board, which further makes the method more practical.
  • The device for reading memory data of an FPGA board provided by the embodiment of the present invention is described below from the perspective of the functional modules of the FPGA board.
  • The device for reading memory data of an FPGA board described below and the method for reading memory data of an FPGA board described above may be referred to in correspondence with each other.
  • Referring to FIG. 4, FIG. 4 is a structural diagram of a specific implementation of a device for reading memory data of an FPGA board provided by an embodiment of the present invention. Based on the FPGA board, the device may include:
  • a data feedback module 401, configured to send the number of controllers and the total number of DDR memories to the host when a hardware information acquisition request is received from the host;
  • a data slicing module 402, configured to perform data slicing on the data to be calculated based on the data space application request when a data space application request is received from the host; the data space application request carries the dedicated application space capacity of each DDR and the data to be calculated, and the total number of slices of the data to be calculated is the same as the total number of DDR memories;
  • a data storage module 403, configured to transfer each slice of data to the corresponding DDR space;
  • a data reading module 404, configured to read and compute the data in parallel according to the storage location of the slice data in each DDR.
  • Optionally, in some implementations of this embodiment, the data slicing module 402 may include:
  • an information reading sub-module, configured to read the dedicated application space capacity of each DDR from the data space application request;
  • a judgment sub-module, configured to judge whether the dedicated application space capacities of the DDRs are all the same;
  • an equal-division slicing sub-module, configured to divide the data to be calculated equally into n parts if the dedicated application space capacities of the DDRs are all the same, where n is the total number of DDR memories;
  • a matching slicing sub-module, configured to, if the dedicated application space capacities of the DDRs differ, cut the data to be calculated, for the dedicated application space capacity of each DDR, into data slices whose size equals the dedicated application space capacity of the current DDR, and set identification information for each data slice to indicate that the data in the data slice is stored in the memory space of the current DDR.
  • In some other implementations of this embodiment of the present invention, the data storage module 403 may also be a module that transfers each slice of data to the corresponding DDR space through direct memory access.
  • Optionally, in other implementations of this embodiment, the data reading module 404 may further include, for example:
  • an address feedback sub-module, configured to transfer each slice of data to the corresponding DDR space, so that each DDR space passes the address of the structure holding the source data to the Kernel;
  • a parallel data reading sub-module, configured to call the Kernel to read the corresponding data in parallel, according to the data storage address on each DDR, and perform the computation.
  • It can be seen from the above that the embodiments of the present invention effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
  • Referring to FIG. 5, FIG. 5 is a structural diagram of another device for reading memory data of an FPGA board provided by an embodiment of the present application. As shown in FIG. 5, the device includes a memory 50 for storing a computer program;
  • and a processor 51, configured to implement the steps of the method for reading memory data of an FPGA board mentioned in any of the foregoing embodiments when executing the computer program.
  • The processor 51 may include one or more processing cores, such as a 4-core processor or an 8-core processor.
  • The processor 51 may be implemented in at least one hardware form among a DSP (Digital Signal Processing) and a PLA (Programmable Logic Array).
  • The processor 51 may also include a main processor and a coprocessor: the main processor is the processor that handles data in the awake state, also called the CPU (Central Processing Unit), while the coprocessor is a low-power processor that handles data in the standby state.
  • In some embodiments, the processor 51 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen.
  • In some embodiments, the processor 51 may further include an AI (Artificial Intelligence) processor, which handles computing operations related to machine learning.
  • The memory 50 may include one or more computer-readable storage media, which may be non-transitory. The memory 50 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In this embodiment, the memory 50 is at least used to store the following computer program 501, which, after being loaded and executed by the processor 51, can implement the relevant steps of the method for reading memory data of an FPGA board disclosed in any of the foregoing embodiments.
  • In addition, the resources stored in the memory 50 may also include an operating system 502, data 503, and the like, and the storage may be transient or persistent.
  • The operating system 502 may include Windows, Unix, Linux, and the like.
  • The data 503 may include, but is not limited to, data corresponding to the test results, and the like.
  • In some embodiments, the device for reading memory data of an FPGA board may further include a display screen 52, an input/output interface 53, a communication interface 54, a power supply 55, and a communication bus 56.
  • Those skilled in the art will understand that the structure shown in FIG. 5 does not constitute a limitation on the device for reading memory data of an FPGA board; the device may include more or fewer components than shown, for example, it may also include a sensor 57.
  • It can be seen from the above that the embodiments of the present invention effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
  • It can be understood that, if the method for reading memory data of an FPGA board in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and executes all or part of the steps of the methods of the various embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disk.
  • Based on this, an embodiment of the present invention also provides a computer-readable storage medium that stores a program for reading memory data of an FPGA board; when the program is executed by a processor, the steps of the method for reading memory data of an FPGA board described in any of the above embodiments are implemented.
  • It can be seen from the above that the embodiments of the present invention effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.

Abstract

A method and device for reading memory data of an FPGA board, and a computer-readable storage medium. The method includes: after receiving a hardware information acquisition request from the host, the FPGA board feeds back the number of controllers and the total number of DDR memories; when a data space application request is received from the host, the data to be calculated is sliced based on the data space application request; the data space application request carries the dedicated application space capacity of each DDR and the data to be calculated, and the total number of slices of the data to be calculated is the same as the total number of DDR memories; each slice of data is transferred to the corresponding DDR space, and according to the storage location of the slice data in each DDR, the data is read from the DDR memory spaces in parallel through multiple controllers and computed. This effectively improves the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.

Description

Method and device for reading memory data of an FPGA board, and medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 30, 2020, with application number CN202010616628.X and invention title "Method and device for reading memory data of an FPGA board, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data access, and in particular to a method and device for reading memory data of an FPGA board, and a computer-readable storage medium.
Background Art
As users demand ever higher data processing efficiency, multi-threaded parallel data processing has become a focus. The FPGA (Field Programmable Gate Array), a semi-custom circuit developed further on the basis of programmable devices such as the PAL (Programmable Array Logic) and the GAL (Generic Array Logic), both remedies the shortcomings of fully custom circuits and overcomes the limited gate count of the earlier programmable devices. An FPGA is a semi-custom circuit within the family of application-specific integrated circuits; it is a programmable logic array that includes programmable input/output units, configurable logic blocks, digital clock management modules, embedded block RAM, routing resources, embedded dedicated hard cores, and low-level embedded functional units. Because an FPGA offers abundant routing resources, reprogrammability, high integration, and low investment cost, it has been widely used in digital circuit design. A heterogeneous computing platform composed of FPGA boards and processors such as CPUs (Central Processing Units) and GPUs (Graphics Processing Units) can greatly improve data processing efficiency and performance, especially for complex data processing, and is therefore widely used across many industries.
OpenCL (Open Computing Language) is a framework for writing programs for heterogeneous platforms. It consists of a language, based on C99, for writing Kernels (kernel functions) and a set of APIs (Application Programming Interfaces) for defining and controlling the platform. Kernels are functions that run on OpenCL devices, and OpenCL provides parallel computing mechanisms based on task partitioning and data partitioning.
OpenCL is divided into two parts: one part is the host-side program, and the other part is the Kernel program on the FPGA side. The host-side program proceeds as follows:
Request memory on the FPGA board through the following function (DDR0->BUFF in FIG. 1): buff=clCreateBuffer(context,CL_MEM_READ_ONLY,1G,NULL,&status);
Call the following function to copy the host-side computation data to the memory of the FPGA board via DMA (Direct Memory Access), that is, copy the data of the host OPENCL->BUFF to FPGA DDR0->BUFF as in FIG. 1:
status=clEnqueueWriteBuffer(queue,buff0,CL_FALSE,0,1G,data,0,NULL,&write_event[0]);
Pass the address where the FPGA board stores the computation data to the Kernel through the parameter status, and execute the Kernel program on the FPGA side:
status=clSetKernelArg(Kernel,buff,sizeof(struct buff),buf).
Read the result stored on the DDR (Double Data Rate) memory of the FPGA board through the following function:
clEnqueueReadBuffer(queue,output_buf,CL_FALSE,0,1G,output[i],1,&Kernel_event,&finish_event);
In the related art, as shown in FIG. 1, the data is stored in the DDR0 memory of the FPGA board. When the FPGA board reads the data, it does so through only one of its own DDR controllers while the other DDR memories and DDR controllers sit idle; the data reading speed of the FPGA board is therefore low and resource utilization is poor.
In view of this, how to improve the data reading efficiency and resource utilization of an FPGA board is a technical problem that those skilled in the art need to solve.
Summary of the Invention
The present application provides a method and device for reading memory data of an FPGA board, and a computer-readable storage medium, which effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
To solve the above technical problem, the embodiments of the present invention provide the following technical solutions:
In one aspect, an embodiment of the present invention provides a method for reading memory data of an FPGA board, applied to an FPGA board, including:
when a hardware information acquisition request is received from the host, sending the number of controllers and the total number of DDR memories to the host;
when a data space application request is received from the host, slicing the data to be calculated based on the data space application request, the data space application request carrying the dedicated application space capacity of each DDR and the data to be calculated, where the total number of slices of the data to be calculated is not greater than the total number of DDR memories;
transferring each slice of data to the corresponding DDR space, and reading and computing the data in parallel according to the storage location of the slice data in each DDR.
Optionally, slicing the data to be calculated based on the data space application request includes:
reading the dedicated application space capacity of each DDR from the data space application request;
judging whether the dedicated application space capacities of the DDRs are all the same;
if so, dividing the data to be calculated equally into n parts, where n is the total number of DDR memories;
if not, then for the dedicated application space capacity of each DDR, cutting the data to be calculated into data slices whose size equals the dedicated application space capacity of the current DDR, and setting identification information for each data slice to indicate that the data in the data slice is stored in the memory space of the current DDR.
Optionally, transferring each slice of data to the corresponding DDR space is:
transferring each slice of data to the corresponding DDR space through direct memory access.
Optionally, transferring each slice of data to the corresponding DDR space, and reading and computing the data in parallel according to the storage location of the slice data in each DDR, includes:
transferring each slice of data to the corresponding DDR space, so that each DDR space passes the address of the structure holding the source data to the Kernel;
calling the Kernel to read the corresponding data in parallel, according to the data storage address on each DDR, and perform the computation.
An embodiment of the present invention also provides a device for reading memory data of an FPGA board, applied to an FPGA board, including:
a data feedback module, configured to send the number of controllers and the total number of DDR memories to the host when a hardware information acquisition request is received from the host;
a data slicing module, configured to slice the data to be calculated based on the data space application request when a data space application request is received from the host, the data space application request carrying the dedicated application space capacity of each DDR and the data to be calculated, where the total number of slices of the data to be calculated is not greater than the total number of DDR memories;
a data storage module, configured to transfer each slice of data to the corresponding DDR space;
a data reading module, configured to read and compute the data in parallel according to the storage location of the slice data in each DDR.
Optionally, the data slicing module includes:
an information reading sub-module, configured to read the dedicated application space capacity of each DDR from the data space application request;
a judgment sub-module, configured to judge whether the dedicated application space capacities of the DDRs are all the same;
an equal-division slicing sub-module, configured to divide the data to be calculated equally into n parts if the dedicated application space capacities of the DDRs are all the same, where n is the total number of DDR memories;
a matching slicing sub-module, configured to, if the dedicated application space capacities of the DDRs differ, cut the data to be calculated, for the dedicated application space capacity of each DDR, into data slices whose size equals the dedicated application space capacity of the current DDR, and set identification information for each data slice to indicate that the data in the data slice is stored in the memory space of the current DDR.
Optionally, the data reading module includes:
an address feedback sub-module, configured to transfer each slice of data to the corresponding DDR space, so that each DDR space passes the address of the structure holding the source data to the Kernel;
a parallel data reading sub-module, configured to call the Kernel to read the corresponding data in parallel, according to the data storage address on each DDR, and perform the computation.
In another aspect, an embodiment of the present invention provides a method for reading memory data of an FPGA board, applied to a host, including:
obtaining the number of controllers and the total number of DDR memories of the FPGA board;
determining the dedicated application space capacity of each DDR based on the total number of DDR memories and the number of controllers;
calling an OpenCL data request function to send a data space application request to the FPGA board, the data space application request carrying the dedicated application space capacity of each DDR and the data to be calculated, so that the FPGA board slices the data to be calculated and stores the slices in the corresponding DDR spaces.
An embodiment of the present invention also provides a device for reading memory data of an FPGA board, including a processor, where the processor is configured to implement the steps of the method for reading memory data of an FPGA board described in any of the preceding items when executing a computer program stored in a memory.
Finally, an embodiment of the present invention provides a computer-readable storage medium on which a program for reading memory data of an FPGA board is stored; when the program is executed by a processor, the steps of the method for reading memory data of an FPGA board described in any of the preceding items are implemented.
The advantage of the technical solution provided by the present application is that it changes the way OpenCL requests memory on the FPGA board: the computation data is first sliced and then copied to the corresponding DDR memory spaces of the FPGA board, making full use of the board's support for multiple DDR controllers and its own parallel processing. The data to be calculated is read through multiple DDR controllers simultaneously, which effectively improves data reading efficiency, makes maximum use of existing software and hardware resources, and improves resource utilization, thereby improving overall operating efficiency and reducing system data processing delay.
In addition, the embodiments of the present invention also provide a corresponding implementation device and a computer-readable storage medium for the method for reading memory data of an FPGA board, which further makes the method more practical; the device and the computer-readable storage medium have corresponding advantages.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention or of the related art more clearly, the drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of data reading in an exemplary prior-art application scenario, provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for reading memory data of an FPGA board provided by an embodiment of the present invention;
FIG. 3 is an interaction diagram of a method for reading memory data of an FPGA board provided by an embodiment of the present invention;
FIG. 4 is a structural diagram of a specific implementation of a device for reading memory data of an FPGA board provided by an embodiment of the present invention;
FIG. 5 is a structural diagram of another specific implementation of a device for reading memory data of an FPGA board provided by an embodiment of the present invention.
Detailed Description of the Embodiments
To enable those skilled in the art to better understand the solution of the present invention, the present invention is further described in detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and the like in the specification, the claims, and the above drawings of the present application are used to distinguish different objects rather than to describe a particular order. In addition, the terms "include" and "have", and any variations thereof, are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.
Having introduced the technical solutions of the embodiments of the present invention, various non-limiting implementations of the present application are described in detail below.
Referring first to FIG. 2 and FIG. 3, FIG. 2 is a schematic flowchart of a method for reading memory data of an FPGA board provided by an embodiment of the present invention. The execution subject of this embodiment of the present invention is the FPGA board, and the method may include the following content:
S201: When a hardware information acquisition request is received from the host, send the number of controllers and the total number of DDR memories to the host.
It can be understood that, in a heterogeneous computing platform formed by the host and the FPGA board, the host is responsible for scheduling and the FPGA board is responsible for the computational data processing; the host needs to send the data to be calculated to the FPGA board, and after the FPGA board finishes processing it, the results are fed back to the host. Since this application improves data reading efficiency by changing how the data to be calculated is stored on the FPGA board, before sending the data to be calculated to the FPGA board, the host needs to obtain the hardware information of the FPGA board that forms the heterogeneous computing platform with it. The hardware information includes the total number of DDR memories on the FPGA board and the total number of controllers; generally speaking, one controller controls reading data from one DDR memory.
S202: When a data space application request is received from the host, slice the data to be calculated based on the data space application request.
In the present application, after receiving the hardware information fed back by the FPGA board, the host needs to apply to the FPGA board for storage space to hold the data to be calculated. The related art stores the data to be calculated in a single storage space, so reading the data is controlled by one controller while the other memory spaces go unused and their controllers sit idle, which wastes resources severely. The host of the present application, based on the number of controllers, the total number of DDR memories, and the space occupied by the data to be calculated, calculates the space to be requested from the FPGA board for storing the data to be calculated separately across multiple DDR memories, that is, the dedicated application space capacity of each DDR; the dedicated application space capacity of each DDR is used to store a corresponding part of the data to be calculated. When the host sends the data space application request to the FPGA board, it carries the dedicated application space capacity of each DDR and the data to be calculated in the request, so that the FPGA board knows exactly which data is to be processed and how it is to be stored. In this step, the data to be calculated may be divided equally across all DDRs on the FPGA board, or it may be stored in only some of them; this application places no limitation on this. Correspondingly, when the FPGA board slices the data to be calculated, the total number of slices is not greater than the total number of DDR memories.
S203: Transfer each slice of data to the corresponding DDR space.
After the data to be calculated has been divided in S202 into a plurality of sub-data, also called data slices, each slice of data can be transferred to the corresponding DDR space through direct memory access (DMA).
S204: Read and compute the data in parallel according to the storage location of the slice data in each DDR.
After the data slices are stored into the corresponding DDR memory spaces in S203, the FPGA board needs to know their exact storage locations in order to read them. Each DDR space can pass the address of the structure holding the source data, that is, the storage location of its data slice, to the Kernel, and the FPGA board can then obtain the storage location information of every data slice of the data to be calculated from the Kernel. Optionally, the data storage location of each data slice may carry the identification information of the data to be calculated and the identification information of the corresponding DDR. Since the hardware of the FPGA board supports multiple DDR controllers and the FPGA board is capable of processing data in parallel, the data in the DDR memories can be read through multiple DDR controllers at the same time: the Kernel is called to read the corresponding data in parallel through the corresponding controllers according to the data storage address on each DDR, and the data to be calculated is processed according to the computation requirements.
In the technical solution provided by this embodiment of the present invention, the way OpenCL requests memory on the FPGA board is changed: the computation data is first sliced and then copied to the corresponding DDR memory spaces of the FPGA board, making full use of the board's support for multiple DDR controllers and its own parallel processing. The data to be calculated is read through multiple DDR controllers simultaneously, which effectively improves data reading efficiency, makes maximum use of existing software and hardware resources, and improves resource utilization, thereby improving overall operating efficiency and reducing system data processing delay.
The above embodiment does not limit how step S202 is performed; this embodiment provides one data slicing method, which may include the following steps:
The FPGA board reads the dedicated application space capacity of each DDR from the data space application request and judges whether these capacities are all the same. If they are, the data to be calculated is divided equally into n parts, where n is the total number of DDR memories; if they are not, then for the dedicated application space capacity of each DDR, the data to be calculated is cut into data slices whose size equals the dedicated application space capacity of the current DDR, and identification information is set for each data slice.
It should be noted that there is no strict order of execution among the steps in the present application; as long as they conform to the logical order, the steps may be performed simultaneously or in some preset order. FIG. 2 and FIG. 3 are only illustrative and do not mean that only this execution order is possible.
In this embodiment of the present invention, the identification information indicates that the data in a data slice is stored in the memory space of the current DDR; based on a slice's identification information, it can be determined in which DDR memory space that slice is stored. The data to be calculated may be divided equally across all DDRs, or, based on its data processing logic, it may be divided into fewer slices than the total number of DDRs, so that not all DDR memory spaces are occupied and the occupied capacity of each DDR memory space differs. For example, if step B and step C of the data to be calculated share the same set of parameter values, the computation data for steps B and C can be placed in the same data slice; if the result of step A is the input of step D, the computation data for steps A and D can be placed in the same data slice. In another implementation, to keep the data stored in the DDR memories balanced, when calculating the dedicated application space capacity of each DDR, the DDRs can be sorted by available space from largest to smallest: a DDR with more available space is given a larger dedicated application space capacity, and a DDR with less available space is given a smaller one, so that storage across the DDR memory spaces stays balanced.
In addition, the present application also provides a host-based method for reading memory data of an FPGA board; referring to FIG. 3, it may include:
obtaining the number of controllers and the total number of DDR memories of the FPGA board;
determining the dedicated application space capacity of each DDR based on the total number of DDR memories and the number of controllers;
calling an OpenCL data request function to send a data space application request to the FPGA board, the data space application request carrying the dedicated application space capacity of each DDR and the data to be calculated, so that the FPGA board slices the data to be calculated and stores the slices in the corresponding DDR spaces.
For the specific implementation of the method for reading memory data of an FPGA board described in this embodiment of the present invention, reference may be made to the relevant description of the above method embodiment, which is not repeated here.
It can be seen from the above that the embodiments of the present invention effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
To make the technical solution of the present application clearer to those skilled in the art, the present application also provides an illustrative example. Suppose the data to be calculated occupies 1 GB of space and the FPGA board has 4 DDR memories and 4 controllers; the process of reading the FPGA board memory data may then include the following:
A: Since there are 4 DDR memories, the computation data can correspondingly be divided into 4 equal parts, and a memory structure as shown below is created:
struct buff
{
buff0;
buff1;
buff2;
buff3;
}
B: According to (total space occupied by the data to be calculated) / (number of DDR memories) = the size of the data block to be requested on each DDR memory (the dedicated application space capacity), 1 GB / 4 (4 DDR memories) = 256 MB, so an independent 256 MB memory space is requested on each DDR of the FPGA board by calling the OpenCL function for requesting data space:
buff0=clCreateBuffer(context,CL_MEM_READ_ONLY,256M,NULL,&status,ddr0);
buff1=clCreateBuffer(context,CL_MEM_READ_ONLY,256M,NULL,&status,ddr1);
buff2=clCreateBuffer(context,CL_MEM_READ_ONLY,256M,NULL,&status,ddr2);
buff3=clCreateBuffer(context,CL_MEM_READ_ONLY,256M,NULL,&status,ddr3).
C: Split the 1 GB of data to be calculated on the host into 4 slices of 256 MB each, and store them respectively in each memory of the FPGA board:
status=clEnqueueWriteBuffer(queue,buff0,CL_FALSE,0,256M,data,0,NULL,&write_event[0]);
status=clEnqueueWriteBuffer(queue,buff1,CL_FALSE,0,256M,data,0,NULL,&write_event[0]);
status=clEnqueueWriteBuffer(queue,buff2,CL_FALSE,0,256M,data,0,NULL,&write_event[0]);
status=clEnqueueWriteBuffer(queue,buff3,CL_FALSE,0,256M,data,0,NULL,&write_event[0]);
D: Pass the address BUFF of the structure holding the source data to the Kernel. The Kernel on the FPGA then reads the data in parallel according to the data address stored on each DDR, and finally performs the computation on the data:
status=clSetKernelArg(Kernel[i],buff,sizeof(struct buff),buf).
It can be seen from the above that this embodiment of the present invention changes the way OpenCL requests FPGA memory: the computation data is first sliced and then copied to each memory of the FPGA board, which can increase the data reading efficiency by a factor of 4 compared with the prior art shown in FIG. 1.
An embodiment of the present invention also provides a corresponding device for the method for reading memory data of an FPGA board, which further makes the method more practical. The device may be described from the perspective of the functional modules of the FPGA board. The device for reading memory data of an FPGA board provided by the embodiment of the present invention is introduced below; the device for reading memory data of an FPGA board described below and the method for reading memory data of an FPGA board described above may be referred to in correspondence with each other.
From the perspective of functional modules, referring to FIG. 4, FIG. 4 is a structural diagram of a specific implementation of a device for reading memory data of an FPGA board provided by an embodiment of the present invention. Based on the FPGA board, the device may include:
a data feedback module 401, configured to send the number of controllers and the total number of DDR memories to the host when a hardware information acquisition request is received from the host;
a data slicing module 402, configured to slice the data to be calculated based on the data space application request when a data space application request is received from the host, the data space application request carrying the dedicated application space capacity of each DDR and the data to be calculated, where the total number of slices of the data to be calculated is the same as the total number of DDR memories;
a data storage module 403, configured to transfer each slice of data to the corresponding DDR space;
a data reading module 404, configured to read and compute the data in parallel according to the storage location of the slice data in each DDR.
Optionally, in some implementations of this embodiment, the data slicing module 402 may include:
an information reading sub-module, configured to read the dedicated application space capacity of each DDR from the data space application request;
a judgment sub-module, configured to judge whether the dedicated application space capacities of the DDRs are all the same;
an equal-division slicing sub-module, configured to divide the data to be calculated equally into n parts if the dedicated application space capacities of the DDRs are all the same, where n is the total number of DDR memories;
a matching slicing sub-module, configured to, if the dedicated application space capacities of the DDRs differ, cut the data to be calculated, for the dedicated application space capacity of each DDR, into data slices whose size equals the dedicated application space capacity of the current DDR, and set identification information for each data slice to indicate that the data in the data slice is stored in the memory space of the current DDR.
In some other implementations of this embodiment of the present invention, the data storage module 403 may also be a module that transfers each slice of data to the corresponding DDR space through direct memory access.
Optionally, in other implementations of this embodiment, the data reading module 404 may further include, for example:
an address feedback sub-module, configured to transfer each slice of data to the corresponding DDR space, so that each DDR space passes the address of the structure holding the source data to the Kernel;
a parallel data reading sub-module, configured to call the Kernel to read the corresponding data in parallel, according to the data storage address on each DDR, and perform the computation.
The functions of the functional modules of the device for reading memory data of an FPGA board described in this embodiment of the present invention can be specifically implemented according to the methods in the above method embodiments; for the specific implementation process, reference may be made to the relevant description of the above method embodiments, which is not repeated here.
It can be seen from the above that the embodiments of the present invention effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
The device for reading memory data of an FPGA board mentioned above is described from the perspective of the functional modules of the FPGA board. Further, the present application also provides a device for reading memory data of an FPGA board, described from the perspective of the host-side hardware. FIG. 5 is a structural diagram of another device for reading memory data of an FPGA board provided by an embodiment of the present application. As shown in FIG. 5, the device includes a memory 50 for storing a computer program;
and a processor 51, configured to implement the steps of the method for reading memory data of an FPGA board mentioned in any of the above embodiments when executing the computer program.
The processor 51 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 51 may be implemented in at least one hardware form among a DSP (Digital Signal Processing) and a PLA (Programmable Logic Array). The processor 51 may also include a main processor and a coprocessor: the main processor is the processor that handles data in the awake state, also called the CPU (Central Processing Unit), while the coprocessor is a low-power processor that handles data in the standby state. In some embodiments, the processor 51 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 51 may further include an AI (Artificial Intelligence) processor, which handles computing operations related to machine learning.
The memory 50 may include one or more computer-readable storage media, which may be non-transitory. The memory 50 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In this embodiment, the memory 50 is at least used to store the following computer program 501, which, after being loaded and executed by the processor 51, can implement the relevant steps of the method for reading memory data of an FPGA board disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 50 may also include an operating system 502, data 503, and the like, and the storage may be transient or persistent. The operating system 502 may include Windows, Unix, Linux, and the like. The data 503 may include, but is not limited to, data corresponding to the test results, and the like.
In some embodiments, the device for reading memory data of an FPGA board may further include a display screen 52, an input/output interface 53, a communication interface 54, a power supply 55, and a communication bus 56.
Those skilled in the art will understand that the structure shown in FIG. 5 does not constitute a limitation on the device for reading memory data of an FPGA board; the device may include more or fewer components than shown, for example, it may also include a sensor 57.
The functions of the functional modules of the device for reading memory data of an FPGA board described in this embodiment of the present invention can be specifically implemented according to the methods in the above method embodiments; for the specific implementation process, reference may be made to the relevant description of the above method embodiments, which is not repeated here.
It can be seen from the above that the embodiments of the present invention effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
It can be understood that, if the method for reading memory data of an FPGA board in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and executes all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disk.
On this basis, an embodiment of the present invention also provides a computer-readable storage medium storing a program for reading memory data of an FPGA board; when the program is executed by a processor, the steps of the method for reading memory data of an FPGA board described in any of the above embodiments are implemented.
The functions of the functional modules of the computer-readable storage medium described in this embodiment of the present invention can be specifically implemented according to the methods in the above method embodiments; for the specific implementation process, reference may be made to the relevant description of the above method embodiments, which is not repeated here.
It can be seen from the above that the embodiments of the present invention effectively improve the data reading efficiency and resource utilization of the FPGA board, thereby improving overall operating efficiency and reducing system data processing delay.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. For the device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively brief, and the relevant parts may be found in the description of the method.
Those skilled in the art may further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The method and device for reading memory data of an FPGA board and the computer-readable storage medium provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present application without departing from the principles of the present invention, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (10)

  1. A method for reading memory data of an FPGA board, characterized in that, based on an FPGA board, it comprises:
    when a hardware information acquisition request is received from the host, sending the number of controllers and the total number of DDR memories to the host;
    when a data space application request is received from the host, slicing the data to be calculated based on the data space application request; the data space application request carries the dedicated application space capacity of each DDR and the data to be calculated, and the total number of slices of the data to be calculated is not greater than the total number of DDR memories;
    transferring each slice of data to the corresponding DDR space, and reading and computing the data in parallel according to the storage location of the slice data in each DDR.
  2. The method for reading memory data of an FPGA board according to claim 1, characterized in that slicing the data to be calculated based on the data space application request comprises:
    reading the dedicated application space capacity of each DDR from the data space application request;
    judging whether the dedicated application space capacities of the DDRs are all the same;
    if so, dividing the data to be calculated equally into n parts, where n is the total number of DDR memories;
    if not, then for the dedicated application space capacity of each DDR, cutting the data to be calculated into data slices whose size equals the dedicated application space capacity of the current DDR, and setting identification information for each data slice to indicate that the data in the data slice is stored in the memory space of the current DDR.
  3. The method for reading memory data of an FPGA board according to claim 2, characterized in that transferring each slice of data to the corresponding DDR space is:
    transferring each slice of data to the corresponding DDR space through direct memory access.
  4. The method for reading memory data of an FPGA board according to claim 3, characterized in that transferring each slice of data to the corresponding DDR space, and reading and computing the data in parallel according to the storage location of the slice data in each DDR, comprises:
    transferring each slice of data to the corresponding DDR space, so that each DDR space passes the address of the structure holding the source data to the Kernel;
    calling the Kernel to read the corresponding data in parallel, according to the data storage address on each DDR, and perform the computation.
  5. A device for reading memory data of an FPGA board, characterized in that, based on an FPGA board, it comprises:
    a data feedback module, configured to send the number of controllers and the total number of DDR memories to the host when a hardware information acquisition request is received from the host;
    a data slicing module, configured to slice the data to be calculated based on the data space application request when a data space application request is received from the host; the data space application request carries the dedicated application space capacity of each DDR and the data to be calculated, and the total number of slices of the data to be calculated is not greater than the total number of DDR memories;
    a data storage module, configured to transfer each slice of data to the corresponding DDR space;
    a data reading module, configured to read and compute the data in parallel according to the storage location of the slice data in each DDR.
  6. The device for reading memory data of an FPGA board according to claim 5, characterized in that the data slicing module comprises:
    an information reading sub-module, configured to read the dedicated application space capacity of each DDR from the data space application request;
    a judgment sub-module, configured to judge whether the dedicated application space capacities of the DDRs are all the same;
    an equal-division slicing sub-module, configured to divide the data to be calculated equally into n parts if the dedicated application space capacities of the DDRs are all the same, where n is the total number of DDR memories;
    a matching slicing sub-module, configured to, if the dedicated application space capacities of the DDRs differ, cut the data to be calculated, for the dedicated application space capacity of each DDR, into data slices whose size equals the dedicated application space capacity of the current DDR, and set identification information for each data slice to indicate that the data in the data slice is stored in the memory space of the current DDR.
  7. The device for reading memory data of an FPGA board according to claim 6, characterized in that the data reading module comprises:
    an address feedback sub-module, configured to transfer each slice of data to the corresponding DDR space, so that each DDR space passes the address of the structure holding the source data to the Kernel;
    a parallel data reading sub-module, configured to call the Kernel to read the corresponding data in parallel, according to the data storage address on each DDR, and perform the computation.
  8. A method for reading memory data of an FPGA board, characterized in that, based on a host, it comprises:
    obtaining the number of controllers and the total number of DDR memories of the FPGA board;
    determining the dedicated application space capacity of each DDR based on the total number of DDR memories and the number of controllers;
    calling an OpenCL data request function to send a data space application request to the FPGA board, the data space application request carrying the dedicated application space capacity of each DDR and the data to be calculated, so that the FPGA board slices the data to be calculated and stores the slices in the corresponding DDR spaces.
  9. A device for reading memory data of an FPGA board, characterized by comprising a processor, wherein the processor is configured to implement the steps of the method for reading memory data of an FPGA board according to claim 8 when executing a computer program stored in a memory.
  10. A computer-readable storage medium, characterized in that a program for reading memory data of an FPGA board is stored on the computer-readable storage medium, and when the program for reading memory data of an FPGA board is executed by a processor, the steps of the method for reading memory data of an FPGA board according to claim 8 are implemented.
PCT/CN2021/076885 2020-06-30 2021-02-19 Fpga板卡内存数据的读取方法、装置及介质 WO2022001128A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/012,921 US11687242B1 (en) 2020-06-30 2021-02-19 FPGA board memory data reading method and apparatus, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010616628.XA CN111858038A (zh) 2020-06-30 2020-06-30 Fpga板卡内存数据的读取方法、装置及介质
CN202010616628.X 2020-06-30

Publications (1)

Publication Number Publication Date
WO2022001128A1 true WO2022001128A1 (zh) 2022-01-06

Family

ID=72989289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076885 WO2022001128A1 (zh) 2020-06-30 2021-02-19 Fpga板卡内存数据的读取方法、装置及介质

Country Status (3)

Country Link
US (1) US11687242B1 (zh)
CN (1) CN111858038A (zh)
WO (1) WO2022001128A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116488794A (zh) * 2023-06-16 2023-07-25 杭州海康威视数字技术股份有限公司 基于fpga的高速sm4密码模组实现方法及装置

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858038A (zh) 2020-06-30 2020-10-30 浪潮电子信息产业股份有限公司 Fpga板卡内存数据的读取方法、装置及介质
CN115190175B (zh) * 2022-07-18 2023-07-14 浪潮(北京)电子信息产业有限公司 连接处理方法、系统、电子设备、服务器及可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180059939A1 (en) * 2016-08-26 2018-03-01 Huawei Technologies Co., Ltd. Method, Device, and System for Implementing Hardware Acceleration Processing
CN109445863A (zh) * 2018-11-01 2019-03-08 郑州云海信息技术有限公司 一种基于fpga的数据处理方法、装置、设备及介质
CN110515727A (zh) * 2019-08-16 2019-11-29 苏州浪潮智能科技有限公司 一种fpga的内存空间操作方法及相关装置
CN110908797A (zh) * 2019-11-07 2020-03-24 浪潮电子信息产业股份有限公司 调用请求数据处理方法、装置、设备、存储介质及系统
CN111143272A (zh) * 2019-12-28 2020-05-12 浪潮(北京)电子信息产业有限公司 异构计算平台的数据处理方法、装置及可读存储介质
CN111858038A (zh) * 2020-06-30 2020-10-30 浪潮电子信息产业股份有限公司 Fpga板卡内存数据的读取方法、装置及介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8949502B2 (en) * 2010-11-18 2015-02-03 Nimble Storage, Inc. PCIe NVRAM card based on NVDIMM
US9880971B2 (en) * 2013-12-20 2018-01-30 Rambus Inc. Memory appliance for accessing memory
US9959208B2 (en) * 2015-06-02 2018-05-01 Goodrich Corporation Parallel caching architecture and methods for block-based data processing
CN107305506A (zh) * 2016-04-20 2017-10-31 阿里巴巴集团控股有限公司 动态分配内存的方法、装置及系统
CN107122490B (zh) * 2017-05-18 2021-01-29 苏州浪潮智能科技有限公司 一种分组查询中聚合函数的数据处理方法及系统
US11023249B1 (en) * 2018-09-26 2021-06-01 United States Of America As Represented By The Administrator Of Nasa First stage bootloader (FSBL)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180059939A1 (en) * 2016-08-26 2018-03-01 Huawei Technologies Co., Ltd. Method, Device, and System for Implementing Hardware Acceleration Processing
CN109445863A (zh) * 2018-11-01 2019-03-08 郑州云海信息技术有限公司 一种基于fpga的数据处理方法、装置、设备及介质
CN110515727A (zh) * 2019-08-16 2019-11-29 苏州浪潮智能科技有限公司 一种fpga的内存空间操作方法及相关装置
CN110908797A (zh) * 2019-11-07 2020-03-24 浪潮电子信息产业股份有限公司 调用请求数据处理方法、装置、设备、存储介质及系统
CN111143272A (zh) * 2019-12-28 2020-05-12 浪潮(北京)电子信息产业有限公司 异构计算平台的数据处理方法、装置及可读存储介质
CN111858038A (zh) * 2020-06-30 2020-10-30 浪潮电子信息产业股份有限公司 Fpga板卡内存数据的读取方法、装置及介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116488794A (zh) * 2023-06-16 2023-07-25 杭州海康威视数字技术股份有限公司 基于fpga的高速sm4密码模组实现方法及装置
CN116488794B (zh) * 2023-06-16 2023-09-19 杭州海康威视数字技术股份有限公司 基于fpga的高速sm4密码模组实现方法及装置

Also Published As

Publication number Publication date
US20230195310A1 (en) 2023-06-22
CN111858038A (zh) 2020-10-30
US11687242B1 (en) 2023-06-27

Similar Documents

Publication Publication Date Title
WO2022001128A1 (zh) Fpga板卡内存数据的读取方法、装置及介质
US11222256B2 (en) Neural network processing system having multiple processors and a neural network accelerator
US20190188024A1 (en) Virtual machine hot migration method and apparatus, and system
KR20210011451A (ko) 하드웨어 가속을 위한 하드웨어 리소스들의 임베디드 스케줄링
US11243795B2 (en) CPU overcommit with guest idle polling
WO2012000820A1 (en) Accelerator for virtual machine migration
CN115988217B (zh) 一种虚拟化视频编解码系统、电子设备和存储介质
US10761822B1 (en) Synchronization of computation engines with non-blocking instructions
CN103744716A (zh) 一种基于当前vcpu调度状态的动态中断均衡映射方法
US20240143392A1 (en) Task scheduling method, chip, and electronic device
WO2021108122A1 (en) Hierarchical partitioning of operators
US9699093B2 (en) Migration of virtual machine based on proximity to peripheral device in NUMA environment
US8291436B2 (en) Synchronization of event handlers
US10705993B2 (en) Programming and controlling compute units in an integrated circuit
US11237994B2 (en) Interrupt controller for controlling interrupts based on priorities of interrupts
US7689991B2 (en) Bus management techniques
US10769092B2 (en) Apparatus and method for reducing latency of input/output transactions in an information handling system using no-response commands
CN113568734A (zh) 基于多核处理器的虚拟化方法、系统、多核处理器和电子设备
CN114328350A (zh) 一种基于axi总线的通讯方法、装置以及介质
US11061654B1 (en) Synchronization of concurrent computation engines
KR20230091861A (ko) 하드웨어 가속을 위한 높은 처리량 회로 아키텍처
US11907144B1 (en) Early semaphore update
US11625453B1 (en) Using shared data bus to support systolic array tiling
US11182314B1 (en) Low latency neural network model loading
US11347544B1 (en) Scheduling work items based on declarative constraints

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21834183

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21834183

Country of ref document: EP

Kind code of ref document: A1