WO2021017378A1 - FPGA-based convolution parameter acceleration device and data read-write method - Google Patents

FPGA-based convolution parameter acceleration device and data read-write method

Info

Publication number
WO2021017378A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolution parameters
read
last
write
convolution
Prior art date
Application number
PCT/CN2019/126433
Other languages
French (fr)
Chinese (zh)
Inventor
马向华
马成森
边立剑
Original Assignee
上海安路信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海安路信息科技有限公司
Publication of WO2021017378A1 publication Critical patent/WO2021017378A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • The present invention relates to the technical field of integrated circuits, and in particular to an FPGA-based convolution parameter acceleration device and a data read-write method.
  • CNN: Convolutional Neural Network.
  • At present, CNN neural networks are implemented mainly on computer platforms.
  • The CNN architecture is deployed on a development-side computer, weight training is performed on massive data, and suitable weight coefficients are finally generated; after the weight coefficients are fixed, an identical architecture can be deployed on the product side.
  • The product-side architecture omits the training part and uses the generated weight coefficients to make the CNN neural network work directly. Where portability and practicality must be considered on the product side, high-end servers and workstations are generally not used to host CNN neural networks; under the requirements of reduced cost and size, embedded development becomes the first choice.
  • FPGA-based CNN neural network convolution accelerators have become a popular research direction, for example "FPGA Parallel Architecture Design of the Convolutional Neural Network (CNN) Algorithm" (Wang Wei et al., Microelectronics and Computers, April 2019) and "Design and Application Research of a Convolutional Neural Network Accelerator Based on the ZYNQ Platform" (Deng Shuai, Beijing University of Technology, May 2018). The latter only describes the theoretical process without giving an actual design model or performance analysis; the former proposes a more concrete neural network convolution accelerator model.
  • According to the paper's own analysis, that model improves performance considerably, but a product-level implementation still suffers from insufficient on-chip data throughput and is difficult to apply in practice.
  • For example, the yoloV2 image classification algorithm requires 17.4G operations per image frame; with that design, even under seamless data flow, only a processing speed of 1.15 frames/s can be achieved.
  • For a CNN neural network accelerator, refer to Figure 1.
  • Existing CNN neural network accelerator technology is not yet mature.
  • The main problems are relatively high cost and low data throughput, which lead to excessive computation latency and make real-time, low-cost applications unattainable.
  • A CNN neural network is a complex system.
  • Existing published designs are basically monolithic, without configurable modular design, which makes design changes, upgrades, and porting inefficient and reduces design reusability.
  • The purpose of the present invention is to provide an FPGA-based convolution parameter acceleration device and a data read-write method that solve the technical problems of slow data processing and insufficient data throughput in the prior art.
  • An FPGA-based method for reading and writing convolution parameter data includes:
  • determining whether a parameter is the last of a set of input convolution parameters; if it is not the last, the write control counter is incremented and an address is allocated in a first random access memory for each parameter of the set;
  • determining whether a parameter is the last of a set of output convolution parameters; if it is not the last, the first random access memory outputs one of the set of convolution parameters according to its address and the first read control counter is incremented; then determining whether the predetermined number of outputs of the set has been completed; if completed, the first read control counter and the second read control counter are cleared.
  • If it is the last of the set of input convolution parameters, the write control counter is cleared.
  • If it is the last of the set of output convolution parameters and the predetermined number of outputs of the set has not been completed, the first read control counter is cleared and the second read control counter is incremented by 1.
  • While a set of convolution parameters is written into the first random access memory, the second random access memory outputs another set of convolution parameters; or, while the first random access memory outputs a set of convolution parameters, another set is written into the second random access memory.
  • After the input of a set of convolution parameters is completed, the method further includes: determining whether it is the last of another input set of convolution parameters; if it is not the last, the write control counter is incremented and an address is allocated in the second random access memory for each parameter of that other set.
  • After the output of a set of convolution parameters is completed, the method further includes: determining whether it is the last of another output set of convolution parameters; if it is not the last, the second random access memory outputs one of that other set and the first read control counter is incremented.
  • This application also discloses an FPGA-based convolution parameter acceleration device, including:
  • at least one random access memory configured to store convolution parameters;
  • a write address control unit configured to determine whether a parameter is the last of a set of input convolution parameters; if it is not the last, the write control counter is incremented and an address is allocated in the first random access memory for each parameter of the set;
  • a read address control unit that determines whether a parameter is the last of a set of output convolution parameters; if it is not the last, the first random access memory outputs one of the set according to its address and the first read control counter is incremented; the unit then determines whether the predetermined number of outputs of the set has been completed, and if so, the first read control counter and the second read control counter are cleared.
  • The device may include first and second random access memories: while a set of convolution parameters is written into the first, the second outputs another set; or, while the first outputs a set, another set is written into the second.
  • The write address control unit is also configured to determine whether a parameter is the last of another input set of convolution parameters; if it is not the last, an address is allocated in the second random access memory for each parameter of that other set and the write control counter is incremented.
  • The read address control unit is also configured to determine whether a parameter is the last of another output set of convolution parameters; if it is not the last, the second random access memory outputs one of that other set and the first read control counter is incremented.
  • The FPGA-based convolution parameter acceleration device of the present invention uses minimal logic resources to form minimal convolution parameter management.
  • The device's interface is simple and easy to use, it occupies few resources, is easy to port, and has short input and output paths. Because two random access memories are used internally, data can be read and written at the same time and output continuously, keeping the device at its peak state for long periods, which greatly improves parallelism and achieves high data throughput.
  • Figure 1 shows a process diagram of the convolution technology in the CNN neural network model in the prior art
  • Figure 2 shows a schematic diagram of an acceleration device in an embodiment of the present invention
  • Figure 3 shows a schematic diagram of an acceleration device in another embodiment of the present invention.
  • Figure 4 shows a process diagram of data writing in an embodiment of the present invention
  • Figure 5 shows a process diagram of data output in an embodiment of the present invention.
  • CNN: Convolutional Neural Network
  • Convolution parameters: convolution kernel parameters in a CNN
  • FPGA: Field Programmable Gate Array
  • RAM: Random Access Memory
  • The acceleration device 100 includes:
  • at least one random access memory, shown in FIG. 2 as a first random access memory 101, configured to store convolution parameters;
  • a write address control unit 201 configured to determine whether a parameter is the last of a set of input convolution parameters; if it is not the last, the write control counter (not shown in the figure) in the write address control unit 201 is incremented by 1, and an address is allocated in the first random access memory 101 for each parameter of the set;
  • a read address control unit 202 configured to determine whether a parameter is the last of a set of output convolution parameters; if it is not the last, the first random access memory 101 outputs one of the set according to its address and the first read control counter in the read address control unit 202 is incremented by 1; the unit then determines whether the predetermined number of outputs of the set has been completed, and if so, the first and second read control counters (not shown in the figure) in the read address control unit 202 are cleared.
  • In this way, minimal logic resources form a minimized convolution parameter management acceleration unit whose interface is simple and easy to use, with low resource occupation, easy porting, and short input and output paths.
  • In another embodiment, the acceleration device of the present application includes a first random access memory 101 and a second random access memory 102; while a set of convolution parameters is written into the first random access memory 101, the second random access memory 102 outputs another set of convolution parameters; or, while the first random access memory 101 outputs a set of convolution parameters, another set is written into the second random access memory 102.
  • The write address control unit 201 is also configured to determine whether a parameter is the last of another input set of convolution parameters; if it is not the last, an address is allocated in the second random access memory 102 for each parameter of that other set, and the write control counter is incremented by 1.
  • The read address control unit 202 is also configured to determine whether a parameter is the last of another output set of convolution parameters; if it is not the last, the second random access memory 102 outputs one of that other set, and the first read control counter in the read address control unit 202 is incremented.
  • Because the acceleration device uses two RAMs internally, data can be read and written at the same time: one RAM writes data while the other reads data, so parallel data processing is realized, output is continuous, and the peak state is maintained for long periods.
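The two-RAM scheme above is a classic ping-pong (double-buffer) arrangement. As an illustration only, a minimal software model of the idea might look like the following sketch; all names (`PingPongParams`, `swap`, etc.) are hypothetical, since the patent describes hardware RAMs and control units, not code:

```python
# Illustrative ping-pong (double-buffer) model of the two-RAM scheme:
# one RAM is filled with the next parameter set while the other is
# being read, so output never has to stall between kernel sets.

class PingPongParams:
    def __init__(self):
        self.banks = [[], []]   # the first and second random access memories
        self.read_bank = 0      # index of the bank currently being read

    def swap(self):
        # The roles of the two RAMs exchange each time a set is consumed.
        self.read_bank ^= 1

    def write_next_set(self, params):
        # Fill the bank that is NOT being read, concurrently with output.
        self.banks[self.read_bank ^ 1] = list(params)

    def read_current_set(self):
        return list(self.banks[self.read_bank])

pp = PingPongParams()
pp.write_next_set([1, 2, 3])    # load kernel set A into the idle bank
pp.swap()
pp.write_next_set([4, 5, 6])    # load set B while set A is readable
assert pp.read_current_set() == [1, 2, 3]
pp.swap()
assert pp.read_current_set() == [4, 5, 6]
```

In hardware the "write" and "read" calls would proceed in the same clock cycles on separate RAM ports; the model only shows how the bank roles alternate.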
  • Another embodiment of the present application also discloses an FPGA-based data read-write method, including:
  • The FPGA-based data writing method includes:
  • Step 11: determine whether there is data input; if there is no data input, go to step 15, where the write control counter of the write control unit is cleared;
  • Step 12: determine whether it is the last of the input set of convolution parameters; if it is not the last, go to step 13, where the write control counter is incremented by 1, and then to step 14, where an address is allocated in the first random access memory for each parameter of the set;
  • If it is the last of the set of convolution parameters, go to step 15, where the write control counter of the write control unit is cleared.
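The write flow (steps 11 through 15) can be sketched as a small software model. This is an illustration only; the class and method names are hypothetical, and the patent specifies a hardware counter and RAM rather than Python objects:

```python
# Illustrative model of the write address control flow (steps 11-15):
# each incoming convolution parameter is stored at the address given by
# the write control counter, which increments until the last parameter
# of the set arrives and is then cleared.

class WriteAddressControl:
    def __init__(self, ram_size):
        self.ram = [None] * ram_size   # first random access memory
        self.write_counter = 0         # write control counter

    def on_input(self, value, is_last):
        self.ram[self.write_counter] = value   # step 14: store at allocated address
        if is_last:
            self.write_counter = 0             # step 15: clear the counter
        else:
            self.write_counter += 1            # step 13: increment the counter

ctrl = WriteAddressControl(ram_size=16)
params = [3, 1, 4, 1, 5, 9, 2, 6, 5]           # one set of kernel weights
for i, p in enumerate(params):
    ctrl.on_input(p, is_last=(i == len(params) - 1))
assert ctrl.ram[:len(params)] == params        # every parameter got an address
assert ctrl.write_counter == 0                 # counter cleared after the last one
```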
  • The FPGA-based data reading method includes:
  • Step 21: determine whether there is data output; if there is no data output, go to step 26, where the first read control counter and the second read control counter are cleared.
  • Step 22: determine whether it is the last of the set of convolution parameters; if it is the last, go to step 27, where the first read control counter is cleared and the second read control counter is incremented by 1.
  • Step 23: if it is not the last, the first random access memory 101 outputs one of the set of convolution parameters according to its address; at the same time, in step 24, the first read control counter is incremented by 1; then step 21 is repeated.
  • In step 27, the first read control counter is cleared and the second read control counter is incremented by 1, indicating that one complete pass of the set of convolution parameters has been output.
  • Step 25: determine whether the predetermined number of outputs of the set of convolution parameters has been completed; if not, return to step 21 to determine whether to output data.
  • If completed, step 26 is entered, and the first read control counter and the second read control counter are cleared.
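The read flow (steps 21 through 27) can likewise be sketched in software. The first read control counter addresses parameters within the set, while the second counts completed passes, since a kernel set is output a predetermined number of times. Function and variable names here are hypothetical illustrations, not the patent's hardware:

```python
# Illustrative model of the read address control flow (steps 21-27):
# the set is output `repeats` times; the first counter walks the
# addresses within the set, the second counts completed passes.

def read_parameter_set(ram, set_len, repeats):
    first_cnt = 0    # first read control counter (address within the set)
    second_cnt = 0   # second read control counter (completed passes)
    out = []
    while second_cnt < repeats:          # step 25: predetermined count reached?
        out.append(ram[first_cnt])       # step 23: output by address
        if first_cnt == set_len - 1:     # step 22: last parameter of the set?
            first_cnt = 0                # step 27: clear the first counter...
            second_cnt += 1              # ...and increment the second
        else:
            first_cnt += 1               # step 24: increment the first counter
    return out                           # step 26: both counters start over cleared

ram = [3, 1, 4, 1, 5]
assert read_parameter_set(ram, set_len=5, repeats=3) == ram * 3
```

Repeating the set this way matches the device's purpose: one kernel's weights are reused across many output positions, so the RAM streams the same set continuously without re-loading it.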
  • While a set of convolution parameters is written into the first random access memory 101, the second random access memory 102 outputs another set of convolution parameters; or, while the first random access memory 101 outputs a set of convolution parameters, another set of convolution parameters is written into the second random access memory 102.
  • After a set of convolution parameters has been stored in the first random access memory 101, the method further includes: determining whether it is the last of another input set of convolution parameters; if it is not the last, the write control counter is incremented and an address is allocated in the second random access memory for each parameter of that other set, so that the other set of convolution parameters is written into the second random access memory while the first random access memory can output data at the same time.
  • After the output of a set of convolution parameters from the first random access memory 101 is completed, the method further includes: determining whether it is the last of another output set of convolution parameters; if it is not the last, the second random access memory outputs one of that other set and the first read control counter is incremented, so that the second random access memory outputs data while the first random access memory can write data at the same time.
  • Because the acceleration device uses two RAMs, data can be read and written at the same time: one RAM writes data while the other reads data, so parallel data processing is realized, output is continuous, and the peak state is maintained for long periods.
  • The first embodiment is a method embodiment corresponding to this embodiment.
  • The technical details in the first embodiment can be applied to this embodiment, and the technical details in this embodiment can also be applied to the first embodiment.
  • Each module shown in the embodiment of the acceleration device can be understood with reference to the relevant description of the data read-write method.
  • The functions of the modules shown in the embodiments of the acceleration device can be realized by a program (executable instructions) running on a processor, or by dedicated logic circuits. If the acceleration device of the embodiments of the present application is implemented in the form of software function modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing over the prior art, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), magnetic disks, and optical discs. Accordingly, the embodiments of the present application are not limited to any specific combination of hardware and software.
  • FPGA-readable storage media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of storage media for FPGA configuration files include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, FPGA-readable storage media do not include transitory media such as modulated data signals and carrier waves.
  • PRAM: phase-change memory
  • SRAM: static random access memory
  • DRAM: dynamic random access memory
  • RAM: random access memory
  • ROM: read-only memory
  • EEPROM: electrically erasable programmable read-only memory
  • Flash memory or other memory technologies
  • CD-ROM: compact disc read-only memory
  • DVD: digital versatile disc
  • Magnetic cassettes, magnetic tape, or other magnetic storage devices
  • When an act is performed "based on" a certain element, it means the act is performed at least based on that element; this includes two situations: performing the act based only on that element, and performing the act based on that element together with other elements.
  • Expressions such as "multiple" and "a plurality of" include two and more than two.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

An FPGA-based convolution parameter acceleration device (100) and a data read-write method. The method comprises: determining whether a parameter is the last of a set of input convolution parameters and, if not, incrementing a write control counter and allocating an address in a first random access memory (101) for each of the set of convolution parameters; determining whether a parameter is the last of a set of output convolution parameters and, if not, having the first random access memory (101) output one of the set of convolution parameters according to its address and incrementing a first read control counter; and determining whether the predetermined number of outputs of the set of convolution parameters has been completed and, if so, clearing the first read control counter and a second read control counter.

Description

FPGA-based convolution parameter acceleration device and data read-write method
Technical field
The present invention relates to the technical field of integrated circuits, and in particular to an FPGA-based convolution parameter acceleration device and a data read-write method.
Background
In the field of artificial intelligence, the convolutional neural network (CNN) is a relatively mature technical solution. It has representation-learning capability and can perform shift-invariant classification of input information according to its hierarchical structure. With the advancement of deep learning theory and improvements in numerical computing hardware, CNN neural networks have developed rapidly and are widely applied in computer vision, natural language processing, and other fields, mainly for target classification; they hold an overwhelming advantage in applications such as image recognition and speech recognition.
At present, CNN neural networks are implemented mainly on computer platforms: the CNN architecture is deployed on a development-side computer, weight training is performed on massive data, and suitable weight coefficients are finally generated. After the weight coefficients are fixed, an identical CNN architecture, with the training part removed, can be deployed on the product side, and the generated weight coefficients allow the CNN neural network to work directly. Where portability and practicality must be considered on the product side, high-end servers and workstations are generally not used to host CNN neural networks; under the requirements of reduced cost and size, embedded development becomes the first choice.
Research on embedded CNN development has been carried out in recent years. Digital signal processing (DSP) and ARM (Advanced RISC Machines) approaches are generally ruled out because their computation time is too long, so FPGA design of CNN neural network convolution accelerators has become a popular research direction, for example "FPGA Parallel Architecture Design of the Convolutional Neural Network (CNN) Algorithm" (Wang Wei et al., Microelectronics and Computers, April 2019) and "Design and Application Research of a Convolutional Neural Network Accelerator Based on the ZYNQ Platform" (Deng Shuai, Beijing University of Technology, May 2018). The latter only describes the theoretical process without giving an actual design model or performance analysis. The former proposes a more concrete neural network convolution accelerator model; according to the paper's own analysis, the model improves performance considerably, but for a product-level implementation it still suffers from insufficient on-chip data throughput and is difficult to apply in practice. For example, the yoloV2 image classification algorithm requires 17.4G operations per image frame; with that design, even under the condition of seamless data flow, only a processing speed of 1.15 frames/s can be achieved.
A CNN neural network accelerator is shown in Figure 1. Existing CNN neural network accelerator technology is not yet mature: the main problems are relatively high cost and low data throughput, which lead to excessive computation latency, so real-time, low-cost applications cannot be satisfied. In addition, since a CNN neural network is a complex system, existing published design approaches are basically monolithic, without configurable modular design, which makes design changes, upgrades, and porting inefficient and reduces design reusability.
Summary of the invention
The purpose of the present invention is to provide an FPGA-based convolution parameter acceleration device and a data read-write method that solve the technical problems of slow data processing and insufficient data throughput in the prior art.
To solve the above problems, this application discloses an FPGA-based method for reading and writing convolution parameter data, including:
determining whether a parameter is the last of a set of input convolution parameters; if it is not the last, the write control counter is incremented and an address is allocated in a first random access memory for each parameter of the set;
determining whether a parameter is the last of a set of output convolution parameters; if it is not the last, the first random access memory outputs one of the set of convolution parameters according to its address and a first read control counter is incremented; and determining whether the predetermined number of outputs of the set of convolution parameters has been completed; if completed, the first read control counter and a second read control counter are cleared.
In a preferred example, if it is the last of the set of input convolution parameters, the write control counter is cleared.
In a preferred example, if it is the last of the set of output convolution parameters and the predetermined number of outputs of the set has not been completed, the first read control counter is cleared and the second read control counter is incremented by 1.
In a preferred example, while a set of convolution parameters is written into the first random access memory, a second random access memory outputs another set of convolution parameters; or, while the first random access memory outputs a set of convolution parameters, another set of convolution parameters is written into the second random access memory.
In a preferred example, after the input of a set of convolution parameters is completed, the method further includes: determining whether it is the last of another input set of convolution parameters; if it is not the last, the write control counter is incremented and an address is allocated in the second random access memory for each parameter of that other set.
In a preferred example, after the output of a set of convolution parameters is completed, the method further includes: determining whether it is the last of another output set of convolution parameters; if it is not the last, the second random access memory outputs one of that other set of convolution parameters and the first read control counter is incremented.
本申请还公开了一种基于FPGA的卷积参数加速装置包括:The application also discloses an FPGA-based convolution parameter acceleration device including:
至少一个随机读写存储器,配置为存储卷积参数;At least one random read-write memory, configured to store convolution parameters;
写地址控制单元,配置为判断是否为输入的一组卷积参数的最后一个,若不是最后一个,则写控制计数器自增,在第一随机读写存储器中为该组卷积参数中的每个分配地址;The write address control unit is configured to determine whether it is the last one of the input set of convolution parameters, if it is not the last one, the write control counter is incremented, and in the first random read/write memory, it is each set of convolution parameters. Allocation address
读地址控制单元,判断是否为输出的一组卷积参数的最后一个,若不是最后一个,所述第一随机读写存储器根据地址输出该组卷积参数中的一个,第一读控制计数器自增;判断是否完成该组卷积参数的预定次数的输出,若完成,则所述第一读控制计数器、第二读控制计数器清零。The read address control unit judges whether it is the last one of a set of output convolution parameters. If it is not the last one, the first random read/write memory outputs one of the set of convolution parameters according to the address, and the first read control counter automatically Increment; It is judged whether the output of the predetermined number of times of the set of convolution parameters is completed, and if completed, the first read control counter and the second read control counter are cleared.
In a preferred example, the device includes first and second random access memories; while a group of convolution parameters is written into the first random access memory, the second random access memory outputs another group of convolution parameters; or, while the first random access memory outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory.
In a preferred example, the device includes first and second random access memories; the write address control unit is further configured to determine whether an input convolution parameter is the last one of another group of convolution parameters; if it is not the last one, an address is allocated in the second random access memory for each parameter of that group and the write control counter is incremented.
In a preferred example, the device includes first and second random access memories; the read address control unit is further configured to determine whether an output convolution parameter is the last one of another group of convolution parameters; if it is not the last one, the second random access memory outputs one parameter of that group and the first read control counter is incremented.
Compared with the prior art, the present application has the following beneficial effects:
The FPGA-based convolution parameter acceleration device of the present invention uses minimal logic resources to form a minimized convolution parameter management unit. The device has a simple, easy-to-use interface, occupies few resources, is easy to port, and has short input and output paths. Because two random access memories are used internally, data can be read and written simultaneously and output continuously, keeping the device at peak throughput for long periods, which greatly improves parallelism and achieves a high data throughput rate.
Description of the Drawings
Fig. 1 shows a process diagram of the convolution technique in a prior-art CNN neural network model;
Fig. 2 shows a schematic diagram of an acceleration device in an embodiment of the present invention;
Fig. 3 shows a schematic diagram of an acceleration device in another embodiment of the present invention;
Fig. 4 shows a process diagram of data writing in an embodiment of the present invention;
Fig. 5 shows a process diagram of data output in an embodiment of the present invention.
Detailed Description
In the following description, numerous technical details are set forth so that the reader may better understand the present application. However, those of ordinary skill in the art will understand that the technical solutions claimed in the present application can be implemented even without these technical details and with various changes and modifications based on the following embodiments.
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Explanation of some terms:
CNN: Convolutional Neural Network
Convolution parameters: the convolution kernel parameters in a CNN
FPGA: Field-Programmable Gate Array
RAM: Random Access Memory
Referring to Fig. 2, the present application further discloses an FPGA-based convolution parameter acceleration device. The acceleration device 100 includes:
at least one random access memory (RAM), configured to store convolution parameters; Fig. 2 shows a device including one first random access memory 101;
a write address control unit 201, configured to determine whether an input convolution parameter is the last one of a group of convolution parameters; if it is not the last one, a write control counter (not shown) in the write address control unit 201 is incremented by 1 and an address is allocated in the first random access memory 101 for each parameter of the group;
a read address control unit 202, configured to determine whether an output convolution parameter is the last one of a group of convolution parameters; if it is not the last one, the first random access memory 101 outputs one parameter of the group according to its address and a first read control counter in the read address control unit 202 is incremented by 1; and to determine whether the group of convolution parameters has been output a predetermined number of times; if so, a second read control counter (not shown) in the read address control unit 202 is cleared. In this embodiment, minimal logic resources form a minimized acceleration unit for convolution parameter management: its interface is simple and easy to use, it occupies few resources, it is easy to port, and its input and output paths are short.
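As a rough illustration of the counter behavior just described, the following Python sketch models the write address control unit in software. This is only an aid to understanding: the actual unit is FPGA logic, and the class and method names are illustrative, not taken from the patent.

```python
class WriteAddressControl:
    """Software model of the write address control unit: allocates sequential
    RAM addresses for the parameters of one group; the write control counter
    is cleared after the last parameter of the group."""

    def __init__(self):
        self.write_counter = 0  # the "write control counter"

    def on_input(self, is_last: bool) -> int:
        """Called once per input convolution parameter; returns the RAM
        address allocated for that parameter."""
        addr = self.write_counter
        if is_last:
            self.write_counter = 0   # group finished: clear the counter
        else:
            self.write_counter += 1  # more parameters follow: increment
        return addr
```

For a group of three parameters, the unit hands out addresses 0, 1, 2 and then returns to address 0, ready for the next group.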
In a preferred example, referring to Fig. 3, the acceleration device of the present application includes a first random access memory 101 and a second random access memory 102. While a group of convolution parameters is written into the first random access memory 101, the second random access memory 102 outputs another group of convolution parameters; or, while the first random access memory 101 outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory 102.
In a preferred example, the device includes a first random access memory 101 and a second random access memory 102; the write address control unit 201 is further configured to determine whether an input convolution parameter is the last one of another group of convolution parameters; if it is not the last one, an address is allocated in the second random access memory 102 for each parameter of that group and the write control counter is incremented by 1.
In a preferred example, the device includes a first random access memory 101 and a second random access memory 102; the read address control unit 202 is further configured to determine whether an output convolution parameter is the last one of another group of convolution parameters; if it is not the last one, the second random access memory 102 outputs one parameter of that group and the first read control counter in the read address control unit 202 is incremented.
Because the acceleration device uses two RAMs internally, data can be read and written simultaneously: one RAM is used for writing while the other is used for reading. This enables parallel processing and continuous output, keeping the device at its peak throughput for long periods.
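The two-RAM arrangement above is a classic ping-pong (double) buffer. A minimal Python sketch of the role-swapping, assuming a simple "write one whole group, then swap" discipline (the names are illustrative, and a real FPGA design swaps roles with select signals rather than Python lists):

```python
class PingPongBuffer:
    """Two RAMs: at any time one is the write side and the other the read
    side, so a new group can be loaded while the previous one streams out."""

    def __init__(self):
        self.rams = [[], []]  # RAM 101 and RAM 102, modeled as lists
        self.write_sel = 0    # index of the RAM currently used for writing

    def load_group(self, group):
        """Write a new group into the write-side RAM, then swap roles so the
        freshly written RAM becomes the read side."""
        self.rams[self.write_sel] = list(group)
        self.write_sel ^= 1

    def read_group(self):
        """Stream out the group held in the read-side RAM."""
        return list(self.rams[self.write_sel ^ 1])
```

With this discipline, `read_group()` always returns the most recently completed group, while `load_group()` can fill the other RAM concurrently.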
Another embodiment of the present application discloses an FPGA-based data read/write method, including:
Referring to Fig. 4, the FPGA-based data writing method includes:
In step 11, it is determined whether there is data input; if there is no data input, the method proceeds to step 15, where the write control counter of the write control unit is cleared.
If there is data input, the method proceeds to step 12, where it is determined whether the input is the last one of a group of convolution parameters. If it is not the last one, the method proceeds to step 13, where the write control counter is incremented by 1, and then to step 14, where an address is allocated in the first random access memory for each parameter of the group.
If it is the last one of the group of convolution parameters, the method proceeds to step 15, where the write control counter of the write control unit is cleared.
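The write flow of Fig. 4 can be sketched in software as follows. This is a hedged model, not the FPGA implementation: the function name is illustrative, and the RAM is modeled as a Python dict mapping addresses to parameters.

```python
def write_group(ram, params):
    """Software model of steps 11-15: allocate one sequential address per
    parameter of the group; the write control counter is cleared after the
    last parameter (and when there is no input)."""
    write_counter = 0
    for i, p in enumerate(params):           # step 11: data input present
        is_last = (i == len(params) - 1)     # step 12: last of the group?
        ram[write_counter] = p               # step 14: store at allocated address
        if is_last:
            write_counter = 0                # step 15: clear the counter
        else:
            write_counter += 1               # step 13: increment the counter
    return ram
```

Writing a three-parameter group into an empty RAM model fills addresses 0, 1, and 2 and leaves the counter cleared for the next group.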
Referring to Fig. 5, the FPGA-based data output method includes:
First, in step 21, it is determined whether there is data output; if there is no data output, the method proceeds to step 26, where the first read control counter and the second read control counter are cleared.
If there is data output, the method proceeds to step 22, where it is determined whether the output is the last one of a group of convolution parameters. If it is the last one of the group, the method proceeds to step 27, where the first read control counter is cleared and the second read control counter is incremented by 1.
If it is not the last one, the method proceeds to step 23, where the first random access memory 101 outputs one parameter of the group according to its address, and at the same time to step 24, where the first read control counter is incremented by 1; step 21 is then performed again.
In a preferred example, if the output is the last one of a group of convolution parameters and the group has not yet been output the predetermined number of times, then, corresponding to step 27, the first read control counter is cleared and the second read control counter is incremented by 1, indicating that the output of one group of convolution parameters for one point has been completed.
Next, the method proceeds to step 25, where it is determined whether the group of convolution parameters has been output the predetermined number of times; if not, the method returns to step 21 to determine whether there is data to output.
If the group of convolution parameters has been output the predetermined number of times, the method proceeds to step 26, where the first read control counter and the second read control counter are cleared.
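The output flow of Fig. 5 can likewise be sketched as a software model. This is an illustrative reading of steps 21-27, assuming the last parameter of the group is also output when its counter rolls over; the function and variable names are not from the patent.

```python
def output_group(ram, group_size, repeats):
    """Software model of steps 21-27: stream the group out `repeats` times.
    The first read control counter indexes within the group; the second
    counts completed passes over the group."""
    out = []
    read1 = 0  # first read control counter: position within the group
    read2 = 0  # second read control counter: completed group outputs
    while read2 < repeats:                 # step 25: predetermined count done?
        out.append(ram[read1])             # step 23: output by address
        if read1 == group_size - 1:        # step 22: last of the group
            read1 = 0                      # step 27: clear first counter...
            read2 += 1                     # ...and increment the second
        else:
            read1 += 1                     # step 24: increment first counter
    read1 = read2 = 0                      # step 26: clear both counters
    return out
```

For a two-parameter group output three times, the model emits the group back-to-back three times and then clears both counters.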
In a preferred example, while a group of convolution parameters is written into the first random access memory 101, the second random access memory 102 outputs another group of convolution parameters; or, while the first random access memory 101 outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory 102.
In a preferred example, after the input of one group of convolution parameters is completed, that group is stored in the first random access memory 101. The method then further includes: determining whether an input convolution parameter is the last one of another group; if it is not the last one, the write control counter is incremented and an address is allocated in the second random access memory for each parameter of that group, so that the other group of convolution parameters is written into the second random access memory while the first random access memory can output data at the same time.
In a preferred example, after the output of one group of convolution parameters from the first random access memory 101 is completed, the method further includes: determining whether an output convolution parameter is the last one of another group; if it is not the last one, the second random access memory outputs one parameter of that group and the first read control counter is incremented, so that the second random access memory is used to output data while the first random access memory can be written at the same time.
Because the acceleration device uses two RAMs internally, data can be read and written simultaneously: one RAM is used for writing while the other is used for reading. This enables parallel processing and continuous output, keeping the device at its peak throughput for long periods.
The first embodiment is the embodiment corresponding to the present embodiment; technical details described in the first embodiment may be applied in this embodiment, and technical details described in this embodiment may also be applied in the first embodiment.
It should be noted that those skilled in the art will understand that the functions of the modules shown in the embodiments of the above acceleration device can be understood with reference to the related description of the data read/write method. The functions of these modules may be realized by a program (executable instructions) running on a processor, or by specific logic circuits. If the acceleration device of the embodiments of the present application is implemented in the form of software function modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, another embodiment of the present application is implemented by a configuration file in an FPGA-readable storage medium. FPGA-readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of storage media for FPGA configuration files include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, FPGA-readable storage media do not include transitory media such as modulated data signals and carrier waves.
It should be noted that, in the application documents of this patent, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a" does not exclude the existence of other identical elements in the process, method, article, or device that includes that element. In the application documents of this patent, if it is mentioned that an action is performed according to a certain element, it means that the action is performed at least according to that element, which includes two cases: performing the action only according to that element, and performing the action according to that element and other elements. Expressions such as "multiple" and "multiple times" include two, twice, two kinds, as well as more than two, more than twice, and more than two kinds.
All documents mentioned in this specification are considered to be included in the disclosure of the present application as a whole, so that they may serve as a basis for amendment when necessary. In addition, it should be understood that the above are only preferred embodiments of this specification and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of this specification shall fall within the scope of protection of one or more embodiments of this specification.
In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

Claims (10)

  1. An FPGA-based method for reading and writing convolution parameter data, comprising:
    determining whether an input convolution parameter is the last one of a group of convolution parameters; if it is not the last one, incrementing a write control counter and allocating an address in a first random access memory for each parameter of the group;
    determining whether an output convolution parameter is the last one of a group of convolution parameters; if it is not the last one, outputting, by the first random access memory, one parameter of the group according to its address and incrementing a first read control counter; and determining whether the group of convolution parameters has been output a predetermined number of times; if so, clearing the first read control counter and a second read control counter.
  2. The method of claim 1, wherein, if the input convolution parameter is the last one of the group, the write control counter is cleared.
  3. The method of claim 1, wherein, if the output convolution parameter is the last one of the group and the group has not yet been output the predetermined number of times, the first read control counter is cleared and the second read control counter is incremented by 1.
  4. The method of claim 1, wherein, while a group of convolution parameters is written into the first random access memory, a second random access memory outputs another group of convolution parameters; or, while the first random access memory outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory.
  5. The method of claim 1, further comprising, after the input of a group of convolution parameters is completed: determining whether an input convolution parameter is the last one of another group; if it is not the last one, incrementing the write control counter and allocating an address in a second random access memory for each parameter of that group.
  6. The method of claim 1, further comprising, after the output of a group of convolution parameters is completed: determining whether an output convolution parameter is the last one of another group; if it is not the last one, outputting, by a second random access memory, one parameter of that group and incrementing the first read control counter.
  7. An FPGA-based convolution parameter acceleration device, comprising:
    at least one random access memory, configured to store convolution parameters;
    a write address control unit, configured to determine whether an input convolution parameter is the last one of a group of convolution parameters; if it is not the last one, a write control counter is incremented and an address is allocated in a first random access memory for each parameter of the group;
    a read address control unit, configured to determine whether an output convolution parameter is the last one of a group of convolution parameters; if it is not the last one, the first random access memory outputs one parameter of the group according to its address and a first read control counter is incremented; and to determine whether the group of convolution parameters has been output a predetermined number of times; if so, the first read control counter and a second read control counter are cleared.
  8. The device of claim 7, comprising first and second random access memories, wherein, while a group of convolution parameters is written into the first random access memory, the second random access memory outputs another group of convolution parameters; or, while the first random access memory outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory.
  9. The device of claim 7, comprising first and second random access memories, wherein the write address control unit is further configured to: determine whether an input convolution parameter is the last one of another group; if it is not the last one, allocate an address in the second random access memory for each parameter of that group, the write control counter being incremented.
  10. The device of claim 7, comprising first and second random access memories, wherein the read address control unit is further configured to: determine whether an output convolution parameter is the last one of another group; if it is not the last one, the second random access memory outputs one parameter of that group and the first read control counter is incremented.
PCT/CN2019/126433 2019-08-01 2019-12-18 Fpga-based convolution parameter acceleration device and data read-write method WO2021017378A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910708612.9 2019-08-01
CN201910708612.9A CN110390392B (en) 2019-08-01 2019-08-01 Convolution parameter accelerating device based on FPGA and data reading and writing method

Publications (1)

Publication Number Publication Date
WO2021017378A1 true WO2021017378A1 (en) 2021-02-04

Family

ID=68288406

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126433 WO2021017378A1 (en) 2019-08-01 2019-12-18 Fpga-based convolution parameter acceleration device and data read-write method

Country Status (2)

Country Link
CN (1) CN110390392B (en)
WO (1) WO2021017378A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390392B (en) * 2019-08-01 2021-02-19 上海安路信息科技有限公司 Convolution parameter accelerating device based on FPGA and data reading and writing method

Citations (4)

Publication number Priority date Publication date Assignee Title
US20180341495A1 (en) * 2017-05-26 2018-11-29 Purdue Research Foundation Hardware Accelerator for Convolutional Neural Networks and Method of Operation Thereof
CN109409509A (en) * 2018-12-24 2019-03-01 济南浪潮高新科技投资发展有限公司 A kind of data structure and accelerated method for the convolutional neural networks accelerator based on FPGA
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA
CN110390392A (en) * 2019-08-01 2019-10-29 上海安路信息科技有限公司 Deconvolution parameter accelerator, data read-write method based on FPGA

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
US5764374A (en) * 1996-02-05 1998-06-09 Hewlett-Packard Company System and method for lossless image compression having improved sequential determination of golomb parameter
EP1089475A1 (en) * 1999-09-28 2001-04-04 TELEFONAKTIEBOLAGET L M ERICSSON (publ) Converter and method for converting an input packet stream containing data with plural transmission rates into an output data symbol stream
CN100466601C (en) * 2005-04-28 2009-03-04 华为技术有限公司 Data read/write device and method
CN101257313B (en) * 2007-04-10 2010-05-26 深圳市同洲电子股份有限公司 Deconvolution interweave machine and method realized based on FPGA
CN104461934B (en) * 2014-11-07 2017-06-30 北京海尔集成电路设计有限公司 A kind of time solution convolutional interleave device and method of suitable DDR memory
CN106940815B (en) * 2017-02-13 2020-07-28 西安交通大学 Programmable convolutional neural network coprocessor IP core
CN108169727B (en) * 2018-01-03 2019-12-27 电子科技大学 Moving target radar scattering cross section measuring method based on FPGA
CN108154229B (en) * 2018-01-10 2022-04-08 西安电子科技大学 Image processing method based on FPGA (field programmable Gate array) accelerated convolutional neural network framework
CN109086867B (en) * 2018-07-02 2021-06-08 武汉魅瞳科技有限公司 Convolutional neural network acceleration system based on FPGA
CN109032781A (en) * 2018-07-13 2018-12-18 重庆邮电大学 A kind of FPGA parallel system of convolutional neural networks algorithm
CN109214281A (en) * 2018-07-30 2019-01-15 苏州神指微电子有限公司 A kind of CNN hardware accelerator for AI chip recognition of face
CN109359729B (en) * 2018-09-13 2022-02-22 深思考人工智能机器人科技(北京)有限公司 System and method for realizing data caching on FPGA
CN109711533B (en) * 2018-12-20 2023-04-28 西安电子科技大学 Convolutional neural network acceleration system based on FPGA

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
US20180341495A1 (en) * 2017-05-26 2018-11-29 Purdue Research Foundation Hardware Accelerator for Convolutional Neural Networks and Method of Operation Thereof
CN109409509A (en) * 2018-12-24 2019-03-01 济南浪潮高新科技投资发展有限公司 A kind of data structure and accelerated method for the convolutional neural networks accelerator based on FPGA
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA
CN110390392A (en) * 2019-08-01 2019-10-29 上海安路信息科技有限公司 Deconvolution parameter accelerator, data read-write method based on FPGA

Also Published As

Publication number Publication date
CN110390392A (en) 2019-10-29
CN110390392B (en) 2021-02-19

Similar Documents

Publication Publication Date Title
US20210103818A1 (en) Neural network computing method, system and device therefor
CN110175140A (en) Fusion memory part and its operating method
US20180004690A1 (en) Efficient context based input/output (i/o) classification
CN104123171B (en) Virtual machine migrating method and system based on NUMA architecture
US10684946B2 (en) Method and device for on-chip repetitive addressing
EP3973401B1 (en) Interleaving memory requests to accelerate memory accesses
US20210295607A1 (en) Data reading/writing method and system in 3d image processing, storage medium and terminal
CN106910528A (en) A kind of optimization method and device of solid state hard disc data routing inspection
US11138104B2 (en) Selection of mass storage device streams for garbage collection based on logical saturation
WO2022199027A1 (en) Random write method, electronic device and storage medium
CN102413186A (en) Resource scheduling method and device based on private cloud computing, and cloud management server
TW202138999A (en) Data dividing method and processor for convolution operation
WO2021017378A1 (en) Fpga-based convolution parameter acceleration device and data read-write method
US11429299B2 (en) System and method for managing conversion of low-locality data into high-locality data
US11436486B2 (en) Neural network internal data fast access memory buffer
CN110569112B (en) Log data writing method and object storage daemon device
CN115759979B (en) Intelligent process processing method and system based on RPA and process mining
CN110618872A (en) Hybrid memory dynamic scheduling method and system
CN103761052A (en) Method for managing cache and storage device
TW201435586A (en) Flash memory apparatus, and method and device for managing data thereof
CN105912404B (en) A method of finding strong continune component in the large-scale graph data based on disk
CN111737190B (en) Dynamic software and hardware cooperation method of embedded system and embedded system
CN102541463B (en) Flash memory device and data access method thereof
US11442643B2 (en) System and method for efficiently converting low-locality data into high-locality data
CN112905239B (en) Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19939614

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19939614

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.10.2022)
