CN110390392B - Convolution parameter accelerating device based on FPGA and data reading and writing method - Google Patents
- Publication number
- CN110390392B CN110390392B CN201910708612.9A CN201910708612A CN110390392B CN 110390392 B CN110390392 B CN 110390392B CN 201910708612 A CN201910708612 A CN 201910708612A CN 110390392 B CN110390392 B CN 110390392B
- Authority
- CN
- China
- Prior art keywords
- convolution
- group
- read
- parameter
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The application discloses an FPGA-based convolution parameter acceleration device and a data read-write method, wherein the method comprises: judging whether a convolution parameter is the last of an input group of convolution parameters; if not, incrementing a write control counter and allocating an address in a first random access memory to each convolution parameter of the group; judging whether a convolution parameter is the last of an output group of convolution parameters; if not, the first random access memory outputs one convolution parameter of the group according to the address, and a first read control counter is incremented; and judging whether the group of convolution parameters has been output a preset number of times, and if so, clearing the first read control counter and a second read control counter.
Description
Technical Field
The invention relates to the technical field of integrated circuits, and in particular to an FPGA (field programmable gate array) based convolution parameter acceleration device and data read-write method.
Background
The convolutional neural network (CNN) is a mature technique in the field of artificial intelligence: it has feature-learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. With the advance of deep learning theory and improvements in numerical computing equipment, CNNs have developed rapidly and are widely applied in computer vision, natural language processing, and other directions, mainly for target classification; they hold overwhelming advantages in applications such as image recognition and speech recognition.
At present, CNNs are implemented mainly on computer platforms: a CNN architecture is deployed on a development-side computer, weight training is performed on massive data, and suitable weight coefficients are finally generated. On the product side, where portability and practicality must be considered, a CNN is generally not hosted on a high-end server or workstation, and embedded development becomes the first choice under the requirements of reduced cost and size.
Research on embedded development of CNNs has been carried out in recent years. Digital signal processors (DSPs) and ARM (Advanced RISC Machines) processors are ruled out by their long computation times, so designing CNN convolution accelerators on FPGAs has become a hot research direction — for example, "FPGA parallel structure design of convolutional neural network (CNN) algorithm" (Wang Wei et al., Microelectronics and Computer, Apr. 2019) and "Convolutional neural network accelerator design based on the ZYNQ platform and its application research" (Deng Shuai, Beijing University of Technology, May 2018). The latter describes only a theoretical process and gives no actual design model or performance analysis. The former proposes a concrete neural network convolution accelerator model whose performance, according to the paper's analysis, improves greatly; yet for product-level realization its on-chip data throughput is insufficient and application is difficult: for the YOLOv2 image classification algorithm, each frame of image requires 17.4 G operations, and by that design a processing speed of only 1.15 frames/s can be achieved even with seamless data transfer.
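The frame-rate figure quoted above can be sanity-checked with simple arithmetic. The sketch below assumes the cited design sustains roughly 20 G operations/s of effective throughput — a value inferred from the quoted numbers, not stated in the source:

```python
# Back-of-the-envelope check of the frame rate quoted above.
# Assumption (not stated in the source): the cited design sustains
# about 20 G operations/s of effective convolution throughput.
ops_per_frame = 17.4e9        # operations needed per YOLOv2 frame
sustained_throughput = 20e9   # assumed effective operations/s

fps = sustained_throughput / ops_per_frame
print(f"{fps:.2f} frames/s")  # ≈ 1.15 frames/s, matching the text
```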
As shown in fig. 1, conventional CNN neural network accelerator technology is not yet mature; its main problems are high cost, low data throughput, and excessive computation latency, so it cannot satisfy real-time, low-cost applications.
Disclosure of Invention
The invention aims to provide an FPGA (field programmable gate array) based convolution parameter acceleration device and data read-write method, solving the technical problems of slow data processing and insufficient data throughput in the prior art.
In order to solve the above problems, the present application discloses a convolution parameter data read-write method based on an FPGA, including:
judging whether a convolution parameter is the last of an input group of convolution parameters; if not, incrementing a write control counter, and allocating an address in a first random access memory to each convolution parameter of the group;
judging whether a convolution parameter is the last of an output group of convolution parameters; if not, the first random access memory outputs one convolution parameter of the group according to the address, and a first read control counter is incremented; and judging whether the group of convolution parameters has been output a preset number of times, and if so, clearing the first read control counter and a second read control counter.
In a preferred embodiment, the write control counter is cleared if the convolution parameter is the last of the input group of convolution parameters.
In a preferred embodiment, if the last convolution parameter of the output group has been output but the group has not yet been output the preset number of times, the first read control counter is cleared and the second read control counter is incremented by 1.
In a preferred embodiment, while a group of convolution parameters is written into the first random access memory, the second random access memory outputs another group of convolution parameters; or, while the first random access memory outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory.
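The alternating (ping-pong) buffering described above can be sketched in software. The following Python model is illustrative only — the class and method names are ours, not from the patent — and shows how one memory accepts a new group while the other serves reads, with roles swapped between groups:

```python
class PingPongBuffer:
    """Illustrative model of the two-RAM scheme: one RAM is written
    while the other is read, then the roles are swapped."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.rams = [[None] * group_size, [None] * group_size]
        self.write_sel = 0  # index of the RAM currently accepting writes

    @property
    def read_sel(self):
        return 1 - self.write_sel  # the other RAM serves reads

    def write_group(self, params):
        # Write a full group of convolution parameters into the write-side RAM.
        assert len(params) == self.group_size
        self.rams[self.write_sel][:] = params

    def read_group(self):
        # Read the group currently held in the read-side RAM.
        return list(self.rams[self.read_sel])

    def swap(self):
        # Once one group is fully written and the other fully consumed,
        # exchange the roles of the two RAMs.
        self.write_sel = 1 - self.write_sel


buf = PingPongBuffer(group_size=4)
buf.write_group([1, 2, 3, 4])   # group A into RAM 0
buf.swap()                      # RAM 0 now serves reads
buf.write_group([5, 6, 7, 8])   # group B into RAM 1 while A is readable
print(buf.read_group())         # [1, 2, 3, 4]
```

Because a write and a read target different RAMs in every cycle, output never stalls waiting for input — the property the patent credits for sustained peak throughput.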
In a preferred embodiment, the method further includes, after the input of the group of convolution parameters is completed: judging whether the current datum is the last of another input group of convolution parameters; if not, incrementing the write control counter, and allocating an address in the second random access memory to each convolution parameter of the other group.
In a preferred embodiment, the method further includes, after the output of the group of convolution parameters is completed: judging whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented.
The application also discloses an FPGA-based convolution parameter acceleration device, including:
at least one random access memory configured to store convolution parameters;
a write address control unit configured to determine whether a convolution parameter is the last of an input group; if not, a write control counter is incremented, and an address in the first random access memory is allocated to each convolution parameter of the group;
a read address control unit configured to determine whether an output convolution parameter is the last of its group; if not, the first random access memory outputs one convolution parameter of the group according to the address, and a first read control counter is incremented; the unit further judges whether the group of convolution parameters has been output a preset number of times, and if so, clears the first read control counter and a second read control counter.
In a preferred embodiment, the device comprises a first random access memory and a second random access memory, wherein the second random access memory outputs another group of convolution parameters while a group of convolution parameters is written into the first random access memory; or, another group of convolution parameters is written into the second random access memory while the first random access memory outputs a group of convolution parameters.
In a preferred embodiment, the device comprises a first random access memory and a second random access memory; the write address control unit is further configured to: judge whether the current convolution parameter is the last of another input group; if not, allocate an address in the second random access memory to each convolution parameter of the other group, with the write control counter incremented.
In a preferred embodiment, the device comprises a first random access memory and a second random access memory; the read address control unit is further configured to: judge whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented.
Compared with the prior art, the method has the following beneficial effects:
the convolution parameter accelerating device based on the FPGA uses the least logic resources to form the minimized convolution parameter management, the interface of the device is simple and easy to use, the resource occupation is less, the transplantation is easy, the input and output path is short, and two random read-write memories are used in the device, so that the data can be read and written at the same time, continuously output and kept in the peak state for a long time, the parallelism can be greatly improved, and the high throughput rate of the data can be realized.
Drawings
FIG. 1 is a process diagram of a convolution operation in a prior-art CNN neural network model;
FIG. 2 is a schematic diagram of an acceleration device according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an acceleration device according to another embodiment of the invention;
FIG. 4 is a process diagram of data writing in an embodiment of the invention;
FIG. 5 is a process diagram of data output in an embodiment of the invention.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Description of partial concepts:
CNN: convolutional Neural Networks
Convolution parameters: convolution kernel parameters in CNN
FPGA: field programmable logic gate array
RAM, Random Access Memory
Referring to fig. 2, the present application further discloses an FPGA-based convolution parameter acceleration apparatus, where the acceleration apparatus 100 includes:
at least one Random Access Memory (RAM), shown in fig. 2 as including a first RAM 101 configured to store convolution parameters;
a write address control unit 201, configured to determine whether a convolution parameter is the last of an input group; if not, a write control counter (not shown in the figure) in the write address control unit 201 is incremented by 1, and an address in the first random access memory 101 is allocated to each convolution parameter of the group;
a read address control unit 202, configured to determine whether an output convolution parameter is the last of its group; if not, the first random access memory 101 outputs one convolution parameter of the group according to the address, and a first read control counter in the read address control unit 202 is incremented by 1; the unit also determines whether the group of convolution parameters has been output a preset number of times, and if so, a second read control counter (not shown in the figure) in the read address control unit 202 is cleared. In this embodiment, minimal logic resources form a minimized acceleration unit for managing convolution parameters: the interface is simple and easy to use, resource occupation is small, porting is easy, and the input/output path is short.
In a preferred example, referring to fig. 3, the acceleration apparatus of the present application includes a first random access memory 101 and a second random access memory 102, where the second random access memory 102 outputs another set of convolution parameters while writing a set of convolution parameters into the first random access memory 101; or, while the first random access memory 101 outputs a set of convolution parameters, another set of convolution parameters is written into the second random access memory 102.
In a preferred embodiment, the device comprises a first random access memory 101 and a second random access memory 102; the write address control unit 201 is further configured to: judge whether the current datum is the last of another input group of convolution parameters; if not, allocate an address in the second random access memory 102 to each convolution parameter of the other group, with the write control counter incremented by 1.
In a preferred embodiment, the device comprises a first random access memory 101 and a second random access memory 102; the read address control unit 202 is further configured to: judge whether the output is the last convolution parameter of another output group; if not, the second random access memory 102 outputs one convolution parameter of the other group, and the first read control counter in the read address control unit 202 is incremented.
Because the accelerator uses two RAMs, data can be read and written simultaneously: one RAM writes data while the other reads, realizing parallel data processing, continuous output, and sustained operation at peak throughput.
In another embodiment of the present application, a data read-write method based on an FPGA is further disclosed, including:
referring to fig. 4, the FPGA-based data writing method includes:
In step 11, it is judged whether data is input; if no data is input, the process proceeds to step 15, and the write control counter of the write address control unit is cleared.
If data is input, the process proceeds to step 12, where it is determined whether the datum is the last of an input group of convolution parameters; if not, the process proceeds to step 13, where the write control counter is incremented by 1, and then to step 14, where an address in the first random access memory is allocated to the convolution parameter.
If it is the last of the group of convolution parameters, the process proceeds to step 15, and the write control counter of the write address control unit is cleared.
Referring to fig. 5, the FPGA-based data reading method includes:
First, in step 21 it is determined whether data is output; if no data is output, the process proceeds to step 26, where the first read control counter and the second read control counter are cleared.
If data is output, the process proceeds to step 22, where it is determined whether the datum is the last convolution parameter of the output group; if so, the process proceeds to step 27, where the first read control counter is cleared and the second read control counter is incremented by 1.
If not, the process proceeds to step 23, where the first random access memory 101 outputs one of the convolution parameters according to the address, and then to step 24, where the first read control counter is incremented by 1; the process then returns to step 21.
In a preferred embodiment, if the last convolution parameter of the group has been output but the group has not yet been output the preset number of times, then, corresponding to step 27, the first read control counter is cleared and the second read control counter is incremented by 1, indicating that one complete pass of outputting the group of convolution parameters has finished.
Then, the process proceeds to step 25 to determine whether the group of convolution parameters has been output the preset number of times; if not, the process returns to step 21 to determine whether to output data.
If the preset number of outputs of the group is complete, the process proceeds to step 26, where the first read control counter and the second read control counter are cleared.
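The read flow of steps 21-27 can likewise be modeled with two nested counters: the first addresses parameters within the group, the second counts completed replays up to the preset number. The Python sketch below is an illustrative model with names of our choosing:

```python
class ReadAddressControl:
    """Illustrative model of the read flow in Fig. 5: the first read
    control counter addresses parameters within the group; the second
    counts completed replays of the group, up to a preset number."""

    def __init__(self, group_size, repeat_count):
        self.group_size = group_size
        self.repeat_count = repeat_count  # preset number of group outputs
        self.first = 0   # first read control counter (address in group)
        self.second = 0  # second read control counter (replays done)

    def next_output(self, ram):
        value = ram[self.first]            # step 23: output by address
        if self.first == self.group_size - 1:
            self.first = 0                 # step 27: clear first counter
            self.second += 1               # ...and count one full pass
            if self.second == self.repeat_count:
                self.second = 0            # step 26: all passes done
        else:
            self.first += 1                # step 24: advance within group
        return value


ram = [7, 8, 9]
ctrl = ReadAddressControl(group_size=3, repeat_count=2)
out = [ctrl.next_output(ram) for _ in range(6)]
print(out)  # [7, 8, 9, 7, 8, 9]
```

Replaying the same stored group a preset number of times is what lets one small RAM feed repeated convolution passes without refetching parameters.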
In a preferred embodiment, while writing a set of convolution parameters into the first random access memory 101, the second random access memory 102 outputs another set of convolution parameters; or, while the first random access memory 101 outputs a set of convolution parameters, another set of convolution parameters is written into the second random access memory 102.
In a preferred embodiment, after the input of a group of convolution parameters is completed, the first random access memory 101 stores that group, and the method further includes: judging whether the datum is the last of another input group of convolution parameters; if not, incrementing the write control counter and allocating an address in the second random access memory to each convolution parameter of the other group, so that the other group is written into the second random access memory while the first random access memory outputs data.
In a preferred embodiment, after the output of the group of convolution parameters in the first random access memory 101 is completed, the method further includes: judging whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented, so that the second random access memory outputs data while the first random access memory can simultaneously write data.
Because the accelerator uses two RAMs, data can be read and written simultaneously: one RAM writes data while the other reads, realizing parallel data processing, continuous output, and sustained operation at peak throughput.
The first embodiment is a method embodiment corresponding to the present embodiment, and the technical details in the first embodiment may be applied to the present embodiment, and the technical details in the present embodiment may also be applied to the first embodiment.
It should be noted that, as will be understood by those skilled in the art, the implementation functions of the modules shown in the above embodiments of the acceleration apparatus can be understood by referring to the description of the data reading and writing method. The functions of the respective modules shown in the embodiment of the acceleration apparatus may be realized by a program (executable instructions) running on a processor, and may also be realized by a specific logic circuit. The acceleration device of the embodiment of the present application, if implemented in the form of a software functional module and sold or used as a standalone product, may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Accordingly, another embodiment of the present application is implemented by a configuration file in an FPGA-readable storage medium. FPGA-readable storage media, including persistent and non-persistent, removable and non-removable media, can implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for FPGA profiles include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, FPGA-readable storage media does not include transitory computer-readable media (transient media), such as modulated data signals and carrier waves.
It is noted that, in the present patent application, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the use of the verb "comprise a" to define an element does not exclude the presence of another, same element in a process, method, article, or apparatus that comprises the element. In the present patent application, if it is mentioned that a certain action is executed according to a certain element, it means that the action is executed according to at least the element, and two cases are included: performing the action based only on the element, and performing the action based on the element and other elements. The expression of a plurality of, a plurality of and the like includes 2, 2 and more than 2, more than 2 and more than 2.
All documents mentioned in this specification are to be considered as being incorporated in their entirety into the disclosure of the present application so as to be subject to modification as necessary. It should be understood that the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.
In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Claims (6)
1. An FPGA-based convolution parameter data read-write method, comprising:
judging whether data is input, and if not, clearing a write control counter; if so, judging whether the convolution parameter is the last of an input group of convolution parameters; if not, incrementing the write control counter and allocating an address in a first random access memory to each convolution parameter of the group; and if it is the last convolution parameter, clearing the write control counter;
judging whether data is output; if so, judging whether it is the last convolution parameter of an output group of parameters; if not, the first random access memory outputs one convolution parameter of the group according to an address, and a first read control counter is incremented; if it is the last of the output group of convolution parameters, judging whether the group has been output a preset number of times; if not, clearing the first read control counter and incrementing a second read control counter by 1; and if so, clearing the first read control counter and the second read control counter;
further comprising: while a group of convolution parameters is written into the first random access memory, a second random access memory outputs another group of convolution parameters; or, while the first random access memory outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory.
2. The method of claim 1, wherein the write control counter is cleared if the convolution parameter is the last of the input group of convolution parameters.
3. The method of claim 1, wherein after the output of the group of convolution parameters is completed, the method further comprises: judging whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented.
4. An FPGA-based convolution parameter acceleration device, characterized by comprising:
at least one random access memory configured to store convolution parameters;
a write address control unit configured to: judge whether data is input, and if not, clear a write control counter; if so, judge whether the convolution parameter is the last of an input group of convolution parameters; if not, increment the write control counter and allocate an address in a first random access memory to each convolution parameter of the group; and if it is the last convolution parameter, clear the write control counter;
a read address control unit configured to: judge whether data is output; if so, judge whether it is the last convolution parameter of an output group of parameters; if not, the first random access memory outputs one convolution parameter of the group according to an address, and a first read control counter is incremented; if it is the last of the output group of convolution parameters, judge whether the group has been output a preset number of times; if not, clear the first read control counter and increment a second read control counter by 1; and if so, clear the first read control counter and the second read control counter;
the device comprises a first random access memory and a second random access memory, wherein the second random access memory outputs another group of convolution parameters while a group of convolution parameters is written into the first random access memory; or, another group of convolution parameters is written into the second random access memory while the first random access memory outputs a group of convolution parameters.
5. The apparatus of claim 4, comprising first and second random access memories; the write address control unit is further configured to: judge whether the current convolution parameter is the last of another input group; if not, allocate an address in the second random access memory to each convolution parameter of the other group, with the write control counter incremented.
6. The apparatus of claim 4, comprising first and second random access memories; the read address control unit is further configured to: judge whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910708612.9A CN110390392B (en) | 2019-08-01 | 2019-08-01 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
PCT/CN2019/126433 WO2021017378A1 (en) | 2019-08-01 | 2019-12-18 | Fpga-based convolution parameter acceleration device and data read-write method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910708612.9A CN110390392B (en) | 2019-08-01 | 2019-08-01 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110390392A CN110390392A (en) | 2019-10-29 |
CN110390392B true CN110390392B (en) | 2021-02-19 |
Family
ID=68288406
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910708612.9A Active CN110390392B (en) | 2019-08-01 | 2019-08-01 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110390392B (en) |
WO (1) | WO2021017378A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390392B (en) * | 2019-08-01 | 2021-02-19 | 上海安路信息科技有限公司 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5764374A (en) * | 1996-02-05 | 1998-06-09 | Hewlett-Packard Company | System and method for lossless image compression having improved sequential determination of golomb parameter |
EP1089475A1 (en) * | 1999-09-28 | 2001-04-04 | TELEFONAKTIEBOLAGET L M ERICSSON (publ) | Converter and method for converting an input packet stream containing data with plural transmission rates into an output data symbol stream |
CN100466601C (en) * | 2005-04-28 | 2009-03-04 | 华为技术有限公司 | Data read/write device and method |
CN101257313B (en) * | 2007-04-10 | 2010-05-26 | 深圳市同洲电子股份有限公司 | Deconvolution interweave machine and method realized based on FPGA |
CN104461934B (en) * | 2014-11-07 | 2017-06-30 | 北京海尔集成电路设计有限公司 | A kind of time solution convolutional interleave device and method of suitable DDR memory |
CN106940815B (en) * | 2017-02-13 | 2020-07-28 | 西安交通大学 | Programmable convolutional neural network coprocessor IP core |
US11775313B2 (en) * | 2017-05-26 | 2023-10-03 | Purdue Research Foundation | Hardware accelerator for convolutional neural networks and method of operation thereof |
CN108169727B (en) * | 2018-01-03 | 2019-12-27 | 电子科技大学 | Moving target radar scattering cross section measuring method based on FPGA |
CN108154229B (en) * | 2018-01-10 | 2022-04-08 | 西安电子科技大学 | Image processing method based on FPGA (field programmable Gate array) accelerated convolutional neural network framework |
CN109086867B (en) * | 2018-07-02 | 2021-06-08 | 武汉魅瞳科技有限公司 | Convolutional neural network acceleration system based on FPGA |
CN109032781A (en) * | 2018-07-13 | 2018-12-18 | 重庆邮电大学 | A kind of FPGA parallel system of convolutional neural networks algorithm |
CN109214281A (en) * | 2018-07-30 | 2019-01-15 | 苏州神指微电子有限公司 | A kind of CNN hardware accelerator for AI chip recognition of face |
CN109359729B (en) * | 2018-09-13 | 2022-02-22 | 深思考人工智能机器人科技(北京)有限公司 | System and method for realizing data caching on FPGA |
CN109711533B (en) * | 2018-12-20 | 2023-04-28 | 西安电子科技大学 | Convolutional neural network acceleration system based on FPGA |
CN109409509A (en) * | 2018-12-24 | 2019-03-01 | 济南浪潮高新科技投资发展有限公司 | A kind of data structure and accelerated method for the convolutional neural networks accelerator based on FPGA |
CN109784489B (en) * | 2019-01-16 | 2021-07-30 | 北京大学软件与微电子学院 | Convolutional neural network IP core based on FPGA |
CN110390392B (en) * | 2019-08-01 | 2021-02-19 | 上海安路信息科技有限公司 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
2019
- 2019-08-01 CN CN201910708612.9A patent/CN110390392B/en active Active
- 2019-12-18 WO PCT/CN2019/126433 patent/WO2021017378A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110390392A (en) | 2019-10-29 |
WO2021017378A1 (en) | 2021-02-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11836610B2 (en) | Concurrent training of functional subnetworks of a neural network | |
US10417555B2 (en) | Data-optimized neural network traversal | |
CN111325664B (en) | Style migration method and device, storage medium and electronic equipment | |
CN104615594B (en) | A kind of data-updating method and device | |
CN112329680A (en) | Semi-supervised remote sensing image target detection and segmentation method based on class activation graph | |
US11928580B2 (en) | Interleaving memory requests to accelerate memory accesses | |
US20210326702A1 (en) | Processing device for executing convolutional neural network computation and operation method thereof | |
DE102021126634A1 (en) | Storage device, storage system and method of operation | |
CN112734106A (en) | Method and device for predicting energy load | |
CN110390392B (en) | Convolution parameter accelerating device based on FPGA and data reading and writing method | |
CN111009034B (en) | Three-dimensional model monomer method, system, storage medium and equipment | |
CN110019784A (en) | A kind of file classification method and device | |
CN109597982A (en) | Summary texts recognition methods and device | |
TWI751931B (en) | Processing device and processing method for executing convolution neural network computation | |
US11436486B2 (en) | Neural network internal data fast access memory buffer | |
CN113641872B (en) | Hashing method, hashing device, hashing equipment and hashing medium | |
CN114758191A (en) | Image identification method and device, electronic equipment and storage medium | |
CN113052292B (en) | Convolutional neural network technique method, device and computer readable storage medium | |
Wu et al. | Hetero layer fusion based architecture design and implementation for of deep learning accelerator | |
CN112308762A (en) | Data processing method and device | |
CN112905239B (en) | Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment | |
Ali et al. | A New Merging Numerous Small Files Approach for Hadoop Distributed File System | |
CN110858121B (en) | Background operation scheduling method and device | |
CN118012631B (en) | Operator execution method, processing device, storage medium and program product | |
US20230010180A1 (en) | Parafinitary neural learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | ||
Address after: 200434 Room 202, building 5, No. 500, Memorial Road, Hongkou District, Shanghai
Patentee after: Shanghai Anlu Information Technology Co.,Ltd.
Address before: Floor 4, no.391-393, dongdaming Road, Hongkou District, Shanghai 200080 (centralized registration place)
Patentee before: SHANGHAI ANLOGIC INFORMATION TECHNOLOGY Co.,Ltd.