CN110390392B - Convolution parameter accelerating device based on FPGA and data reading and writing method - Google Patents
- Publication number
- CN110390392B CN110390392B CN201910708612.9A CN201910708612A CN110390392B CN 110390392 B CN110390392 B CN 110390392B CN 201910708612 A CN201910708612 A CN 201910708612A CN 110390392 B CN110390392 B CN 110390392B
- Authority
- CN
- China
- Prior art keywords
- convolution
- group
- read
- parameter
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The application discloses an FPGA-based convolution parameter acceleration device and a data read-write method, wherein the method comprises: judging whether a convolution parameter is the last of an input group of convolution parameters; if not, incrementing a write control counter and allocating an address in a first random access memory to each convolution parameter of the group; judging whether a convolution parameter is the last of an output group of convolution parameters; if not, the first random access memory outputs one convolution parameter of the group according to the address, and a first read control counter is incremented; and judging whether the group of convolution parameters has been output a preset number of times, and if so, clearing the first read control counter and a second read control counter.
Description
Technical Field
The invention relates to the technical field of integrated circuits, and in particular to an FPGA (field programmable gate array) based convolution parameter acceleration device and data read-write method.
Background
The convolutional neural network (CNN) is a mature technique in the field of artificial intelligence: it has feature-learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. With the advance of deep learning theory and improvements in numerical computing equipment, CNNs have developed rapidly and are widely applied in computer vision, natural language processing, and other directions, mainly for target classification; they hold overwhelming advantages in applications such as image recognition and speech recognition.
At present, CNNs are implemented mainly on computer platforms: a CNN architecture is deployed on a development-side computer, weight training is performed on massive data, and suitable weight coefficients are finally generated. On the product side, where portability and practicality must be considered, a CNN is generally not hosted on a high-end server or workstation, and embedded development becomes the first choice under the requirements of reduced cost and size.
Research on embedded development of CNNs has been carried out in recent years. Digital signal processors (DSPs) and ARM (Advanced RISC Machines) processors are ruled out by their long computation times, so designing CNN convolution accelerators on FPGAs has become a hot research direction — for example, "FPGA parallel structure design of convolutional neural network (CNN) algorithm" (Wang Wei et al., Microelectronics and Computer, Apr. 2019) and "Convolutional neural network accelerator design based on the ZYNQ platform and its application research" (Deng Shuai, Beijing University of Technology, May 2018). The latter describes only a theoretical process and gives no actual design model or performance analysis. The former proposes a concrete neural network convolution accelerator model whose performance, according to the paper's analysis, improves greatly; yet for product-level realization its on-chip data throughput is insufficient and application is difficult: for the YOLOv2 image classification algorithm, each frame of image requires 17.4 G operations, and by that design a processing speed of only 1.15 frames/s can be achieved even with seamless data transfer.
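The frame-rate figure quoted above can be sanity-checked with simple arithmetic. The sketch below assumes the cited design sustains roughly 20 G operations/s of effective throughput — a value inferred from the quoted numbers, not stated in the source:

```python
# Back-of-the-envelope check of the frame rate quoted above.
# Assumption (not stated in the source): the cited design sustains
# about 20 G operations/s of effective convolution throughput.
ops_per_frame = 17.4e9        # operations needed per YOLOv2 frame
sustained_throughput = 20e9   # assumed effective operations/s

fps = sustained_throughput / ops_per_frame
print(f"{fps:.2f} frames/s")  # ≈ 1.15 frames/s, matching the text
```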
As shown in fig. 1, conventional CNN neural network accelerator technology is not yet mature; its main problems are high cost, low data throughput, and excessive computation latency, so it cannot satisfy real-time, low-cost applications.
Disclosure of Invention
The invention aims to provide an FPGA (field programmable gate array) based convolution parameter acceleration device and data read-write method, solving the technical problems of slow data processing and insufficient data throughput in the prior art.
In order to solve the above problems, the present application discloses a convolution parameter data read-write method based on an FPGA, including:
judging whether a convolution parameter is the last of an input group of convolution parameters; if not, incrementing a write control counter, and allocating an address in a first random access memory to each convolution parameter of the group;
judging whether a convolution parameter is the last of an output group of convolution parameters; if not, the first random access memory outputs one convolution parameter of the group according to the address, and a first read control counter is incremented; and judging whether the group of convolution parameters has been output a preset number of times, and if so, clearing the first read control counter and a second read control counter.
In a preferred embodiment, the write control counter is cleared if the convolution parameter is the last of the input group of convolution parameters.
In a preferred embodiment, if the last convolution parameter of the output group has been output but the group has not yet been output the preset number of times, the first read control counter is cleared and the second read control counter is incremented by 1.
In a preferred embodiment, while a group of convolution parameters is written into the first random access memory, the second random access memory outputs another group of convolution parameters; or, while the first random access memory outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory.
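The alternating (ping-pong) buffering described above can be sketched in software. The following Python model is illustrative only — the class and method names are ours, not from the patent — and shows how one memory accepts a new group while the other serves reads, with roles swapped between groups:

```python
class PingPongBuffer:
    """Illustrative model of the two-RAM scheme: one RAM is written
    while the other is read, then the roles are swapped."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.rams = [[None] * group_size, [None] * group_size]
        self.write_sel = 0  # index of the RAM currently accepting writes

    @property
    def read_sel(self):
        return 1 - self.write_sel  # the other RAM serves reads

    def write_group(self, params):
        # Write a full group of convolution parameters into the write-side RAM.
        assert len(params) == self.group_size
        self.rams[self.write_sel][:] = params

    def read_group(self):
        # Read the group currently held in the read-side RAM.
        return list(self.rams[self.read_sel])

    def swap(self):
        # Once one group is fully written and the other fully consumed,
        # exchange the roles of the two RAMs.
        self.write_sel = 1 - self.write_sel


buf = PingPongBuffer(group_size=4)
buf.write_group([1, 2, 3, 4])   # group A into RAM 0
buf.swap()                      # RAM 0 now serves reads
buf.write_group([5, 6, 7, 8])   # group B into RAM 1 while A is readable
print(buf.read_group())         # [1, 2, 3, 4]
```

Because a write and a read target different RAMs in every cycle, output never stalls waiting for input — the property the patent credits for sustained peak throughput.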
In a preferred embodiment, the method further includes, after the input of the group of convolution parameters is completed: judging whether the current datum is the last of another input group of convolution parameters; if not, incrementing the write control counter, and allocating an address in the second random access memory to each convolution parameter of the other group.
In a preferred embodiment, the method further includes, after the output of the group of convolution parameters is completed: judging whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented.
The application also discloses an FPGA-based convolution parameter acceleration device, including:
at least one random access memory configured to store convolution parameters;
a write address control unit configured to determine whether a convolution parameter is the last of an input group; if not, a write control counter is incremented, and an address in the first random access memory is allocated to each convolution parameter of the group;
a read address control unit configured to determine whether an output convolution parameter is the last of its group; if not, the first random access memory outputs one convolution parameter of the group according to the address, and a first read control counter is incremented; the unit further judges whether the group of convolution parameters has been output a preset number of times, and if so, clears the first read control counter and a second read control counter.
In a preferred embodiment, the device comprises a first random access memory and a second random access memory, wherein the second random access memory outputs another group of convolution parameters while a group of convolution parameters is written into the first random access memory; or, another group of convolution parameters is written into the second random access memory while the first random access memory outputs a group of convolution parameters.
In a preferred embodiment, the device comprises a first random access memory and a second random access memory; the write address control unit is further configured to: judge whether the current convolution parameter is the last of another input group; if not, allocate an address in the second random access memory to each convolution parameter of the other group, with the write control counter incremented.
In a preferred embodiment, the device comprises a first random access memory and a second random access memory; the read address control unit is further configured to: judge whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented.
Compared with the prior art, the method has the following beneficial effects:
the convolution parameter accelerating device based on the FPGA uses the least logic resources to form the minimized convolution parameter management, the interface of the device is simple and easy to use, the resource occupation is less, the transplantation is easy, the input and output path is short, and two random read-write memories are used in the device, so that the data can be read and written at the same time, continuously output and kept in the peak state for a long time, the parallelism can be greatly improved, and the high throughput rate of the data can be realized.
Drawings
FIG. 1 is a process diagram of a convolution operation in a prior-art CNN neural network model;
FIG. 2 is a schematic diagram of an acceleration device according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an acceleration device according to another embodiment of the invention;
FIG. 4 is a process diagram of data writing in an embodiment of the invention;
FIG. 5 is a process diagram of data output in an embodiment of the invention.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Description of partial concepts:
CNN: convolutional Neural Networks
Convolution parameters: convolution kernel parameters in CNN
FPGA: field programmable logic gate array
RAM, Random Access Memory
Referring to fig. 2, the present application further discloses an FPGA-based convolution parameter acceleration apparatus, where the acceleration apparatus 100 includes:
at least one Random Access Memory (RAM), shown in fig. 2 as including a first RAM 101 configured to store convolution parameters;
a write address control unit 201, configured to determine whether a convolution parameter is the last of an input group; if not, a write control counter (not shown in the figure) in the write address control unit 201 is incremented by 1, and an address in the first random access memory 101 is allocated to each convolution parameter of the group;
a read address control unit 202, configured to determine whether an output convolution parameter is the last of its group; if not, the first random access memory 101 outputs one convolution parameter of the group according to the address, and a first read control counter in the read address control unit 202 is incremented by 1; the unit also determines whether the group of convolution parameters has been output a preset number of times, and if so, a second read control counter (not shown in the figure) in the read address control unit 202 is cleared. In this embodiment, minimal logic resources form a minimized acceleration unit for managing convolution parameters: the interface is simple and easy to use, resource occupation is small, porting is easy, and the input/output path is short.
In a preferred example, referring to fig. 3, the acceleration apparatus of the present application includes a first random access memory 101 and a second random access memory 102, where the second random access memory 102 outputs another set of convolution parameters while writing a set of convolution parameters into the first random access memory 101; or, while the first random access memory 101 outputs a set of convolution parameters, another set of convolution parameters is written into the second random access memory 102.
In a preferred embodiment, the device comprises a first random access memory 101 and a second random access memory 102; the write address control unit 201 is further configured to: judge whether the current datum is the last of another input group of convolution parameters; if not, allocate an address in the second random access memory 102 to each convolution parameter of the other group, with the write control counter incremented by 1.
In a preferred embodiment, the device comprises a first random access memory 101 and a second random access memory 102; the read address control unit 202 is further configured to: judge whether the output is the last convolution parameter of another output group; if not, the second random access memory 102 outputs one convolution parameter of the other group, and the first read control counter in the read address control unit 202 is incremented.
Because the accelerator uses two RAMs, data can be read and written simultaneously: one RAM writes data while the other reads, realizing parallel data processing, continuous output, and sustained operation at peak throughput.
In another embodiment of the present application, a data read-write method based on an FPGA is further disclosed, including:
referring to fig. 4, the FPGA-based data writing method includes:
In step 11, it is judged whether data is input; if no data is input, the process proceeds to step 15, and the write control counter of the write address control unit is cleared.
If data is input, the process proceeds to step 12, where it is determined whether the datum is the last of an input group of convolution parameters; if not, the process proceeds to step 13, where the write control counter is incremented by 1, and then to step 14, where an address in the first random access memory is allocated to the convolution parameter.
If it is the last of the group of convolution parameters, the process proceeds to step 15, and the write control counter of the write address control unit is cleared.
Referring to fig. 5, the FPGA-based data reading method includes:
First, in step 21 it is determined whether data is output; if no data is output, the process proceeds to step 26, where the first read control counter and the second read control counter are cleared.
If data is output, the process proceeds to step 22, where it is determined whether the datum is the last convolution parameter of the output group; if so, the process proceeds to step 27, where the first read control counter is cleared and the second read control counter is incremented by 1.
If not, the process proceeds to step 23, where the first random access memory 101 outputs one of the convolution parameters according to the address, and then to step 24, where the first read control counter is incremented by 1; the process then returns to step 21.
In a preferred embodiment, if the last convolution parameter of the group has been output but the group has not yet been output the preset number of times, then, corresponding to step 27, the first read control counter is cleared and the second read control counter is incremented by 1, indicating that one complete pass of outputting the group of convolution parameters has finished.
Then, the process proceeds to step 25 to determine whether the group of convolution parameters has been output the preset number of times; if not, the process returns to step 21 to determine whether to output data.
If the preset number of outputs of the group is complete, the process proceeds to step 26, where the first read control counter and the second read control counter are cleared.
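The read flow of steps 21-27 can likewise be modeled with two nested counters: the first addresses parameters within the group, the second counts completed replays up to the preset number. The Python sketch below is an illustrative model with names of our choosing:

```python
class ReadAddressControl:
    """Illustrative model of the read flow in Fig. 5: the first read
    control counter addresses parameters within the group; the second
    counts completed replays of the group, up to a preset number."""

    def __init__(self, group_size, repeat_count):
        self.group_size = group_size
        self.repeat_count = repeat_count  # preset number of group outputs
        self.first = 0   # first read control counter (address in group)
        self.second = 0  # second read control counter (replays done)

    def next_output(self, ram):
        value = ram[self.first]            # step 23: output by address
        if self.first == self.group_size - 1:
            self.first = 0                 # step 27: clear first counter
            self.second += 1               # ...and count one full pass
            if self.second == self.repeat_count:
                self.second = 0            # step 26: all passes done
        else:
            self.first += 1                # step 24: advance within group
        return value


ram = [7, 8, 9]
ctrl = ReadAddressControl(group_size=3, repeat_count=2)
out = [ctrl.next_output(ram) for _ in range(6)]
print(out)  # [7, 8, 9, 7, 8, 9]
```

Replaying the same stored group a preset number of times is what lets one small RAM feed repeated convolution passes without refetching parameters.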
In a preferred embodiment, while writing a set of convolution parameters into the first random access memory 101, the second random access memory 102 outputs another set of convolution parameters; or, while the first random access memory 101 outputs a set of convolution parameters, another set of convolution parameters is written into the second random access memory 102.
In a preferred embodiment, after the input of a group of convolution parameters is completed, the first random access memory 101 stores that group, and the method further includes: judging whether the datum is the last of another input group of convolution parameters; if not, incrementing the write control counter and allocating an address in the second random access memory to each convolution parameter of the other group, so that the other group is written into the second random access memory while the first random access memory outputs data.
In a preferred embodiment, after the output of the group of convolution parameters in the first random access memory 101 is completed, the method further includes: judging whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented, so that the second random access memory outputs data while the first random access memory can simultaneously write data.
Because the accelerator uses two RAMs, data can be read and written simultaneously: one RAM writes data while the other reads, realizing parallel data processing, continuous output, and sustained operation at peak throughput.
The first embodiment is a method embodiment corresponding to the present embodiment, and the technical details in the first embodiment may be applied to the present embodiment, and the technical details in the present embodiment may also be applied to the first embodiment.
It should be noted that, as will be understood by those skilled in the art, the implementation functions of the modules shown in the above embodiments of the acceleration apparatus can be understood by referring to the description of the data reading and writing method. The functions of the respective modules shown in the embodiment of the acceleration apparatus may be realized by a program (executable instructions) running on a processor, and may also be realized by a specific logic circuit. The acceleration device of the embodiment of the present application, if implemented in the form of a software functional module and sold or used as a standalone product, may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Accordingly, another embodiment of the present application is implemented by a configuration file in an FPGA-readable storage medium. FPGA-readable storage media, including persistent and non-persistent, removable and non-removable media, can implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for FPGA profiles include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, FPGA-readable storage media does not include transitory computer-readable media (transient media), such as modulated data signals and carrier waves.
It is noted that, in the present patent application, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the use of the verb "comprise a" to define an element does not exclude the presence of another, same element in a process, method, article, or apparatus that comprises the element. In the present patent application, if it is mentioned that a certain action is executed according to a certain element, it means that the action is executed according to at least the element, and two cases are included: performing the action based only on the element, and performing the action based on the element and other elements. The expression of a plurality of, a plurality of and the like includes 2, 2 and more than 2, more than 2 and more than 2.
All documents mentioned in this specification are to be considered as being incorporated in their entirety into the disclosure of the present application so as to be subject to modification as necessary. It should be understood that the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present disclosure should be included in the scope of protection of one or more embodiments of the present disclosure.
In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Claims (6)
1. An FPGA-based convolution parameter data read-write method, comprising:
judging whether data is input, and if not, clearing a write control counter; if so, judging whether the convolution parameter is the last of an input group of convolution parameters; if not, incrementing the write control counter and allocating an address in a first random access memory to each convolution parameter of the group; and if it is the last convolution parameter, clearing the write control counter;
judging whether data is output; if so, judging whether it is the last convolution parameter of an output group of parameters; if not, the first random access memory outputs one convolution parameter of the group according to an address, and a first read control counter is incremented; if it is the last of the output group of convolution parameters, judging whether the group has been output a preset number of times; if not, clearing the first read control counter and incrementing a second read control counter by 1; and if so, clearing the first read control counter and the second read control counter;
further comprising: while a group of convolution parameters is written into the first random access memory, a second random access memory outputs another group of convolution parameters; or, while the first random access memory outputs a group of convolution parameters, another group of convolution parameters is written into the second random access memory.
2. The method of claim 1, wherein the write control counter is cleared if the convolution parameter is the last of the input group of convolution parameters.
3. The method of claim 1, wherein after the output of the group of convolution parameters is completed, the method further comprises: judging whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented.
4. An FPGA-based convolution parameter acceleration device, characterized by comprising:
at least one random access memory configured to store convolution parameters;
a write address control unit configured to: judge whether data is input, and if not, clear a write control counter; if so, judge whether the convolution parameter is the last of an input group of convolution parameters; if not, increment the write control counter and allocate an address in a first random access memory to each convolution parameter of the group; and if it is the last convolution parameter, clear the write control counter;
a read address control unit configured to: judge whether data is output; if so, judge whether it is the last convolution parameter of an output group of parameters; if not, the first random access memory outputs one convolution parameter of the group according to an address, and a first read control counter is incremented; if it is the last of the output group of convolution parameters, judge whether the group has been output a preset number of times; if not, clear the first read control counter and increment a second read control counter by 1; and if so, clear the first read control counter and the second read control counter;
the device comprises a first random access memory and a second random access memory, wherein the second random access memory outputs another group of convolution parameters while a group of convolution parameters is written into the first random access memory; or, another group of convolution parameters is written into the second random access memory while the first random access memory outputs a group of convolution parameters.
5. The apparatus of claim 4, comprising first and second random access memories; the write address control unit is further configured to: judge whether the current convolution parameter is the last of another input group; if not, allocate an address in the second random access memory to each convolution parameter of the other group, with the write control counter incremented.
6. The apparatus of claim 4, comprising first and second random access memories; the read address control unit is further configured to: judge whether the output is the last convolution parameter of another output group; if not, the second random access memory outputs one convolution parameter of the other group, and the first read control counter is incremented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910708612.9A CN110390392B (en) | 2019-08-01 | 2019-08-01 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
PCT/CN2019/126433 WO2021017378A1 (en) | 2019-08-01 | 2019-12-18 | Fpga-based convolution parameter acceleration device and data read-write method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910708612.9A CN110390392B (en) | 2019-08-01 | 2019-08-01 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110390392A CN110390392A (en) | 2019-10-29 |
CN110390392B true CN110390392B (en) | 2021-02-19 |
Family
ID=68288406
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910708612.9A Active CN110390392B (en) | 2019-08-01 | 2019-08-01 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110390392B (en) |
WO (1) | WO2021017378A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390392B (en) * | 2019-08-01 | 2021-02-19 | 上海安路信息科技有限公司 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5764374A (en) * | 1996-02-05 | 1998-06-09 | Hewlett-Packard Company | System and method for lossless image compression having improved sequential determination of golomb parameter |
EP1089475A1 (en) * | 1999-09-28 | 2001-04-04 | TELEFONAKTIEBOLAGET L M ERICSSON (publ) | Converter and method for converting an input packet stream containing data with plural transmission rates into an output data symbol stream |
CN100466601C (en) * | 2005-04-28 | 2009-03-04 | 华为技术有限公司 | Data read/write device and method |
CN101257313B (en) * | 2007-04-10 | 2010-05-26 | 深圳市同洲电子股份有限公司 | Deconvolution interweave machine and method realized based on FPGA |
CN104461934B (en) * | 2014-11-07 | 2017-06-30 | 北京海尔集成电路设计有限公司 | A kind of time solution convolutional interleave device and method of suitable DDR memory |
CN106940815B (en) * | 2017-02-13 | 2020-07-28 | 西安交通大学 | Programmable convolutional neural network coprocessor IP core |
US11775313B2 (en) * | 2017-05-26 | 2023-10-03 | Purdue Research Foundation | Hardware accelerator for convolutional neural networks and method of operation thereof |
CN108169727B (en) * | 2018-01-03 | 2019-12-27 | 电子科技大学 | Moving target radar scattering cross section measuring method based on FPGA |
CN108154229B (en) * | 2018-01-10 | 2022-04-08 | 西安电子科技大学 | Image processing method based on FPGA (field programmable Gate array) accelerated convolutional neural network framework |
CN109086867B (en) * | 2018-07-02 | 2021-06-08 | 武汉魅瞳科技有限公司 | Convolutional neural network acceleration system based on FPGA |
CN109032781A (en) * | 2018-07-13 | 2018-12-18 | 重庆邮电大学 | A kind of FPGA parallel system of convolutional neural networks algorithm |
CN109214281A (en) * | 2018-07-30 | 2019-01-15 | 苏州神指微电子有限公司 | A kind of CNN hardware accelerator for AI chip recognition of face |
CN109359729B (en) * | 2018-09-13 | 2022-02-22 | 深思考人工智能机器人科技(北京)有限公司 | System and method for realizing data caching on FPGA |
CN109711533B (en) * | 2018-12-20 | 2023-04-28 | 西安电子科技大学 | Convolutional neural network acceleration system based on FPGA |
CN109409509A (en) * | 2018-12-24 | 2019-03-01 | 济南浪潮高新科技投资发展有限公司 | A kind of data structure and accelerated method for the convolutional neural networks accelerator based on FPGA |
CN109784489B (en) * | 2019-01-16 | 2021-07-30 | 北京大学软件与微电子学院 | Convolutional neural network IP core based on FPGA |
CN110390392B (en) * | 2019-08-01 | 2021-02-19 | 上海安路信息科技有限公司 | Convolution parameter accelerating device based on FPGA and data reading and writing method |
2019
- 2019-08-01 CN CN201910708612.9A patent/CN110390392B/en active Active
- 2019-12-18 WO PCT/CN2019/126433 patent/WO2021017378A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110390392A (en) | 2019-10-29 |
WO2021017378A1 (en) | 2021-02-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11836610B2 (en) | Concurrent training of functional subnetworks of a neural network | |
US10417555B2 (en) | Data-optimized neural network traversal | |
CN111325664B (en) | Style migration method and device, storage medium and electronic equipment | |
CN104615594B (en) | A kind of data-updating method and device | |
CN112329680A (en) | Semi-supervised remote sensing image target detection and segmentation method based on class activation graph | |
US11928580B2 (en) | Interleaving memory requests to accelerate memory accesses | |
US20210326702A1 (en) | Processing device for executing convolutional neural network computation and operation method thereof | |
DE102021126634A1 (en) | Storage device, storage system and method of operation | |
CN112734106A (en) | Method and device for predicting energy load | |
CN110390392B (en) | Convolution parameter accelerating device based on FPGA and data reading and writing method | |
CN111009034B (en) | Three-dimensional model monomer method, system, storage medium and equipment | |
CN110019784A (en) | A kind of file classification method and device | |
CN109597982A (en) | Summary texts recognition methods and device | |
TWI751931B (en) | Processing device and processing method for executing convolution neural network computation | |
US11436486B2 (en) | Neural network internal data fast access memory buffer | |
CN113641872B (en) | Hashing method, hashing device, hashing equipment and hashing medium | |
CN114758191A (en) | Image identification method and device, electronic equipment and storage medium | |
CN113052292B (en) | Convolutional neural network technique method, device and computer readable storage medium | |
Wu et al. | Hetero layer fusion based architecture design and implementation for of deep learning accelerator | |
CN112308762A (en) | Data processing method and device | |
CN112905239B (en) | Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment | |
Ali et al. | A New Merging Numerous Small Files Approach for Hadoop Distributed File System | |
CN110858121B (en) | Background operation scheduling method and device | |
CN118012631B (en) | Operator execution method, processing device, storage medium and program product | |
US20230010180A1 (en) | Parafinitary neural learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | ||
Address after: 200434 Room 202, building 5, No. 500, Memorial Road, Hongkou District, Shanghai
Patentee after: Shanghai Anlu Information Technology Co.,Ltd.
Address before: Floor 4, no.391-393, dongdaming Road, Hongkou District, Shanghai 200080 (centralized registration place)
Patentee before: SHANGHAI ANLOGIC INFORMATION TECHNOLOGY Co.,Ltd.