WO2019076108A1 - Convolutional Neural Network Operation Circuit - Google Patents
- Publication number
- WO2019076108A1 (PCT/CN2018/099596)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- unit
- convolution
- data
- image
- memory
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/955—Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- the present application relates to the field of image processing, and in particular to a convolutional neural network operation circuit.
- The convolutional neural network (CNN), as a kind of artificial neural network, has become a research hotspot in the fields of speech analysis and image recognition.
- The convolutional neural network is a multi-layer neural network. Each layer is composed of multiple two-dimensional planes, and each plane is convolved with a different convolution kernel; the convolved output is then pooled, generating a feature map that is transmitted to the next layer of the network.
- Convolutional neural networks involve a very large number of convolution operations, and every layer of the network requires them. A single recognition pass requires convolutions over multiple layers of kernels and multiple planes, which takes a long time on ordinary CPUs and GPUs; moreover, the convolutions of the different layers and planes occupy a huge amount of system bandwidth and place very high performance requirements on the system.
- The embodiment of the present application provides a convolutional neural network operation circuit, to at least solve the technical problem that a large system bandwidth is occupied by the heavy convolution workload of a convolutional neural network.
- A convolutional neural network operation circuit, including: an external memory for storing an image to be processed; a direct access unit, connected to the external memory, for reading the image to be processed and transferring the read data to the control unit; a control unit, connected to the direct access unit, for storing the data into the internal memory; an internal memory, connected to the control unit, for buffering the data; and an arithmetic unit, connected to the internal memory, for reading data from the internal memory and performing convolution-pooling operations.
- the number of arithmetic units is at least two.
- When the arithmetic units are connected in a cascade structure, the data of the nth layer is buffered into the internal memory after the convolution-pooling operation of the nth arithmetic unit, and the (n+1)th arithmetic unit takes out the computed data and performs the convolution-pooling operation of the (n+1)th layer, where n is a positive integer.
- each of the arithmetic units respectively processes the partial images of the image to be processed, and each of the arithmetic units performs the parallel convolution pooling operation using the same convolution kernel.
- each operation unit separately performs different feature extraction on the image to be processed, and each operation unit performs parallel convolution pool operation using different convolution kernels.
- the two operation units respectively extract contour information and detail information of the image to be processed.
- the operation unit includes a convolution operation unit, a pooling operation unit, a buffer unit, and a buffer control unit.
- The convolution operation unit is configured to perform the convolution operation on the data and transmit the obtained convolution result to the pooling operation unit; the pooling operation unit, connected to the convolution operation unit, performs the pooling operation on the convolution result and stores the obtained pooling result into the buffer unit; the buffer control unit stores the pooling result into the internal memory through the buffer unit, or into the external memory through the direct access unit.
- The external memory comprises at least one of the following: double data rate synchronous dynamic random access memory (DDR SDRAM), synchronous dynamic random access memory (SDRAM).
- the internal memory comprises a static memory array comprising a plurality of static memories, each static memory for storing different data.
- the external memory is used to store the image to be processed;
- the direct access unit reads the image to be processed in a row order, and transfers the read data to the control unit;
- the control unit stores the data to the internal memory;
- The arithmetic unit reads data from the internal memory and performs the convolution-pooling operation. Because the data is buffered through the internal memory, the convolution can be carried out by reading each frame of image from the external memory only once, without repeatedly re-reading the lines of a frame. This achieves the technical effect of effectively saving system bandwidth, thereby solving the technical problem that a large system bandwidth is occupied by the heavy convolution workload of the convolutional neural network.
- FIG. 1 is a schematic structural diagram of an optional convolutional neural network operation circuit according to an embodiment of the present application
- FIG. 2 is a schematic structural diagram of another optional convolutional neural network operation circuit according to an embodiment of the present application.
- FIG. 3 is a schematic structural diagram of still another optional convolutional neural network operation circuit according to an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of still another optional convolutional neural network operation circuit according to an embodiment of the present application.
- FIG. 1 shows a convolutional neural network operation circuit according to an embodiment of the present application. As shown in FIG. 1, the convolutional neural network operation circuit includes the external memory 10, the direct access unit 12, the control unit 14, the internal memory 16, and the arithmetic unit 18.
- The external memory 10 is configured to store the image to be processed; the direct access unit 12 is connected to the external memory 10, reads the image to be processed, and transmits the read data to the control unit 14; the control unit 14 is connected to the direct access unit 12 and stores the data into the internal memory 16; the internal memory 16 is connected to the control unit 14 and buffers the data; and the arithmetic unit 18 is connected to the internal memory 16, reading data from it and performing the convolution-pooling operations.
- The image to be processed is stored in the external memory and is read out by DMA (Direct Memory Access, i.e., the above direct access unit 12), for example in the row order of the image, and transferred to the SRAM CTRL (static RAM control) module, i.e., the above control unit 14. The SRAM CTRL then stores the data, also in row order, into the SRAM ARRAY (the internal memory 16). Assume the SRAM ARRAY in FIG. 2 is composed of three SRAMs, each with a storage capacity of one image line (for a 1920x1080 image, a capacity of 1920 bytes); the three SRAMs store the data of row N, row N+1, and row N+2, respectively.
- The BUFFER CTRL of the CNN unit (i.e., the above arithmetic unit 18; the BUFFER CTRL is the buffer control unit described later) reads the three rows of data simultaneously and stores them into a 3x3 array for the convolution operation. The convolved result is sent to the pooling operation unit for the pooling operation, and after pooling the result is stored through the buffer unit into the SRAM ARRAY, or stored via DMA into the external memory. Buffering the line image data in the SRAM ARRAY means the convolution operation only needs to read each frame of image from the external memory once, without repeatedly reading lines of the frame, so system bandwidth is effectively saved. In addition, the CNN unit can complete one convolution operation and one pooling operation per cycle, greatly improving the computation speed of the convolutional neural network.
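The row-buffered dataflow described above can be sketched in software as follows. This is only an illustrative model of the data movement, not the patented circuit itself; the 3x3 kernel, the "valid" output size, and the 2x2 max-pooling window are assumptions of this sketch.

```python
# Illustrative model of the row-buffered dataflow: three line buffers hold
# rows N, N+1, N+2 (playing the role of the SRAM ARRAY), a 3x3 window is
# convolved, and the feature map is 2x2 max-pooled.

def conv3x3_rowbuffer(image, kernel):
    """Convolve using only three buffered rows at a time (valid mode)."""
    h, w = len(image), len(image[0])
    out = []
    for n in range(h - 2):
        rows = image[n:n + 3]          # the three SRAM line buffers
        out_row = []
        for x in range(w - 2):
            acc = 0
            for ky in range(3):        # 3x3 multiply-accumulate array
                for kx in range(3):
                    acc += rows[ky][x + kx] * kernel[ky][kx]
            out_row.append(acc)
        out.append(out_row)
    return out

def maxpool2x2(fmap):
    """Non-overlapping 2x2 max pooling of a feature map."""
    return [[max(fmap[y][x], fmap[y][x + 1],
                 fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]

# Tiny example: a 4x4 ramp image and an all-ones kernel.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ones = [[1] * 3 for _ in range(3)]
fmap = conv3x3_rowbuffer(img, ones)   # 2x2 feature map
pooled = maxpool2x2(fmap)             # 1x1 after pooling
```

In the circuit, each new image line overwrites the oldest of the three buffers, so a full frame is streamed from external memory exactly once; the software loop models that by never touching more than three rows at a time.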
- the number of arithmetic units 18 is at least two.
- In the convolutional neural network operation circuit of this embodiment, there are at least two CNN arithmetic units (i.e., the above arithmetic unit 18), and these units can be connected in cascade or in parallel according to actual requirements, to reduce system bandwidth and increase computation speed.
- When the arithmetic units are connected in a cascade structure, the data of the nth layer is buffered into the internal memory after the convolution-pooling operation of the nth arithmetic unit, and the (n+1)th arithmetic unit takes out the computed data and performs the convolution-pooling operation of the (n+1)th layer, where n is a positive integer.
- In convolutional neural networks, multi-level cascaded neuron structures are often used. When the network has two or more layers, cascading the CNN units can effectively reduce system bandwidth and increase computation speed. If there is only one CNN unit, then for the first-layer convolution the image must be read from the external memory, stored into the SRAM ARRAY, convolved and pooled, and stored back to the external memory; for the second-layer convolution, the data processed by the first layer must be read back from the external memory, convolved and pooled, and stored back to the external memory again.
- For a 1920x1080 image, the system bandwidth consumed by this two-layer convolution is 1920x1080x4 bytes (two reads and two writes), about 8 MB.
- In the cascade structure, the first-layer image data enters from the external memory via DMA, is first stored into SRAM ARRAY 1 along the solid arrows, and then enters CNN arithmetic unit 1 for computation. The processed first-layer image is not stored back to the external memory; instead, it is stored into SRAM ARRAY 2 along the dotted arrows and, after the buffering operation, is sent to CNN arithmetic unit 2 for the second-layer convolution-pooling. Only the second-layer result is stored back to the external memory after processing. The system bandwidth of this structure is 1920x1080x2 bytes (one read and one write), about 4 MB, cutting the bandwidth in half; and because the two CNN arithmetic units work simultaneously, the time to process two layers of data equals the time for one CNN unit to process one layer, doubling the computation speed.
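The bandwidth arithmetic above can be reproduced with a short calculation. The 1920x1080 frame size comes from the text; one byte per pixel is an assumption of this sketch (consistent with the 1920-byte line buffers mentioned earlier).

```python
# Reproducing the external-memory traffic figures from the text.
# A single CNN unit must write the layer-1 result out and read it back;
# the cascade keeps the intermediate layer in on-chip SRAM.

FRAME = 1920 * 1080  # bytes per frame, assuming 1 byte per pixel

def bandwidth(reads, writes):
    """External-memory traffic in bytes for the given frame accesses."""
    return FRAME * (reads + writes)

single_unit = bandwidth(reads=2, writes=2)  # read+write for both layers
cascade = bandwidth(reads=1, writes=1)      # only input read, output write

assert single_unit == 1920 * 1080 * 4       # about 8 MB
assert cascade == single_unit // 2          # about 4 MB: half the traffic
```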
- each operation unit separately processes a partial image of the image to be processed, and each operation unit performs the parallel convolution pool operation using the same convolution kernel.
- In this embodiment, the parallel structure of two CNN arithmetic units can process the same frame image in parallel to improve the computation speed: one frame image is divided into two parts. The upper half image is stored by DMA into SRAM ARRAY 1 along the solid arrows and convolved by CNN arithmetic unit 1, and the processing result is stored back to the external memory; meanwhile, the lower half image is stored by DMA into SRAM ARRAY 2 along the dotted arrows and convolved by CNN arithmetic unit 2, with its processing result also stored back to the external memory. This parallel structure of two CNN arithmetic units doubles the computation speed.
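A software analogue of this split is sketched below: the same kernel is applied independently to the upper and lower halves of one frame, as the two units would. For a "valid" 3x3 convolution the halves must overlap by two rows at the seam; that halo handling is an implementation choice of this illustration, not a detail given in the patent.

```python
# Two-unit parallel split: one kernel, two half-frames processed
# independently, with a two-row halo at the seam so no output is lost.

def conv3x3(image, kernel):
    """Plain valid-mode 3x3 convolution (reference implementation)."""
    h, w = len(image), len(image[0])
    return [[sum(image[y + ky][x + kx] * kernel[ky][kx]
                 for ky in range(3) for kx in range(3))
             for x in range(w - 2)]
            for y in range(h - 2)]

def conv_split(image, kernel):
    """Convolve upper and lower halves independently (units 1 and 2)."""
    mid = len(image) // 2
    top = conv3x3(image[:mid + 2], kernel)   # +2 halo rows at the seam
    bottom = conv3x3(image[mid:], kernel)
    return top + bottom

img = [[y * 6 + x for x in range(6)] for y in range(6)]
k = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]        # identity kernel
assert conv_split(img, k) == conv3x3(img, k)  # halves agree with whole
```

Because the two halves touch disjoint output rows, the two units never contend for the same result, which is what lets them run fully in parallel.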
- each operation unit separately performs different feature extraction on the image to be processed, and each operation unit performs parallel convolution pool operation using different convolution kernels.
- CNN arithmetic unit 1 adopts one set of convolution kernel coefficients and CNN arithmetic unit 2 adopts another. One frame image is read through DMA into SRAM ARRAY 1 and sent simultaneously to CNN arithmetic unit 1 and CNN arithmetic unit 2; the two convolution operations are performed at the same time, and the two processed frames are stored back to the external memory. The bandwidth of this structure is 1920x1080x3 bytes (one read and two writes), about 6 MB. Compared with a single CNN unit, the system bandwidth is reduced by 25% and the computation speed is doubled.
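A software analogue of the different-kernel parallel structure: one frame read is shared by two "units", each applying its own kernel. The specific kernels below (a Laplacian-style edge detector and an all-ones averaging filter) are illustrative assumptions, not coefficients from the patent.

```python
# One shared frame read, two kernels applied in parallel. External
# traffic is then 1 read + 2 writes = 3 frames, versus 4 frames if a
# single unit ran the two extractions one after the other.

def conv3x3(image, kernel):
    """Plain valid-mode 3x3 convolution."""
    h, w = len(image), len(image[0])
    return [[sum(image[y + ky][x + kx] * kernel[ky][kx]
                 for ky in range(3) for kx in range(3))
             for x in range(w - 2)]
            for y in range(h - 2)]

EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]  # contour-like response
BLUR = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]          # smoothed response

frame = [[(x + y) % 4 for x in range(5)] for y in range(5)]
contours = conv3x3(frame, EDGE)   # "unit 1" output, written back
smoothed = conv3x3(frame, BLUR)   # "unit 2" output, written back
```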
- the two operation units respectively extract contour information and detail information of the image to be processed.
- One arithmetic unit extracts the contour information of the image to be processed and the other extracts the detail information, in a manner analogous to two-dimensional sampling.
- On the analogy to two-dimensional sampling: images of different resolutions generally contain different amounts of detail or contour information. A large-size image (i.e., a high-resolution image) carries more detail information, while a small-size image (i.e., a low-resolution image) generally captures the overall contour more completely. Taking a leaf as an example, a high-resolution image is generally clearer about the fine details of the leaf, while a low-resolution image conveys more information about the leaf's outline. Images of different resolutions can be represented by sampling the image to generate a two-dimensional function f(x, y), where (x, y) represents the image position and f(x, y) represents the detail information at that position.
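The resolution idea can be made concrete with a small sketch: treating the image as a sampled function f(x, y), a low-resolution copy (2x2 average-pooled here, an assumed choice) keeps the coarse contour, and the residual between the original and that coarse approximation keeps the fine detail. The decomposition below is an illustration of the principle, not a method stated in the patent.

```python
# Contour vs. detail via resolution: downsample for contour, take the
# residual against the upsampled coarse image for detail.

def downsample2(f):
    """2x2 average pooling: the low-resolution 'contour' image."""
    return [[(f[y][x] + f[y][x + 1] + f[y + 1][x] + f[y + 1][x + 1]) / 4
             for x in range(0, len(f[0]), 2)]
            for y in range(0, len(f), 2)]

def upsample2(g):
    """Nearest-neighbour upsampling back to the original grid."""
    return [[g[y // 2][x // 2] for x in range(2 * len(g[0]))]
            for y in range(2 * len(g))]

def detail(f):
    """Residual f(x, y) minus its coarse approximation: the 'detail'."""
    coarse = upsample2(downsample2(f))
    return [[f[y][x] - coarse[y][x] for x in range(len(f[0]))]
            for y in range(len(f))]

f = [[0, 0, 9, 9],
     [0, 0, 9, 9],
     [9, 9, 0, 0],
     [9, 9, 0, 0]]
assert downsample2(f) == [[0.0, 9.0], [9.0, 0.0]]     # contour survives
assert all(v == 0 for row in detail(f) for v in row)  # no fine detail here
```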
- the operation unit 18 includes a convolution operation unit, a pooling operation unit, a buffer unit, and a buffer control unit.
- The convolution operation unit is configured to perform the convolution operation on the data and transmit the obtained convolution result to the pooling operation unit; the pooling operation unit, connected to the convolution operation unit, performs the pooling operation on the convolution result and stores the obtained pooling result into the buffer unit; the buffer control unit stores the pooling result into the internal memory through the buffer unit, or into the external memory through the direct access unit.
- The external memory comprises at least one of the following: double data rate synchronous dynamic random access memory (DDR SDRAM), synchronous dynamic random access memory (SDRAM).
- The external memory is composed of SDRAM (synchronous dynamic random access memory) or DDR SDRAM (double data rate SDRAM), and has a large storage capacity, used for storing one frame or several frames of images.
- the internal memory comprises a static memory array comprising a plurality of static memories, each static memory for storing different data.
- The convolutional neural network operation circuit includes the SRAM ARRAY (SRAM array), SRAM CTRL (SRAM control logic), the CNN arithmetic units, the DMA, and the external memory (DDR/SDRAM). Each CNN arithmetic unit is composed of four modules: the convolution operation unit, the pooling operation unit, the output buffer unit, and the BUFFER CTRL (buffer controller). Taking two CNN arithmetic units as an example, when the two units adopt the cascade structure, the data of the first layer is processed by one CNN unit and buffered into SRAM (static memory), then taken out by the second CNN unit for the second layer of convolution-pooling, and finally stored back to the external memory (DDR/SDRAM).
- In the cascade structure, the system bandwidth is reduced by half and the computation speed is doubled. When the two CNN units adopt a parallel structure, they can either process the upper and lower halves of the same image with the same convolution kernel, operating in parallel and doubling the computation speed compared with a single-unit architecture, or use different convolution kernels to perform different feature extraction on the same frame image in parallel, saving 25% of the system bandwidth while doubling the computation speed.
- the disclosed technical contents may be implemented in other manners.
- the device embodiments described above are only schematic.
- The division of the units may be a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
- The computer readable storage medium includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
- The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Neurology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Complex Calculations (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (10)
- A convolutional neural network operation circuit, comprising: an external memory for storing an image to be processed; a direct access unit, connected to the external memory, for reading the image to be processed and transferring the read data to a control unit; the control unit, connected to the direct access unit, for storing the data into an internal memory; the internal memory, connected to the control unit, for buffering the data; and an arithmetic unit, connected to the internal memory, for reading the data from the internal memory and performing a convolution-pooling operation.
- The circuit according to claim 1, wherein the number of arithmetic units is at least two.
- The circuit according to claim 2, wherein, when the arithmetic units are connected in a cascade structure, the data of the nth layer is buffered into the internal memory after the convolution-pooling operation of the nth arithmetic unit, and the (n+1)th arithmetic unit takes out the computed data and performs the convolution-pooling operation of the (n+1)th layer, where n is a positive integer.
- The circuit according to claim 2, wherein, when the arithmetic units are connected in a parallel structure, each arithmetic unit processes a partial image of the image to be processed, and the arithmetic units perform the parallel convolution-pooling operation using the same convolution kernel.
- The circuit according to claim 2, wherein, when the arithmetic units are connected in a parallel structure, each arithmetic unit performs different feature extraction on the image to be processed, and the arithmetic units perform the parallel convolution-pooling operation using different convolution kernels.
- The circuit according to claim 2, wherein, when the number of arithmetic units is two, the two arithmetic units respectively extract contour information and detail information of the image to be processed.
- The circuit according to any one of claims 1 to 6, wherein the arithmetic unit comprises a convolution operation unit, a pooling operation unit, a buffer unit, and a buffer control unit.
- The circuit according to claim 7, wherein the convolution operation unit is configured to perform a convolution operation on the data and transmit the obtained convolution result to the pooling operation unit; the pooling operation unit, connected to the convolution operation unit, is configured to perform a pooling operation on the convolution result and store the obtained pooling result into the buffer unit; and the buffer control unit is configured to store the pooling result into the internal memory through the buffer unit, or into the external memory through the direct access unit.
- The circuit according to claim 1, wherein the external memory comprises at least one of the following: a double data rate synchronous dynamic random access memory, or a synchronous dynamic random access memory.
- The circuit according to claim 1, wherein the internal memory comprises a static memory array, the static memory array comprising a plurality of static memories, each static memory for storing different data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/627,674 US20210158068A1 (en) | 2017-10-19 | 2018-08-09 | Operation Circuit of Convolutional Neural Network |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710983547.1A CN107704923A (zh) | 2017-10-19 | 2017-10-19 | 卷积神经网络运算电路 |
CN201710983547.1 | 2017-10-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019076108A1 true WO2019076108A1 (zh) | 2019-04-25 |
Family
ID=61182655
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/099596 WO2019076108A1 (zh) | 2017-10-19 | 2018-08-09 | 卷积神经网络运算电路 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20210158068A1 (zh) |
CN (1) | CN107704923A (zh) |
WO (1) | WO2019076108A1 (zh) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704923A (zh) * | 2017-10-19 | 2018-02-16 | 珠海格力电器股份有限公司 | 卷积神经网络运算电路 |
CN110321999B (zh) * | 2018-03-30 | 2021-10-01 | 赛灵思电子科技(北京)有限公司 | 神经网络计算图优化方法 |
CN110321064A (zh) * | 2018-03-30 | 2019-10-11 | 北京深鉴智能科技有限公司 | 用于神经网络的计算平台实现方法及系统 |
CN108537329B (zh) * | 2018-04-18 | 2021-03-23 | 中国科学院计算技术研究所 | 一种利用Volume R-CNN神经网络进行运算的方法和装置 |
CN110399977A (zh) * | 2018-04-25 | 2019-11-01 | 华为技术有限公司 | 池化运算装置 |
CN108958938B (zh) * | 2018-06-29 | 2020-01-14 | 百度在线网络技术(北京)有限公司 | 数据处理方法、装置及设备 |
US10705967B2 (en) * | 2018-10-15 | 2020-07-07 | Intel Corporation | Programmable interface to in-memory cache processor |
DE102020100209A1 (de) * | 2019-01-21 | 2020-07-23 | Samsung Electronics Co., Ltd. | Neuronale Netzwerkvorrichtung, neuronales Netzwerksystem und Verfahren zur Verarbeitung eines neuronalen Netzwerkmodells durch Verwenden eines neuronalen Netzwerksystems |
CN110009103B (zh) * | 2019-03-26 | 2021-06-29 | 深兰科技(上海)有限公司 | 一种深度学习卷积计算的方法和装置 |
JP7278150B2 (ja) * | 2019-05-23 | 2023-05-19 | キヤノン株式会社 | 画像処理装置、撮像装置、画像処理方法 |
CN110276444B (zh) * | 2019-06-04 | 2021-05-07 | 北京清微智能科技有限公司 | 基于卷积神经网络的图像处理方法及装置 |
CN110674934B (zh) * | 2019-08-26 | 2023-05-09 | 陈小柏 | 一种神经网络池化层及其运算方法 |
CN110688616B (zh) * | 2019-08-26 | 2023-10-20 | 陈小柏 | 一种基于乒乓ram的条带阵列的卷积模块及其运算方法 |
CN112784973A (zh) * | 2019-11-04 | 2021-05-11 | 北京希姆计算科技有限公司 | 卷积运算电路、装置以及方法 |
CN112470138A (zh) * | 2019-11-29 | 2021-03-09 | 深圳市大疆创新科技有限公司 | 计算装置、方法、处理器和可移动设备 |
CN111752879B (zh) * | 2020-06-22 | 2022-02-22 | 深圳鲲云信息科技有限公司 | 一种基于卷积神经网络的加速系统、方法及存储介质 |
CN111984189B (zh) * | 2020-07-22 | 2022-05-17 | 深圳云天励飞技术股份有限公司 | 神经网络计算装置和数据读取、数据存储方法及相关设备 |
CN113742266B (zh) * | 2021-09-10 | 2024-02-06 | 中科寒武纪科技股份有限公司 | 集成电路装置、电子设备、板卡和计算方法 |
CN113570612B (zh) * | 2021-09-23 | 2021-12-17 | 苏州浪潮智能科技有限公司 | 一种图像处理方法、装置及设备 |
CN115456149B (zh) * | 2022-10-08 | 2023-07-25 | 鹏城实验室 | 脉冲神经网络加速器学习方法、装置、终端及存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160063359A1 (en) * | 2014-08-29 | 2016-03-03 | Google Inc. | Processing images using deep neural networks |
WO2016099780A1 (en) * | 2014-12-19 | 2016-06-23 | Intel Corporation | Storage device and method for performing convolution operations |
CN107169563A (zh) * | 2017-05-08 | 2017-09-15 | 中国科学院计算技术研究所 | 应用于二值权重卷积网络的处理系统及方法 |
CN107239824A (zh) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | 用于实现稀疏卷积神经网络加速器的装置和方法 |
CN107704923A (zh) * | 2017-10-19 | 2018-02-16 | 珠海格力电器股份有限公司 | 卷积神经网络运算电路 |
CN207352655U (zh) * | 2017-10-19 | 2018-05-11 | 珠海格力电器股份有限公司 | 卷积神经网络运算电路 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5428567A (en) * | 1994-05-09 | 1995-06-27 | International Business Machines Corporation | Memory structure to minimize rounding/trunction errors for n-dimensional image transformation |
CN106203619B (zh) * | 2015-05-29 | 2022-09-13 | 三星电子株式会社 | 数据优化的神经网络遍历 |
KR101788829B1 (ko) * | 2015-08-24 | 2017-10-20 | (주)뉴로컴즈 | 콘볼루션 신경망 컴퓨팅 장치 |
-
2017
- 2017-10-19 CN CN201710983547.1A patent/CN107704923A/zh active Pending
-
2018
- 2018-08-09 WO PCT/CN2018/099596 patent/WO2019076108A1/zh active Application Filing
- 2018-08-09 US US16/627,674 patent/US20210158068A1/en not_active Abandoned
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160063359A1 (en) * | 2014-08-29 | 2016-03-03 | Google Inc. | Processing images using deep neural networks |
WO2016099780A1 (en) * | 2014-12-19 | 2016-06-23 | Intel Corporation | Storage device and method for performing convolution operations |
CN107239824A (zh) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | 用于实现稀疏卷积神经网络加速器的装置和方法 |
CN107169563A (zh) * | 2017-05-08 | 2017-09-15 | 中国科学院计算技术研究所 | 应用于二值权重卷积网络的处理系统及方法 |
CN107704923A (zh) * | 2017-10-19 | 2018-02-16 | 珠海格力电器股份有限公司 | 卷积神经网络运算电路 |
CN207352655U (zh) * | 2017-10-19 | 2018-05-11 | 珠海格力电器股份有限公司 | 卷积神经网络运算电路 |
Also Published As
Publication number | Publication date |
---|---|
US20210158068A1 (en) | 2021-05-27 |
CN107704923A (zh) | 2018-02-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019076108A1 (zh) | 卷积神经网络运算电路 | |
JP6857286B2 (ja) | ニューラルネットワークアレイの性能の改善 | |
CN108427990B (zh) | 神经网络计算系统和方法 | |
US10943167B1 (en) | Restructuring a multi-dimensional array | |
US10936937B2 (en) | Convolution operation device and convolution operation method | |
KR102499396B1 (ko) | 뉴럴 네트워크 장치 및 뉴럴 네트워크 장치의 동작 방법 | |
EP3664093B1 (en) | Semiconductor memory device employing processing in memory (pim) and method of operating the semiconductor memory device | |
US20210224125A1 (en) | Operation Accelerator, Processing Method, and Related Device | |
CN111897579B (zh) | 图像数据处理方法、装置、计算机设备和存储介质 | |
WO2018205708A1 (zh) | 应用于二值权重卷积网络的处理系统及方法 | |
US9411726B2 (en) | Low power computation architecture | |
US20180189643A1 (en) | Convolution circuit, application processor including the same, and operating method thereof | |
WO2022037257A1 (zh) | 卷积计算引擎、人工智能芯片以及数据处理方法 | |
US20190197656A1 (en) | Processor, information processing apparatus, and operation method of processor | |
CN111783967B (zh) | 一种适用于专用神经网络加速器的数据双层缓存方法 | |
CN110688616B (zh) | 一种基于乒乓ram的条带阵列的卷积模块及其运算方法 | |
WO2022007265A1 (zh) | 一种膨胀卷积加速计算方法及装置 | |
WO2021143569A1 (zh) | 一种基于fpga的稠密光流计算系统及方法 | |
CN112703511B (zh) | 运算加速器和数据处理方法 | |
CN110766127A (zh) | 神经网络计算专用电路及其相关计算平台与实现方法 | |
WO2020046643A1 (en) | Method and system for performing parallel computation | |
JP7492555B2 (ja) | 複数の入力データセットのための処理 | |
CN111028136B (zh) | 一种人工智能处理器处理二维复数矩阵的方法和设备 | |
CN109416743B (zh) | 一种用于识别人为动作的三维卷积装置 | |
Langemeyer et al. | Using SDRAMs for two-dimensional accesses of long 2 n× 2 m-point FFTs and transposing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18867637 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18867637 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.10.2020) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18867637 Country of ref document: EP Kind code of ref document: A1 |