WO2022027818A1 - Data batch processing method and batch processing apparatus thereof, and storage medium - Google Patents
Data batch processing method and batch processing apparatus thereof, and storage medium
- Publication number
- WO2022027818A1 WO2022027818A1 PCT/CN2020/120177 CN2020120177W WO2022027818A1 WO 2022027818 A1 WO2022027818 A1 WO 2022027818A1 CN 2020120177 W CN2020120177 W CN 2020120177W WO 2022027818 A1 WO2022027818 A1 WO 2022027818A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- batch processing
- strips
- reorganized
- channel data
- Prior art date
Links
- 238000003672 processing method Methods 0.000 title claims abstract description 18
- 238000004364 calculation method Methods 0.000 claims abstract description 37
- 238000013528 artificial neural network Methods 0.000 claims description 29
- 238000000034 method Methods 0.000 claims description 19
- 230000008521 reorganization Effects 0.000 claims description 11
- 239000000872 buffer Substances 0.000 description 10
- 230000008569 process Effects 0.000 description 6
- 230000006798 recombination Effects 0.000 description 6
- 238000005215 recombination Methods 0.000 description 6
- 238000005516 engineering process Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 238000013135 deep learning Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 239000002699 waste material Substances 0.000 description 2
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical compound [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000003139 buffering effect Effects 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 238000005265 energy consumption Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 229910021389 graphene Inorganic materials 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000002085 persistent effect Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/781—On-chip cache; Off-chip memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7817—Specially adapted for signal processing, e.g. Harvard architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention belongs to the technical field of data processing, and in particular, relates to a data batch processing method for neural networks, a batch processing device thereof, and a computer-readable storage medium.
- With the rise of big data and artificial intelligence technologies, deep learning algorithms based on artificial neural networks have achieved remarkable results in computer vision, natural language processing, and autonomous agent decision-making, owing to their powerful feature extraction capabilities.
- However, neural network structures are becoming increasingly complex, accompanied by a sharp increase in parameter counts and computation, which places higher demands on the data bandwidth and computing power of the hardware platform.
- The technical problem solved by the present invention is how to reduce the number of times data is read from memory.
- a data batch processing method for neural networks comprising:
- the original channel data of the N consecutive frame images is spliced to form multiple reorganized data strips, wherein each reorganized data strip includes the original channel data of the N consecutive frame images at the same pixel position;
- the multiple reorganized data strips are sequentially input to the parallel computing unit array for convolution, wherein all the original channel data of the same reorganized data strip enter the computing units at the same time.
- each reorganized data strip further includes zero-padding data, and the data bit width of each reorganized data strip is equal to the memory bandwidth.
- the data batch processing method further includes:
- the plurality of reconstituted data strips are stored in the memory.
- the method for sequentially inputting the multiple reorganized data strips into the parallel computing unit array for convolution includes:
- the memory bandwidth is 128 bits, N is 5, and the original channel data on each pixel position of each consecutive frame image includes red channel data, green channel data and blue channel data.
- the present application also discloses a data batch processing device for a neural network, the data batch processing device comprising:
- a data acquisition module for acquiring memory bandwidth and selecting the original channel data of N consecutive frame images according to the memory bandwidth
- a data reorganization module for splicing the original channel data of the N consecutive frame images to form multiple reorganized data strips, wherein each reorganized data strip includes the original channel data of the N consecutive frame images at the same pixel position;
- a convolution calculation module for reading the multiple reorganized data strips and performing convolution on them in sequence, wherein all the original channel data of the same reorganized data strip are read by the convolution calculation module at the same time.
- the data batch processing device further includes a memory, and the memory is used for receiving and storing a plurality of reorganized data strips formed by the data reorganization module.
- the convolution calculation module includes:
- a multiplier-adder unit for multiplying and adding the original channel data of each consecutive frame image in each reorganized data strip with the same weight data respectively;
- a storage unit for storing the multiply-add result of each consecutive frame image in each reorganized data strip.
- the present invention also discloses a computer-readable storage medium, which stores a data batch processing program for neural networks; when the program is executed by a processor, it implements the above-described data batch processing method for neural networks.
- the invention discloses a data batch processing method for neural networks, which has the following technical effects compared with the traditional calculation method:
- the data structure optimized for three-dimensional arrays enables fast data buffering and avoids repeated weight reading across different frame images, thereby greatly reducing the number of off-chip memory accesses;
- FIG. 1 is a flowchart of a data batch processing method for a neural network according to Embodiment 1 of the present invention
- FIG. 2 is a flowchart of the convolution calculation according to Embodiment 1 of the present invention.
- FIG. 3 is a schematic diagram of a data splicing process according to Embodiment 1 of the present invention.
- FIG. 4 is a schematic diagram of a data batch processing apparatus according to Embodiment 2 of the present invention.
- FIG. 5 is a schematic diagram of a parallel computing unit array according to Embodiment 2 of the present invention.
- FIG. 6 is a schematic diagram of a computer device according to an embodiment of the present invention.
- in the conventional approach, convolution is performed on each frame of pictures in turn, and the weight data and image data must be read repeatedly, which wastes computing resources.
- since the weight data corresponding to the same pixel position is identical throughout the convolution, the image channel data of adjacent multi-frame pictures at the same pixel position is reorganized; the reorganized data is input into the computing unit at the same time and convolved with the same weight data, which reduces the number of reads of weight data and image data and greatly lowers computing energy consumption.
- the data batch processing method for a neural network in the first embodiment includes the following steps:
- Step S10 obtaining the memory bandwidth and selecting the original channel data of N consecutive frame images according to the memory bandwidth;
- Step S20 splicing the original channel data of the N continuous frame images to form multiple reorganized data strips, wherein each reorganized data strip includes the original channel data of the N continuous frame images at the same pixel position;
- Step S30: sequentially inputting the multiple reorganized data strips into the parallel computing unit array for convolution, wherein all the original channel data of the same reorganized data strip enter the computing units at the same time.
- Step S10: taking a memory bandwidth of 128 bits as an example. In the prior art, only the original channel data of one pixel is read from memory at a time, comprising the red channel data R, green channel data G, and blue channel data B; each color channel occupies 8 bits, 24 bits in total, so only 24 bits are read per access, wasting memory bandwidth.
- the original channel data of N consecutive frame images is therefore selected and spliced, so that more channel data can be read from memory per access, improving memory bandwidth utilization.
- Step S20: splicing the original channel data of 5 consecutive frame images at the same pixel position to form reorganized data strips. For example, splicing the original channel data of the 5 consecutive frame images at the first pixel forms a reorganized data strip with a data bit width of 120 bits. As a preferred embodiment, zero-padding is applied to the formed strip so that its data bit width equals the memory bandwidth, yielding a 128-bit reorganized data strip.
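As a minimal sketch of the strip layout just described (the function name `pack_strip` and the sample pixel values are illustrative, not from the patent): the 24-bit RGB data of the same pixel taken from 5 consecutive frames (5 × 24 = 120 bits) is concatenated and zero-padded up to the 128-bit memory bandwidth.

```python
MEMORY_BANDWIDTH_BITS = 128
N_FRAMES = 5

def pack_strip(rgb_per_frame):
    """Concatenate (r, g, b) byte triples from N frames into one padded strip."""
    strip = bytearray()
    for r, g, b in rgb_per_frame:
        strip += bytes([r, g, b])          # 24 bits of channel data per frame
    pad_bits = MEMORY_BANDWIDTH_BITS - len(strip) * 8
    strip += bytes(pad_bits // 8)          # zero padding up to the memory bandwidth
    return bytes(strip)

# The same pixel position sampled in 5 consecutive frames (illustrative values).
pixel_across_frames = [(10, 20, 30)] * N_FRAMES
strip = pack_strip(pixel_across_frames)
assert len(strip) * 8 == MEMORY_BANDWIDTH_BITS   # 120 data bits + 8 zero bits
```

One read of such a strip delivers the channel data of all five frames at once, instead of five separate 24-bit reads.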
- a Xilinx block memory of model 128-32 is used to read the original channel data of each image; the block memory can only read four color channel values (32 bits) per access, while only 24 bits are actually needed, so the data read from the block memory must be further reorganized.
- the R0G0B0 read from each image is spliced and zero-filled to form the reorganized data strip of the first pixel, namely R0G0B0 repeated five times (once per frame) followed by 8 zero bits, and a second register is set to store the reorganized data strip, completing the reorganization of the original channel data of the first pixel.
- the B2 obtained in the third read and the R2G2 stored in the first register are spliced to form the reorganized data strip of the third pixel, namely R2G2B2 repeated five times followed by 8 zero bits; and the R3G3B3 of each image obtained in the third read is spliced to form the reorganized data strip of the fourth pixel, namely R3G3B3 repeated five times followed by 8 zero bits. Thus, after three reads, the reorganization of the original channel data of four pixels is completed, forming four reorganized data strips.
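The three-reads-to-four-pixels bookkeeping above can be sketched as follows, assuming an 8-bit byte stream per image; the leftover bytes of each 32-bit read are held in a "first register" and spliced with the next read (the function name `reorganize` is hypothetical):

```python
def reorganize(stream):
    """Simulate 32-bit block-memory reads plus a leftover register:
    three 4-byte reads yield the RGB triples of four pixels."""
    first_register = b""            # bytes left over from the previous read
    pixels = []
    for i in range(0, len(stream), 4):
        word = stream[i:i + 4]      # one 32-bit block-memory read
        buf = first_register + word # splice the leftover with the new data
        while len(buf) >= 3:
            pixels.append(buf[:3])  # one complete R,G,B triple
            buf = buf[3:]
        first_register = buf        # keep the incomplete remainder
    return pixels

# R0 G0 B0 R1 G1 B1 R2 G2 B2 R3 G3 B3 for one image (illustrative values)
stream = bytes([0, 1, 2, 10, 11, 12, 20, 21, 22, 30, 31, 32])
pixels = reorganize(stream)
assert len(pixels) == 4             # three 32-bit reads cover four pixels
```

Doing this for all five frame buffers in lockstep yields the per-pixel triples that are then spliced into the 128-bit strips.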
- a 32*64 parallel computing unit array is taken as an example, comprising 32*64 computing units, 64 data caches TB, and 32 weight caches WB, where each data cache TB stores multiple reorganized data strips and the weight data stored in each weight cache WB is shared by the 64 data caches.
- each data cache stores the reorganized data strips of four adjacent pixels, namely R0G0B0 ×5 + padding, R1G1B1 ×5 + padding, R2G2B2 ×5 + padding, and R3G3B3 ×5 + padding, each strip being the corresponding pixel's channel data repeated for the 5 frames followed by 8 zero bits.
- writing the same reorganized data strip into the computing unit at the same time improves memory bandwidth utilization on the one hand and reduces the number of memory reads on the other.
- the method for sequentially inputting the multiple reorganized data strips into the parallel computing unit array for convolution includes:
- Step S31: multiplying and adding the original channel data of each consecutive frame image in each reorganized data strip with the same weight data respectively;
- Step S32: storing the multiply-add result of each consecutive frame image in each reorganized data strip into different registers.
- each reorganized data strip includes the original channel data of the same pixel position in the 5 consecutive frame images, and 5 third registers are provided to store the results for the 5 consecutive frames respectively.
- each multiply-add result is stored in the corresponding third register, and so on, so that each calculation result is stored in a different third register.
- in this way, the weight data is shared and does not need to be repeatedly read into the computing unit; repeated reading of image data is also avoided, reducing the number of memory accesses.
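A minimal sketch of the weight-sharing idea, under the assumption of a single scalar weight and the R channel only (variable names are illustrative): one weight fetch serves the multiply-add of all 5 frames in the strip, with each frame's partial result kept in its own "third register".

```python
N_FRAMES = 5
weight = 3                          # the shared weight for this pixel position
strip = [(10, 20, 30)] * N_FRAMES   # R, G, B of one pixel across 5 frames
third_registers = [0] * N_FRAMES    # one accumulator register per frame

for frame, (r, g, b) in enumerate(strip):
    # every frame reuses the same weight: it is fetched from memory only once
    third_registers[frame] += weight * r   # R-channel partial product

assert third_registers == [30] * N_FRAMES
```

Without the reorganization, the same weight would be fetched once per frame, i.e. five times for the same result.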
- the apparatus for batch processing data for neural networks includes a data acquisition module 100, a data reorganization module 200 and a convolution calculation module 300.
- the data acquisition module 100 is used for acquiring the memory bandwidth and selecting the original channel data of N consecutive frame images according to the memory bandwidth;
- the data reorganization module 200 is used for splicing the original channel data of the N consecutive frame images to form multiple reorganized data strips, wherein each reorganized data strip includes the original channel data of the N consecutive frame images at the same pixel position;
- the convolution calculation module 300 is used for reading the multiple reorganized data strips and performing convolution on them in sequence, wherein all the original channel data of the same reorganized data strip are read by the convolution calculation module at the same time.
- the data batch processing apparatus further includes a memory 400, and the memory 400 is used for receiving and storing a plurality of reorganized data strips formed by the data reorganization module 200.
- the data acquisition module 100 includes multiple buffers, which read and temporarily store the original channel data of the corresponding images from the memory 400 according to the memory bandwidth.
- taking the memory bandwidth equal to 128 bits and N equal to 5 as an example:
- 5 different buffers are used to read and store the original channel data of the 5 consecutive frame images from the memory, with the color channel data arranged in sequence by pixel position, namely R0G0B0 R1G1B1 R2G2B2 R3G3B3.
- the data reorganization module 200 includes a block memory, a first register, a second register and a counter.
- the block memory adopts a Xilinx Block Memory of model 128-32.
- the block memory can only read four color channel values (32 bits) per access, while only 24 bits are actually needed, so the data read from the block memory must be further reorganized.
- the block memory reads the data R0G0B0R1 from each buffer respectively; at this time, R1 is stored in the first register.
- the R0G0B0 from the first read is spliced and zero-filled to form the reorganized data strip of the first pixel, namely R0G0B0 repeated five times followed by 8 zero bits, which is stored in the second register while the counter is set to 0.
- the data read from the respective buffers by the block memory is G1B1R2G2.
- R2G2 is stored in the first register, and the R1 pre-stored in the first register and the G1B1 from the second read are spliced and zero-filled to form the reorganized data strip of the second pixel, namely R1G1B1 repeated five times followed by 8 zero bits, which is stored in the second register while the counter is set to 1.
- the data read from the respective buffers by the block memory is B2R3G3B3.
- the R2G2 pre-stored in the first register and the B2 from the third read are spliced and zero-filled to form the reorganized data strip of the third pixel, namely R2G2B2 repeated five times followed by 8 zero bits, which is stored in the second register while the counter is set to 2.
- the R3G3B3 from the third read is spliced and zero-filled to form the reorganized data strip of the fourth pixel, namely R3G3B3 repeated five times followed by 8 zero bits, which is stored in the second register while the counter is set to 3.
- in this way, the reorganization of the original channel data of the four pixels is completed, forming four reorganized data strips.
- the above steps are repeated until the reorganization of the original channel data of all pixels of the 5 consecutive frame images is completed, and all resulting reorganized data strips are stored in the memory for subsequent calculations.
- each data cache TB stores multiple reorganized data strips, the weight data stored in each weight cache WB is shared by the 64 data caches, and the convolution calculation module is the computing unit PE.
- the convolution calculation module includes a multiplier-adder unit and a storage unit; the multiplier-adder unit is used for multiplying and adding the original channel data of each consecutive frame image in each reorganized data strip with the same weight data respectively,
- and the storage unit is used for storing the multiply-add result of each consecutive frame image in each reorganized data strip.
- the multiplier-adder unit includes a multiplier and an adder
- the storage unit includes a data selector 301 , a data distributor 302 and five third registers 303 .
- the multiplier calculates W00*R0, and the data selector 301 reads the current value from the corresponding third register 303;
- after the adder completes the calculation, since the initial value of the third register is zero, the adder's result is W00*R0, which is then stored into the third register 303 through the data distributor 302.
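The selector/adder/distributor path of the PE can be sketched as below; this is a behavioral model under the assumption of integer arithmetic, and the class and method names (`ComputeUnit`, `multiply_accumulate`) are illustrative, not from the patent:

```python
class ComputeUnit:
    """Behavioral model of the PE: a multiplier, an adder, and five
    'third registers' accessed via a selector (read) and a distributor
    (write-back)."""
    def __init__(self, n_frames=5):
        self.third_registers = [0] * n_frames   # initial values are zero

    def multiply_accumulate(self, frame, weight, channel_value):
        product = weight * channel_value         # multiplier
        acc = self.third_registers[frame]        # data selector reads register
        self.third_registers[frame] = acc + product  # adder, distributor writes back

pe = ComputeUnit()
pe.multiply_accumulate(frame=0, weight=2, channel_value=5)   # e.g. W00 * R0
assert pe.third_registers[0] == 10   # first result: 0 + W00*R0
```

Repeating the call for frames 1 through 4 with the same `weight` argument mirrors how one weight fetch serves all five frames of a strip.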
- the present application also discloses a computer-readable storage medium, which stores a data batch processing program for neural networks; when the program is executed by a processor, it implements the above-described data batch processing method for neural networks.
- the present application also discloses a computer device.
- the terminal includes a processor 12 , an internal bus 13 , a network interface 14 , and a computer-readable storage medium 11 .
- the processor 12 reads the corresponding computer program from the computer-readable storage medium and then executes it, forming a request processing device on a logical level.
- the computer-readable storage medium 11 stores a data batch processing program for a neural network, and when executed by a processor, the program implements the above-described data batch processing method for a neural network.
- Computer-readable storage media include persistent and non-persistent, removable and non-removable media, and information storage can be implemented by any method or technology.
- Information may be computer readable instructions, data structures, modules of programs, or other data.
- Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media, or other magnetic storage devices, or any other non-transmission media that can store information accessible by computing devices.
- PRAM phase-change memory
- SRAM static random access memory
- DRAM dynamic random access memory
- RAM random access memory
- ROM read-only memory
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Neurology (AREA)
- Signal Processing (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (13)
- 1. A data batch processing method for a neural network, wherein the data batch processing method comprises: obtaining a memory bandwidth and selecting the original channel data of N consecutive frame images according to the memory bandwidth; splicing the original channel data of the N consecutive frame images to form multiple reorganized data strips, wherein each reorganized data strip includes the original channel data of the N consecutive frame images at the same pixel position; and sequentially inputting the multiple reorganized data strips into a parallel computing unit array for convolution, wherein all the original channel data of the same reorganized data strip enter the computing units at the same time.
- 2. The data batch processing method for a neural network according to claim 1, wherein each reorganized data strip further includes zero-padding data, and the data bit width of each reorganized data strip is equal to the memory bandwidth.
- 3. The data batch processing method for a neural network according to claim 2, wherein the data batch processing method further comprises: storing the multiple reorganized data strips in the memory.
- 4. The data batch processing method for a neural network according to claim 1, wherein sequentially inputting the multiple reorganized data strips into the parallel computing unit array for convolution comprises: multiplying and adding the original channel data of each consecutive frame image in each reorganized data strip with the same weight data respectively; and storing the multiply-add results of the consecutive frame images in each reorganized data strip into different registers.
- 5. The data batch processing method for a neural network according to claim 1, wherein the memory bandwidth is 128 bits, N is 5, and the original channel data at each pixel position of each consecutive frame image includes red channel data, green channel data, and blue channel data.
- 6. A data batch processing device for a neural network, wherein the data batch processing device comprises: a data acquisition module for acquiring a memory bandwidth and selecting the original channel data of N consecutive frame images according to the memory bandwidth; a data reorganization module for splicing the original channel data of the N consecutive frame images to form multiple reorganized data strips, wherein each reorganized data strip includes the original channel data of the N consecutive frame images at the same pixel position; and a convolution calculation module for reading the multiple reorganized data strips and performing convolution on them in sequence, wherein all the original channel data of the same reorganized data strip are read by the convolution calculation module at the same time.
- 7. The data batch processing device for a neural network according to claim 6, wherein the data batch processing device further comprises a memory for receiving and storing the multiple reorganized data strips formed by the data reorganization module.
- 8. The data batch processing device for a neural network according to claim 6, wherein the convolution calculation module comprises: a multiplier-adder unit for multiplying and adding the original channel data of each consecutive frame image in each reorganized data strip with the same weight data respectively; and a storage unit for storing the multiply-add results of the consecutive frame images in each reorganized data strip.
- 9. A computer-readable storage medium, wherein the computer-readable storage medium stores a data batch processing program for a neural network, and the program, when executed by a processor, implements the data batch processing method for a neural network according to claim 1.
- 10. The computer-readable storage medium according to claim 9, wherein each reorganized data strip further includes zero-padding data, and the data bit width of each reorganized data strip is equal to the memory bandwidth.
- 11. The computer-readable storage medium according to claim 10, wherein the data batch processing method further comprises: storing the multiple reorganized data strips in the memory.
- 12. The computer-readable storage medium according to claim 9, wherein sequentially inputting the multiple reorganized data strips into the parallel computing unit array for convolution comprises: multiplying and adding the original channel data of each consecutive frame image in each reorganized data strip with the same weight data respectively; and storing the multiply-add results of the consecutive frame images in each reorganized data strip into different registers.
- 13. The computer-readable storage medium according to claim 9, wherein the memory bandwidth is 128 bits, N is 5, and the original channel data at each pixel position of each consecutive frame image includes red channel data, green channel data, and blue channel data.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010791617.5 | 2020-08-07 | ||
CN202010791617.5A CN114065905A (en) | 2020-08-07 | 2020-08-07 | Data batch processing method and batch processing device thereof, storage medium and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022027818A1 true WO2022027818A1 (en) | 2022-02-10 |
Family
ID=80119905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/120177 WO2022027818A1 (en) | 2020-08-07 | 2020-10-10 | Data batch processing method and batch processing apparatus thereof, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114065905A (en) |
WO (1) | WO2022027818A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108388537A (en) * | 2018-03-06 | 2018-08-10 | 上海熠知电子科技有限公司 | A kind of convolutional neural networks accelerator and method |
US20190147299A1 (en) * | 2016-10-31 | 2019-05-16 | Tencent Technology (Shenzhen) Company Limited | Data processing method and apparatus for convolutional neural network |
CN110136066A (en) * | 2019-05-23 | 2019-08-16 | 北京百度网讯科技有限公司 | Super-resolution method, device, equipment and storage medium towards video |
CN110211205A (en) * | 2019-06-14 | 2019-09-06 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and storage medium |
CN110782393A (en) * | 2019-10-10 | 2020-02-11 | 江南大学 | Image resolution compression and reconstruction method based on reversible network |
CN110895801A (en) * | 2019-11-15 | 2020-03-20 | 北京金山云网络技术有限公司 | Image processing method, device, equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108876813B (en) * | 2017-11-01 | 2021-01-26 | 北京旷视科技有限公司 | Image processing method, device and equipment for detecting object in video |
CN110009102B (en) * | 2019-04-12 | 2023-03-24 | 南京吉相传感成像技术研究院有限公司 | Depth residual error network acceleration method based on photoelectric computing array |
CN111199273B (en) * | 2019-12-31 | 2024-03-26 | 深圳云天励飞技术有限公司 | Convolution calculation method, device, equipment and storage medium |
CN111459856B (en) * | 2020-03-20 | 2022-02-18 | 中国科学院计算技术研究所 | Data transmission device and transmission method |
2020
- 2020-08-07 CN CN202010791617.5A patent/CN114065905A/en active Pending
- 2020-10-10 WO PCT/CN2020/120177 patent/WO2022027818A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN114065905A (en) | 2022-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11775430B1 (en) | Memory access for multiple circuit components | |
US10936937B2 (en) | Convolution operation device and convolution operation method | |
US10545559B2 (en) | Data processing system and method | |
WO2020062284A1 (en) | Convolutional neural network-based image processing method and device, and unmanned aerial vehicle | |
WO2022110386A1 (en) | Data processing method and artificial intelligence processor | |
CN113792621B (en) | FPGA-based target detection accelerator design method | |
WO2020233709A1 (en) | Model compression method, and device | |
WO2022007265A1 (en) | Dilated convolution acceleration calculation method and apparatus | |
CN112799599A (en) | Data storage method, computing core, chip and electronic equipment | |
CN116720549A (en) | FPGA multi-core two-dimensional convolution acceleration optimization method based on CNN input full cache | |
CN113301221B (en) | Image processing method of depth network camera and terminal | |
Cadenas et al. | Parallel pipelined array architectures for real-time histogram computation in consumer devices | |
WO2022027818A1 (en) | Data batch processing method and batch processing apparatus thereof, and storage medium | |
CN107085827B (en) | Super-resolution image restoration method based on hardware platform | |
CN109416743A (en) | A kind of Three dimensional convolution device artificially acted for identification | |
WO2020029181A1 (en) | Three-dimensional convolutional neural network-based computation device and related product | |
US6771271B2 (en) | Apparatus and method of processing image data | |
CN116051345A (en) | Image data processing method, device, computer equipment and readable storage medium | |
CN113160321B (en) | Geometric mapping method and device for real-time image sequence | |
CN112001492B (en) | Mixed running water type acceleration architecture and acceleration method for binary weight DenseNet model | |
CN115456858B (en) | Image processing method, device, computer equipment and computer readable storage medium | |
CN113222831B (en) | Feature memory forgetting unit, network and system for removing image stripe noise | |
WO2022000456A1 (en) | Image processing method and apparatus, integrated circuit, and device | |
CN109509218A (en) | The method, apparatus of disparity map is obtained based on FPGA | |
CN116681588A (en) | Super-resolution implementation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20948605 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20948605 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 030723) |