CN114065905A - Data batch processing method and batch processing device thereof, storage medium and computer equipment - Google Patents


Info

Publication number
CN114065905A
CN114065905A
Authority
CN
China
Prior art keywords
data
recombined
neural network
original channel
continuous frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010791617.5A
Other languages
Chinese (zh)
Inventor
王峥
雷明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhongke Yuanwuxin Technology Co ltd
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202010791617.5A priority Critical patent/CN114065905A/en
Priority to PCT/CN2020/120177 priority patent/WO2022027818A1/en
Publication of CN114065905A publication Critical patent/CN114065905A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/781 On-chip cache; Off-chip memory
    • G06F15/7817 Specially adapted for signal processing, e.g. Harvard architectures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data batch processing method, together with a batch processing device, a storage medium and computer equipment. The data batch processing method comprises the following steps: acquiring a memory bandwidth and selecting original channel data of N continuous frame images according to the memory bandwidth; splicing the original channel data of the N continuous frame images to form a plurality of recombined data strips, wherein each recombined data strip comprises the original channel data of the N continuous frame images at the same pixel position; and sequentially inputting the plurality of recombined data strips into a parallel computing unit array for convolution operation, wherein all original channel data of the same recombined data strip enter the computing units at the same time. Because the image channel data of adjacent frames at the same pixel position are reorganized, and the recombined data are input into the computing units simultaneously and convolved with the same weight data, the number of reads of both the weight data and the image data can be reduced, greatly lowering the computing energy consumption.

Description

Data batch processing method and batch processing device thereof, storage medium and computer equipment
Technical Field
The present invention belongs to the technical field of data processing, and in particular, to a data batch processing method for a neural network, a batch processing apparatus, a computer-readable storage medium, and a computer device thereof.
Background
With the popularization of big data and artificial intelligence technology, deep learning algorithms based on artificial neural networks have achieved remarkable results in fields such as computer vision, natural language processing and autonomous agent decision-making by virtue of their strong feature extraction capability. However, neural network structures are becoming ever more complex, and the number of parameters and the amount of computation are increasing sharply, imposing higher requirements on the data bandwidth and computing power of the hardware platform.
Among these, continuous image processing techniques on video streams, such as target recognition, tracking and super-resolution reconstruction, dominate intelligent applications. Current mainstream deep learning accelerators achieve good acceleration for the intelligent processing of single-frame images; for video applications, however, directly applying single-frame acceleration techniques wastes considerable computing resources, and in particular causes a large number of repeated read and write operations on the off-chip memory. The core reason is non-optimized memory operations such as repeated weight reads and scattered data reads across different frame images.
Disclosure of Invention
(I) technical problems to be solved by the invention
The technical problem solved by the invention is as follows: how to reduce the number of times data is read from memory.
(II) the technical scheme adopted by the invention
A data batching method for a neural network, the data batching method comprising:
acquiring a memory bandwidth and selecting original channel data of N continuous frame images according to the memory bandwidth;
splicing the original channel data of the N continuous frame images to form a plurality of recombined data strips, wherein each recombined data strip comprises the original channel data of the N continuous frame images at the same pixel position;
and sequentially inputting a plurality of recombined data strips into the parallel computing unit array for convolution operation, wherein all original channel data of the same recombined data strip enter the computing unit at the same time.
Preferably, each of the reassembled data strips further includes zero padding data, and a data bit width of each of the reassembled data strips is equal to the memory bandwidth.
Preferably, the data batch processing method further includes:
and storing the multiple recombined data strips into a memory.
Preferably, the method for sequentially inputting the plurality of recombined data strips into the parallel computing unit array to perform convolution operation includes:
performing multiply-add operation on the original channel data of each continuous frame image in each recombined data strip and the same weight data respectively;
and storing the result of the multiply-add operation of each continuous frame image in each recombined data strip into different registers.
Preferably, the memory bandwidth is 128 bits, N is 5, and the raw channel data at each pixel position of each of the consecutive frame images includes red channel data, green channel data, and blue channel data.
The present application also discloses a data batch processing apparatus for a neural network, the data batch processing apparatus comprising:
the data acquisition module is used for acquiring the memory bandwidth and selecting the original channel data of N continuous frame images according to the memory bandwidth;
the data recombination module is used for splicing the original channel data of the N continuous frame images to form a plurality of recombined data strips, wherein each recombined data strip comprises the original channel data of the N continuous frame images at the same pixel position;
and the convolution calculation module is used for reading the multiple recombined data strips and carrying out convolution operation on the multiple recombined data strips in sequence, wherein all original channel data of the same recombined data strip are read by the convolution calculation module at the same time.
Preferably, the data batch processing device further comprises a memory, and the memory is used for receiving and storing the multiple recombined data strips formed by the data recombination module.
Preferably, the convolution calculation module includes:
the multiplier-adder unit is used for respectively carrying out multiplication-addition operation on the original channel data of each continuous frame image in each recombined data strip and the same weight data;
and the storage unit is used for storing the result of the multiply-add operation of each continuous frame image in each recombined data strip.
The invention also discloses a computer readable storage medium, which stores the data batch processing program for the neural network, and the data batch processing program for the neural network realizes the data batch processing method for the neural network when being executed by a processor.
The invention also discloses a computer device, which comprises a computer readable storage medium, a processor and a data batch processing program for the neural network stored in the computer readable storage medium, wherein the data batch processing program for the neural network realizes the data batch processing method for the neural network when being executed by the processor.
(III) advantageous effects
The invention discloses a data batch processing method for a neural network, which has the following technical effects compared with the traditional calculation method:
(1) the optimized data structure facing the three-dimensional array can realize the rapid buffering of data and avoid the reading of repeated weights among different frame images, thereby greatly reducing the access times of an off-chip memory;
(2) The method is novel in concept: starting from the characteristics of the input data, it is highly effective when the input to the first convolutional layer is single and unchanging, such as background images or surveillance video, and has great potential when the convolution kernels of a deep neural network are very deep.
Drawings
Fig. 1 is a flowchart of a data batch processing method for a neural network according to a first embodiment of the present invention;
FIG. 2 is a flowchart of convolution calculation according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of a data splicing process according to a first embodiment of the present invention;
FIG. 4 is a diagram of a data batch processing apparatus according to a second embodiment of the present invention;
FIG. 5 is a diagram of a parallel computing unit array according to a second embodiment of the present invention;
FIG. 6 is a schematic diagram of a computer apparatus according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Before describing the various embodiments of the present application in detail, the inventive concept of the present application is first briefly described: in the prior art, convolution calculation is performed on each frame of picture in sequence, so the weight data and the image data need to be read repeatedly, which wastes computing resources. The present application therefore reorganizes the channel data of adjacent frames at the same pixel position into recombined data strips and inputs each strip into the computing units simultaneously, so that the same weight data can be shared and the number of memory accesses is reduced.
Example one
Specifically, as shown in fig. 1, the data batch processing method for the neural network according to the first embodiment includes the following steps:
step S10: acquiring a memory bandwidth and selecting original channel data of N continuous frame images according to the memory bandwidth;
step S20: splicing the original channel data of the N continuous frame images to form a plurality of recombined data strips, wherein each recombined data strip comprises the original channel data of the N continuous frame images at the same pixel position;
step S30: and sequentially inputting a plurality of recombined data strips into the parallel computing unit array for convolution operation, wherein all original channel data of the same recombined data strip enter the computing unit at the same time.
In step S10, take a memory bandwidth of 128 bits as an example. In the prior art, each read from the memory fetches the original channel data of one pixel, comprising the red channel data R, the green channel data G and the blue channel data B; each color channel occupies 8 bits, 24 bits in total, so only 24 bits of data are read at a time, which wastes memory bandwidth. According to the characteristics of the convolution calculation process, combined with the characteristics of the image data, the original channel data of N continuous frame images are selected according to the size of the actually available memory bandwidth and spliced together, so that more channel data can be read on each memory access and the utilization efficiency of the memory bandwidth is improved.
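The bandwidth arithmetic above can be checked with a short calculation (a minimal sketch; the constants follow the 128-bit bus and 8-bit color channels named in the text):

```python
# Memory-bus utilization when reading one pixel's RGB data per transfer
# versus a recombined strip covering N = 5 frames at the same pixel.
BUS_BITS = 128                 # memory bandwidth per read
CHANNEL_BITS = 8               # bits per color channel
PIXEL_BITS = 3 * CHANNEL_BITS  # R + G + B = 24 bits

single_frame_util = PIXEL_BITS / BUS_BITS       # 24 / 128
batched_util = 5 * PIXEL_BITS / BUS_BITS        # 120 / 128

print(f"single-frame read uses {single_frame_util:.1%} of the bus")  # 18.8%
print(f"5-frame strip uses {batched_util:.1%} of the bus")           # 93.8%
```

With N = 5, utilization rises from 24/128 to 120/128 of the bus per read, which is why the text selects N according to the memory bandwidth.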
Illustratively, the splicing process in step S20 is described in detail taking a memory bandwidth of 128 bits and N equal to 5 as an example. The original channel data of the 5 continuous frame images at the same pixel position are spliced to form a recombined data strip; for example, the original channel data of the 5 continuous frame images at the first pixel point are spliced to form a recombined data strip with a data bit width of 120 bits. As a preferred embodiment, zero padding is performed on the formed recombined data strip so that its data bit width equals the memory bandwidth; for example, eight 0 bits are appended to the end of the 120-bit recombined data strip to form a 128-bit recombined data strip.
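The splicing and zero padding can be sketched at the byte level as follows (the helper `make_strip` and the frame layout are illustrative assumptions, not the patent's implementation):

```python
# Splice the RGB bytes of N = 5 frames at one pixel position into a single
# strip, then zero-pad the 120-bit result to the 128-bit bus width.
N_FRAMES = 5
BUS_BYTES = 16  # 128-bit memory bandwidth

def make_strip(frames, x, y):
    """frames: list of N images, each indexable as img[y][x] -> (r, g, b)."""
    strip = bytearray()
    for img in frames:
        r, g, b = img[y][x]
        strip += bytes([r, g, b])           # 24 bits per frame
    strip += bytes(BUS_BYTES - len(strip))  # zero padding: 120 -> 128 bits
    return bytes(strip)

# Toy example: 5 single-pixel "frames"
frames = [[[(i, i + 1, i + 2)]] for i in range(N_FRAMES)]
strip = make_strip(frames, 0, 0)
assert len(strip) == BUS_BYTES and strip[-1] == 0
```

One such strip is produced per pixel position, so each 128-bit memory transfer carries the same pixel from all five frames.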
As a preferred embodiment, a Block Memory (model 128-32) of Xilinx is adopted to read the original channel data of each image. However, the Block Memory can only read four color channel values at a time, i.e. 32 bits of data, while what is really needed is 24 bits of data, so the data read out of the Block Memory needs to be further recombined. Illustratively, the original channel data of the 5 continuous frame images are temporarily stored in 5 buffers, respectively, with the color channel data arranged in pixel order, i.e. R0 G0 B0 R1 G1 B1 R2 G2 B2 R3 G3 B3 ... As shown in fig. 3, when reading with the Block Memory, the first buffer yields R0 G0 B0 R1 on the first read, G1 B1 R2 G2 on the second, and B2 R3 G3 B3 on the third. A first register is set to hold the R1 from the first read for the next splicing; the R0 G0 B0 of the 5 images are spliced and zero-padded to form the recombined data strip of the first pixel point, i.e. R0G0B0 R0G0B0 R0G0B0 R0G0B0 R0G0B0 0, and a second register is set to store this recombined data strip, thereby completing the recombination of the original channel data of the first pixel point. Similarly, when the original channel data of the second pixel point is recombined, the G1 B1 from the second read is combined with the R1 in the first register to form the recombined data strip of the second pixel point, i.e. R1G1B1 R1G1B1 R1G1B1 R1G1B1 R1G1B1 0, while the R2 G2 from the second read is stored in the first register for the next recombination.
Similarly, when the original channel data of the third pixel point is recombined, the B2 from the third read is spliced with the R2 G2 stored in the first register to form the recombined data strip of the third pixel point, i.e. R2G2B2 R2G2B2 R2G2B2 R2G2B2 R2G2B2 0, and the R3 G3 B3 of each image from the third read are spliced to form the recombined data strip of the fourth pixel point, i.e. R3G3B3 R3G3B3 R3G3B3 R3G3B3 R3G3B3 0. Thus, every three reads complete the recombination of the original channel data of four pixel points, forming four recombined data strips.
These steps are repeated until the original channel data of all the pixel points of the 5 continuous frame images have been recombined, and all the obtained recombined data strips are stored in the memory for subsequent calculation and use.
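The three-reads-to-four-pixels realignment above can be sketched as follows (the generator `realign` and the byte-level model of the 32-bit Block Memory port are illustrative assumptions):

```python
# Each Block Memory read returns 4 channel bytes, while a pixel needs 3, so a
# carry register holds the leftover bytes and every 3 reads yield 4 pixels.
def realign(reads):
    """reads: iterable of 4-byte chunks; yields (r, g, b) pixel tuples."""
    carry = b""
    for chunk in reads:
        carry += chunk
        while len(carry) >= 3:
            yield tuple(carry[:3])
            carry = carry[3:]

# R0 G0 B0 ... R3 G3 B3 of one buffer, delivered as three 32-bit reads
stream = bytes([10, 20, 30, 11, 21, 31, 12, 22, 32, 13, 23, 33])
reads = [stream[i:i + 4] for i in range(0, 12, 4)]
pixels = list(realign(reads))
assert pixels == [(10, 20, 30), (11, 21, 31), (12, 22, 32), (13, 23, 33)]
```

The `carry` variable plays the role of the first register in the text: it holds the partial pixel (R1, then R2 G2) between reads.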
In step S30, a 32 × 64 parallel computing unit array is taken as an example: it includes 32 × 64 computing units, 64 data caches TB and 32 weight caches WB, where each data cache TB stores multiple groups of recombined data strips, and the weight data stored in each weight cache WB is shared by the 64 data caches.
As a preferred embodiment, taking a sliding window of size 2 × 2 in the convolution calculation process as an example, each data cache stores the recombined data strips of four adjacent pixel points, i.e. R0G0B0 R0G0B0 R0G0B0 R0G0B0 R0G0B0 0, R1G1B1 R1G1B1 R1G1B1 R1G1B1 R1G1B1 0, R2G2B2 R2G2B2 R2G2B2 R2G2B2 R2G2B2 0 and R3G3B3 R3G3B3 R3G3B3 R3G3B3 R3G3B3 0. When convolution calculation is carried out, the same recombined data strip is written into the computing units at the same time, which improves memory bandwidth utilization efficiency on the one hand and reduces the number of memory reads on the other.
Specifically, as shown in fig. 2, the method for sequentially inputting multiple recombined data strips into the parallel computing unit array to perform convolution operation includes:
step S31: performing multiply-add operation on the original channel data of each continuous frame image in each recombined data strip and the same weight data respectively;
step S32: and storing the result of the multiply-add operation of each continuous frame image in each recombined data strip into different registers.
Illustratively, taking 5 continuous frame images as an example, each recombined data strip includes the original channel data of the 5 continuous frame images at one pixel position, and 5 third registers are provided to store the multiply-add results of the 5 continuous frame images, respectively. As shown in fig. 5, for the original channel data of the first pixel point of the first continuous frame image, the multiply-add result is F0 = W00*R0 + W01*G0 + W02*B0, and this result is stored in the corresponding third register; after the multiply-add results of all pixel points in the sliding window of the first continuous frame image are obtained, they are further summed. Similarly, the multiply-add result of the first pixel point of the second continuous frame image is F1 = W00*R0 + W01*G0 + W02*B0 (computed on the second image's channel values), and it is stored in the corresponding third register; by analogy, each result is stored in a different third register. In the convolution calculation process, the same recombined data strip corresponds to the same weight data, so the weight data is shared and need not be read repeatedly; and because all the original channel data of the same recombined data strip are read into the computing units at one time, repeated reading of the image data is avoided and the number of memory accesses is reduced.
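Steps S31 and S32 can be sketched numerically as follows (the function `mac_strip` and the toy values are assumptions; the point is one weight triple shared across every frame in a strip, with one partial sum per frame):

```python
# The same weight triple (W00, W01, W02) multiplies the R, G, B values of
# every frame in the recombined strip; each frame's partial sum F_k goes to
# its own accumulator (the "5 third registers" of the text).
def mac_strip(strip_pixels, weights):
    """strip_pixels: [(r, g, b), ...] for N frames at one pixel position;
    weights: (w_r, w_g, w_b), shared by all frames."""
    w_r, w_g, w_b = weights
    return [w_r * r + w_g * g + w_b * b for (r, g, b) in strip_pixels]

# One strip: the same pixel position across 5 consecutive frames
strip = [(1, 2, 3), (1, 2, 3), (4, 5, 6), (7, 8, 9), (0, 0, 0)]
sums = mac_strip(strip, (2, 3, 4))  # one weight read serves all 5 frames
assert sums == [20, 20, 47, 74, 0]
```

A per-frame baseline would re-read the weight triple once per frame; here it is fetched once per strip, which is the weight sharing the text describes.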
Example two
As shown in fig. 4, the data batch processing apparatus for a neural network in the second embodiment includes a data acquisition module 100, a data recombination module 200 and a convolution calculation module 300. The data acquisition module 100 is configured to acquire the memory bandwidth and select the original channel data of N continuous frame images according to the memory bandwidth; the data recombination module 200 is configured to splice the original channel data of the N continuous frame images to form a plurality of recombined data strips, wherein each recombined data strip includes the original channel data of the N continuous frame images at the same pixel position; the convolution calculation module 300 is configured to read the plurality of recombined data strips and perform convolution operations on them in sequence, wherein all the original channel data of the same recombined data strip are read by the convolution calculation module at the same time. The data batch processing apparatus further includes a memory 400, which is configured to receive and store the plurality of recombined data strips formed by the data recombination module 200.
Specifically, the data acquisition module 100 includes a plurality of buffers, which are configured to read and temporarily store the original channel data of the corresponding images from the memory 400 according to the memory bandwidth. Taking a memory bandwidth of 128 bits and N equal to 5 as an example, 5 different buffers are adopted to read and store the original channel data of the 5 continuous frame images from the memory, with the color channel data arranged in pixel order, i.e. R0 G0 B0 R1 G1 B1 R2 G2 B2 R3 G3 B3 ...
The data recombination module 200 includes a Block Memory (illustratively, a Block Memory of model 128-32 from Xilinx), a first register, a second register and a counter. The Block Memory can only read four color channel values at a time, i.e. 32 bits of data, while what is really needed is 24 bits of data, so the data read out of the Block Memory also needs to be recombined. On the first read, the Block Memory reads R0 G0 B0 R1 from each buffer; the R1 of each image is stored in the first register, and the R0 G0 B0 of each image are spliced and zero-padded to form the recombined data strip of the first pixel point, i.e. R0G0B0 R0G0B0 R0G0B0 R0G0B0 R0G0B0 0, which is stored in the second register while the counter value is set to 0. Similarly, on the second read, the data read from each buffer is G1 B1 R2 G2; the R2 G2 is stored in the first register, and the R1 previously stored in the first register is spliced and zero-padded with the G1 B1 of the second read to form the recombined data strip of the second pixel point, i.e. R1G1B1 R1G1B1 R1G1B1 R1G1B1 R1G1B1 0, which is stored in the second register while the counter value is set to 1. On the third read, the data read from each buffer is B2 R3 G3 B3; the R2 G2 previously stored in the first register is spliced and zero-padded with the B2 of the third read to form the recombined data strip of the third pixel point, i.e. R2G2B2 R2G2B2 R2G2B2 R2G2B2 R2G2B2 0, which is stored in the second register while the counter value is set to 2. Then the R3 G3 B3 of the third read are spliced and zero-padded to form the recombined data strip of the fourth pixel point, i.e. R3G3B3 R3G3B3 R3G3B3 R3G3B3 R3G3B3 0, which is stored in the second register while the counter value is set to 3.
Therefore, every three reads recombine the original channel data of four pixel points, forming four recombined data strips. The above steps are repeated until the recombination of the original channel data of all the pixel points of the 5 continuous frame images is completed, and all the obtained recombined data strips are stored in the memory for subsequent calculation and use.
Further, as shown in fig. 5, taking a 32 × 64 parallel computing unit array as an example, the array includes 32 × 64 computing units PE, 64 data caches TB and 32 weight caches WB, where each data cache TB stores multiple groups of recombined data strips, the weight data stored in each weight cache WB is shared by the 64 data caches, and the convolution calculation module corresponds to a computing unit PE.
The convolution calculation module comprises a multiplier-adder unit and a storage unit, wherein the multiplier-adder unit is used for respectively carrying out multiplication-addition operation on the original channel data of each continuous frame image in each recombined data strip and the same weight data, and the storage unit is used for storing the multiplication-addition operation result of each continuous frame image in each recombined data strip.
Illustratively, the multiplier-adder unit includes multipliers and adders, and the storage unit includes a data selector 301, a data distributor 302 and 5 third registers 303. For example, for the original channel data of the first pixel point of the first continuous frame image, the multiplier calculates W00*R0, and the data selector 301 reads the data from the corresponding third register 303 for the adder; since the initial value of the third register is zero, the calculation result of the adder is W00*R0, which is then passed through the data distributor 302 to the third register 303. The multiplier next calculates W01*G0, the data selector 301 reads the data W00*R0 from the corresponding third register 303, and the adder calculates W00*R0 + W01*G0, which is then passed through the data distributor 302 to the third register 303. Finally, the multiplier calculates W02*B0, the data selector 301 reads the data W00*R0 + W01*G0 from the corresponding third register 303, and the adder calculates F0 = W00*R0 + W01*G0 + W02*B0, which is then passed through the data distributor 302 to the third register 303. By analogy, the convolution calculation of each item of original channel data is completed. Since the same recombined data strip corresponds to the same weight data, e.g. W00, W01 and W02 need to be reused five times, the weight data can be multiplexed by setting an additional address pointer and a counter, and holding the address pointer while a group of weight data has been used fewer than 5 times.
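The weight-reuse control just described can be sketched as follows (the class `WeightPointer` is an illustrative assumption about one possible realization of the address pointer and counter; it is not the patent's circuit):

```python
# The weight-cache address only advances after the same weight group has
# served all N = 5 frames of a recombined data strip.
class WeightPointer:
    def __init__(self, n_frames=5):
        self.n_frames = n_frames
        self.addr = 0    # address pointer into the weight cache
        self.count = 0   # uses of the current weight group

    def next_addr(self):
        addr = self.addr
        self.count += 1
        if self.count == self.n_frames:  # group exhausted: advance pointer
            self.count = 0
            self.addr += 1
        return addr

wp = WeightPointer()
addrs = [wp.next_addr() for _ in range(10)]
assert addrs == [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # each group fetched once, used 5x
```

Each weight group is thus read from the cache once but applied five times, matching the five-fold reuse of W00, W01 and W02 in the text.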
The application also discloses a computer readable storage medium, which stores a data batch processing program for the neural network, and the data batch processing program for the neural network realizes the data batch processing method for the neural network when being executed by a processor.
The present application also discloses a computer device. At the hardware level, as shown in fig. 6, the computer device includes a processor 12, an internal bus 13, a network interface 14 and a computer-readable storage medium 11. The processor 12 reads the corresponding computer program from the computer-readable storage medium and then runs it, forming a request processing apparatus at the logical level. Of course, besides software implementations, the one or more embodiments in this specification do not exclude other implementations, such as logic devices or combinations of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, but may also be hardware or logic devices. The computer-readable storage medium 11 stores a data batch processing program for a neural network which, when executed by a processor, implements the data batch processing method for a neural network described above.
Computer-readable storage media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer-readable storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents, and that such changes and modifications are intended to be within the scope of the invention.

Claims (10)

1. A data batching method for a neural network, the data batching method comprising:
acquiring a memory bandwidth and selecting original channel data of N continuous frame images according to the memory bandwidth;
splicing the original channel data of the N continuous frame images to form a plurality of recombined data strips, wherein each recombined data strip comprises the original channel data of the N continuous frame images at the same pixel position;
and sequentially inputting the plurality of recombined data strips into a parallel computing unit array for convolution operation, wherein all the original channel data of a same recombined data strip enter the computing units at the same time.
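As an illustrative sketch only (not part of the claims), the splicing step of claim 1 can be pictured as follows in Python; the function name, the list-of-tuples image layout, and the example sizes are all assumptions for demonstration:

```python
def build_strips(frames):
    """Splice the original channel data of N continuous frame images into
    recombined data strips: each strip holds the channel values of all N
    frames at one and the same pixel position."""
    h, w = len(frames[0]), len(frames[0][0])
    strips = []
    for y in range(h):
        for x in range(w):
            strip = []
            for f in frames:  # same pixel position across all N frames
                strip.extend(f[y][x])
            strips.append(strip)
    return strips  # h*w strips, each of length N * channels

# five 4x4 RGB frames (frame i filled with value i) -> 16 strips of 15 values
frames = [[[(i, i, i) for _ in range(4)] for _ in range(4)] for i in range(5)]
strips = build_strips(frames)
# strips[0] == [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```

Each strip can then be transferred to the parallel computing unit array in a single memory transaction, which is the point of matching the strip width to the memory bandwidth.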
2. The data batching method for the neural network as recited in claim 1, wherein each said reorganized data strip further comprises zero padding data, and a data bit width of each said reorganized data strip is equal to said memory bandwidth.
3. The data batching method for a neural network as claimed in claim 2, further comprising:
and storing the multiple recombined data strips into a memory.
4. The data batch processing method for the neural network as claimed in claim 1, wherein sequentially inputting the plurality of recombined data strips into the parallel computing unit array for convolution operation comprises:
performing multiply-add operation on the original channel data of each continuous frame image in each recombined data strip and the same weight data respectively;
and storing the result of the multiply-add operation of each continuous frame image in each recombined data strip into different registers.
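A minimal Python sketch (illustrative only, not the claimed hardware) of the operation in claim 4: every frame's channel data inside one recombined data strip is multiplied by the same shared weight, and each frame's partial sum accumulates in its own register:

```python
def mac_strip(strip, weight, accumulators, channels):
    """Multiply-add one recombined data strip against a single shared
    weight; accumulators[i] plays the role of the register holding the
    running result for continuous frame image i."""
    n_frames = len(strip) // channels
    for i in range(n_frames):
        frame_data = strip[i * channels:(i + 1) * channels]
        accumulators[i] += sum(v * weight for v in frame_data)
    return accumulators

# two frames, three channels each, one shared weight of 2:
acc = mac_strip([1, 2, 3, 4, 5, 6], weight=2, accumulators=[0, 0], channels=3)
# acc == [12, 30]: (1+2+3)*2 for frame 0, (4+5+6)*2 for frame 1
```

Because the weight is identical across frames, loading it once and streaming all N frames' data past it is what lets the batch share one weight fetch.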
5. The data batch processing method for the neural network according to claim 1, wherein the memory bandwidth is 128 bits, N is 5, and the raw channel data at each pixel position of each of the successive frame images includes red channel data, green channel data, and blue channel data.
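With the parameters of claim 5, and assuming 8-bit channel data (a common choice, though the claims do not fix the bit depth), the zero padding of claim 2 follows from simple arithmetic:

```python
MEMORY_BANDWIDTH_BITS = 128
N_FRAMES = 5
CHANNELS = 3          # red, green, and blue channel data
BITS_PER_CHANNEL = 8  # assumed bit depth, not stated in the claims

payload_bits = N_FRAMES * CHANNELS * BITS_PER_CHANNEL  # 5 * 3 * 8 = 120
zero_pad_bits = MEMORY_BANDWIDTH_BITS - payload_bits   # 8 bits of padding
```

Under this assumption each recombined data strip carries 120 bits of original channel data plus 8 bits of zero padding, making its data bit width exactly equal to the 128-bit memory bandwidth.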
6. A data batching device for a neural network, the data batching device comprising:
the data acquisition module is used for acquiring the memory bandwidth and selecting the original channel data of N continuous frame images according to the memory bandwidth;
the data recombination module is used for splicing the original channel data of the N continuous frame images to form a plurality of recombined data strips, wherein each recombined data strip comprises the original channel data of the N continuous frame images at the same pixel position;
and the convolution calculation module is used for reading the multiple recombined data strips and carrying out convolution operation on the multiple recombined data strips in sequence, wherein all original channel data of the same recombined data strip are read by the convolution calculation module at the same time.
7. The data batch processing device for the neural network according to claim 6, further comprising a memory, wherein the memory is used for receiving and storing the plurality of recombined data strips formed by the data recombination module.
8. The data batching device for the neural network as recited in claim 6, wherein said convolution calculating module comprises:
the multiplier-adder unit is used for respectively carrying out multiplication-addition operation on the original channel data of each continuous frame image in each recombined data strip and the same weight data;
and the storage unit is used for storing the result of the multiply-add operation of each continuous frame image in each recombined data strip.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a data batching program for a neural network, which when executed by a processor implements the data batching method for a neural network according to any one of claims 1 to 5.
10. A computer device comprising a computer readable storage medium, a processor, and a data batching program for a neural network stored in the computer readable storage medium, the data batching program for a neural network implementing the data batching method for a neural network of any one of claims 1 to 5 when executed by the processor.
CN202010791617.5A 2020-08-07 2020-08-07 Data batch processing method and batch processing device thereof, storage medium and computer equipment Pending CN114065905A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010791617.5A CN114065905A (en) 2020-08-07 2020-08-07 Data batch processing method and batch processing device thereof, storage medium and computer equipment
PCT/CN2020/120177 WO2022027818A1 (en) 2020-08-07 2020-10-10 Data batch processing method and batch processing apparatus thereof, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010791617.5A CN114065905A (en) 2020-08-07 2020-08-07 Data batch processing method and batch processing device thereof, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN114065905A true CN114065905A (en) 2022-02-18

Family

ID=80119905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010791617.5A Pending CN114065905A (en) 2020-08-07 2020-08-07 Data batch processing method and batch processing device thereof, storage medium and computer equipment

Country Status (2)

Country Link
CN (1) CN114065905A (en)
WO (1) WO2022027818A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876813A (en) * 2017-11-01 2018-11-23 北京旷视科技有限公司 Image processing method, device and equipment for object detection in video
CN110009102A (en) * 2019-04-12 2019-07-12 南京吉相传感成像技术研究院有限公司 A kind of accelerated method of the depth residual error network based on photoelectricity computing array
CN111199273A (en) * 2019-12-31 2020-05-26 深圳云天励飞技术有限公司 Convolution calculation method, device, equipment and storage medium
CN111459856A (en) * 2020-03-20 2020-07-28 中国科学院计算技术研究所 Data transmission device and transmission method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107742150B (en) * 2016-10-31 2020-05-12 腾讯科技(深圳)有限公司 Data processing method and device of convolutional neural network
CN108388537B (en) * 2018-03-06 2020-06-16 上海熠知电子科技有限公司 Convolutional neural network acceleration device and method
CN110136066B (en) * 2019-05-23 2023-02-24 北京百度网讯科技有限公司 Video-oriented super-resolution method, device, equipment and storage medium
CN110211205B (en) * 2019-06-14 2022-12-13 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN110782393A (en) * 2019-10-10 2020-02-11 江南大学 Image resolution compression and reconstruction method based on reversible network
CN110895801A (en) * 2019-11-15 2020-03-20 北京金山云网络技术有限公司 Image processing method, device, equipment and storage medium


Also Published As

Publication number Publication date
WO2022027818A1 (en) 2022-02-10

Similar Documents

Publication Publication Date Title
WO2020073211A1 (en) Operation accelerator, processing method, and related device
Liu et al. Switchable temporal propagation network
CN111709516B (en) Compression method and compression device, storage medium and equipment of neural network model
WO2019084788A1 (en) Computation apparatus, circuit and relevant method for neural network
CN114418853B (en) Image super-resolution optimization method, medium and equipment based on similar image retrieval
CN112884650B (en) Image mixing super-resolution method based on self-adaptive texture distillation
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
Yang et al. Ensemble learning priors driven deep unfolding for scalable video snapshot compressive imaging
WO2021147276A1 (en) Data processing method and apparatus, and chip, electronic device and storage medium
CN114359039A (en) Knowledge distillation-based image super-resolution method
CN113222129B (en) Convolution operation processing unit and system based on multi-level cache cyclic utilization
WO2022007265A1 (en) Dilated convolution acceleration calculation method and apparatus
CN112184587B (en) Edge data enhancement model, and efficient edge data enhancement method and system based on model
Wang et al. Image super-resolution via lightweight attention-directed feature aggregation network
CN114065905A (en) Data batch processing method and batch processing device thereof, storage medium and computer equipment
CN116681631A (en) Dual-network-based low-quality film image restoration and enhancement method and system
CN114758209B (en) Convolution result obtaining method and device, computer equipment and storage medium
CN111915492B (en) Multi-branch video super-resolution method and system based on dynamic reconstruction
CN115082306A (en) Image super-resolution method based on blueprint separable residual error network
CN110399881B (en) End-to-end quality enhancement method and device based on binocular stereo image
WO2020063225A1 (en) Data processing method and apparatus
Ying et al. Accurate stereo image super-resolution using spatial-attention-enhance residual network
KR20200023154A (en) Method and apparatus for processing convolution neural network
CN113379046B (en) Acceleration calculation method for convolutional neural network, storage medium and computer equipment
CN115456858B (en) Image processing method, device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240124

Address after: 518102 18D1, Block C, Central Avenue, Intersection of Xixiang Avenue and Baoyuan Road, Labor Community, Xixiang Street, Bao'an District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen Zhongke Yuanwuxin Technology Co.,Ltd.

Country or region after: China

Address before: 1068 Xueyuan Avenue, Xili University Town, Nanshan District, Shenzhen, Guangdong 518055

Applicant before: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY

Country or region before: China