CN112633462A - Block type inference method and system for memory optimization of convolution neural network - Google Patents


Info

Publication number: CN112633462A
Authority: CN (China)
Prior art keywords: block, layer, input, data, feature
Legal status: Pending (the status listed is an assumption and is not a legal conclusion)
Application number: CN202010922472.8A
Other languages: Chinese (zh)
Inventor: 黄朝宗
Current Assignee / Original Assignee: Individual
Application filed by: Individual
Publication of: CN112633462A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/60 Memory management

Abstract

The invention provides a block-wise inference method and system for memory optimization of a convolutional neural network. The block inference step drives an arithmetic processing unit to perform a multi-layer convolution operation on each input block data to generate output block data. The block inference step selects i-th layer recalculation features along the scan line-feed direction according to the position of the output block data, and selects i-th layer reuse features along the block scanning direction according to the i-th layer recalculation input feature block data. The convolution operation step performs the convolution operation according to the i-th layer recalculation features and the i-th layer reuse features. Therefore, by using calculation modes with different characteristics in different directions, the external memory bandwidth requirement can be greatly reduced without increasing too much computation or internal block register capacity.

Description

Block type inference method and system for memory optimization of convolution neural network
Technical Field
The present invention relates to a block inference method and system, and more particularly, to a block inference method and system for memory optimization of a convolutional neural network.
Background
When a convolutional neural network is used in image processing applications, the bandwidth requirement of its external memory can be quite high, and using a block-wise inference procedure can greatly reduce this requirement. However, there are overlapping features between blocks, and two different ways of handling them are known: one is recalculation and the other is reuse. The former increases the computation amount and reduces the output pixel rate, while the latter requires a large block register to store the reused features. Therefore, a block inference method and system for memory optimization of a convolutional neural network that can greatly reduce the external memory bandwidth requirement without adding too much computation or block register capacity is not yet available, and related practitioners are seeking a solution to this problem.
Disclosure of Invention
Therefore, the present invention provides a block inference method and system for memory optimization of a convolutional neural network: when block-wise inference is performed, the already-computed features are reused along the block forward direction, and a recalculation mode is used in the other direction, so that block-wise inference can still greatly reduce the external memory bandwidth requirement without increasing too much computation or block register capacity.
One embodiment of a method according to the present invention provides a memory-optimized block-wise inference method for a convolutional neural network, which is used for processing an input image. The block-wise inference method for memory optimization of the convolutional neural network comprises a parameter setting step, a dividing step, a block inference step and a temporary storage step. The parameter setting step sets an inference parameter set, and the inference parameter set comprises a convolution depth, a block width, a block height and multi-layer convolution kernel sizes. The dividing step drives an arithmetic processing unit to divide the input image into a plurality of input block data according to the convolution depth, the block width, the block height and the multi-layer convolution kernel sizes, wherein each input block data has an input block size. The block inference step drives the arithmetic processing unit to perform a multi-layer convolution operation on each input block data to generate output block data, and the multi-layer convolution operation comprises a first direction data selection step, a second direction data selection step and a convolution operation step. The first direction data selection step selects a plurality of i-th layer recalculation features along a scan line-feed direction according to a position of the output block data, and then selects i-th layer recalculation input feature block data according to the position of the output block data and the i-th layer recalculation features, wherein i is one of the positive integers from 1 to the convolution depth. The second direction data selection step selects a plurality of i-th layer reuse features along a block scanning direction according to the i-th layer recalculation input feature block data, and combines the i-th layer recalculation input feature block data and the i-th layer reuse features to generate i-th layer reuse input feature block data. In addition, the convolution operation step selects a plurality of i-th layer sub-block input feature groups from the i-th layer reuse input feature block data according to an i-th layer convolution kernel size, then performs a convolution operation on each i-th layer sub-block input feature group and a convolution parameter set to generate an i-th layer sub-block output feature, and combines the i-th layer sub-block output features corresponding to the i-th layer sub-block input feature groups to form i-th layer output feature block data. The temporary storage step drives a block register to temporarily store the i-th layer output feature block data and the i-th layer reuse features.
Therefore, the block-wise inference method for memory optimization of the convolutional neural network uses calculation modes with different characteristics in different directions, so that block-wise inference can still greatly reduce the external memory bandwidth requirement without increasing too much computation or block register capacity.
Other examples of the foregoing embodiments are as follows: when i is equal to 1, the i-th layer recalculation input feature block data is equal to each input block data; when i is equal to the convolution depth, the i-th layer output feature block data is equal to the output block data.
Other examples of the foregoing embodiments are as follows: the i-th layer recalculated input feature block data has an i-th layer recalculated input feature block size and an i-th layer recalculated input feature block channel number, and the i-th layer output feature block data has an i-th layer output feature block size and an i-th layer output feature block channel number. The size of the ith layer output feature block is larger than that of the ith layer recalculated input feature block, and the number of channels of the ith layer recalculated input feature block is equal to that of the ith layer output feature block.
Other examples of the foregoing embodiments are as follows: the block scanning direction is perpendicular to the scan line-feed direction, the block width is greater than the block height, and the extending direction of the block height is parallel to the block scanning direction.
Other examples of the foregoing embodiments are as follows: the convolution depth, the block width and the block height are positive integers, and the i-th layer convolution kernel size is k_Wi × k_Hi. The i-th layer reuse features have a reuse feature number along the block scanning direction, and the reuse feature number is equal to k_Hi − 1.
Other examples of the foregoing embodiments are as follows: the block width is denoted as B_W, the convolution depth is denoted as D, and the block height is denoted as B_H. The input block size is equal to B_W × B_H. The output block data has an output block size equal to (B_W − 2D) × B_H. The i-th layer recalculation input feature block data has an i-th layer recalculation input feature block size equal to (B_W − 2i + 2) × B_H. The i-th layer reuse input feature block data has an i-th layer reuse input feature block size equal to (B_W − 2i + 2) × (B_H + 2). The i-th layer output feature block data has an i-th layer output feature block size equal to (B_W − 2i) × B_H. The convolution depth is less than half of the block width.
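To make the size bookkeeping concrete, the following sketch (our own illustration, not part of the patent, assuming k_Wi = k_Hi = 3 for every layer) evaluates the above formulas layer by layer:

```python
# Illustrative sketch (assumes 3x3 kernels in every layer).
def layer_block_sizes(b_w: int, b_h: int, depth: int):
    """Return (recalc_input, reuse_input, output) sizes, width x height, per layer."""
    assert depth < b_w / 2, "convolution depth must be less than half the block width"
    sizes = []
    for i in range(1, depth + 1):
        recalc_input = (b_w - 2 * i + 2, b_h)      # (B_W - 2i + 2) x B_H
        reuse_input = (b_w - 2 * i + 2, b_h + 2)   # (B_W - 2i + 2) x (B_H + 2)
        output = (b_w - 2 * i, b_h)                # (B_W - 2i) x B_H
        sizes.append((recalc_input, reuse_input, output))
    return sizes

# Example matching the detailed description: B_W = 10, B_H = 4, D = 3.
for i, (rc, ru, out) in enumerate(layer_block_sizes(10, 4, 3), start=1):
    print(f"layer {i}: recalc {rc}, reuse {ru}, output {out}")
# layer 1: recalc (10, 4), reuse (10, 6), output (8, 4)
# layer 2: recalc (8, 4), reuse (8, 6), output (6, 4)
# layer 3: recalc (6, 4), reuse (6, 6), output (4, 4)
```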
Other examples of the foregoing embodiments are as follows: when at least one of the input features of an i-th layer sub-block input feature group is located in an outer region of the i-th layer reuse input feature block data, the input features of the i-th layer sub-block input feature group include a plurality of outer block features and a plurality of first inner block features; the outer block features represent already-computed features, and the first inner block features represent not-yet-computed features. Furthermore, when the input features of an i-th layer sub-block input feature group are all located in an inner region of the i-th layer reuse input feature block data, the input features of the i-th layer sub-block input feature group include only a plurality of second inner block features. The i-th layer reuse input feature block data is arranged in the order of the outer region followed by the inner region along the block scanning direction.
Other examples of the foregoing embodiments are as follows: the outer block features are stored in a block register, and the block register has a temporary storage space. The temporary storage space is obtained from the width of the i-th layer recalculation input feature block data, the convolution depth, the layer number, the channel number and the i-th layer convolution kernel size. The width of the i-th layer recalculation input feature block data is denoted as B_Wi, the convolution depth is denoted as D, the layer number is denoted as i, the channel number is denoted as C, and the i-th layer convolution kernel size is k_Wi × k_Hi. The temporary storage space is denoted as LBS and conforms to the following equation:

LBS = Σ_{i=1}^{D} B_Wi × (k_Hi − 1) × C
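The following sketch computes LBS under our reading of this equation, i.e., each layer keeps the bottom k_Hi − 1 rows of its B_Wi-wide recalculation input block over C channels; the substitution B_Wi = B_W − 2i + 2 is taken from the size formulas above and should be read as an assumption:

```python
# Hedged sketch of the line-buffer-size formula (names are ours).
def line_buffer_size(b_w: int, depth: int, channels: int, k_h: int = 3) -> int:
    """LBS = sum over i = 1..D of B_Wi * (k_Hi - 1) * C."""
    total = 0
    for i in range(1, depth + 1):
        b_wi = b_w - 2 * i + 2  # assumed width of the layer-i recalculation input block
        total += b_wi * (k_h - 1) * channels
    return total
```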
according to an embodiment of the present invention, a memory-optimized block-wise inference system for a convolutional neural network is provided for processing an input image. The memory-optimized block-wise inference system of the convolutional neural network includes a block register and an arithmetic processing unit. The block register is used for accessing i-th layer output feature block data and a plurality of i-th layer reuse features. The arithmetic processing unit is electrically connected to the block register, receives the input image and is configured to perform operations comprising a parameter setting step, a dividing step and a block inference step. The parameter setting step sets an inference parameter set, wherein the inference parameter set comprises a convolution depth, a block width, a block height and multi-layer convolution kernel sizes. The dividing step divides the input image into a plurality of input block data according to the convolution depth, the block width, the block height and the multi-layer convolution kernel sizes, wherein each input block data has an input block size. In addition, the block inference step performs a multi-layer convolution operation on each input block data to generate output block data, and the multi-layer convolution operation includes a first direction data selection step, a second direction data selection step and a convolution operation step. The first direction data selection step selects a plurality of i-th layer recalculation features along the scan line-feed direction according to the position of the output block data, and then selects i-th layer recalculation input feature block data according to the position of the output block data and the i-th layer recalculation features, wherein i is one of the positive integers from 1 to the convolution depth. The second direction data selection step selects the i-th layer reuse features along the block scanning direction according to the i-th layer recalculation input feature block data, and combines the i-th layer recalculation input feature block data and the i-th layer reuse features to generate i-th layer reuse input feature block data. The convolution operation step selects a plurality of i-th layer sub-block input feature groups from the i-th layer reuse input feature block data according to the i-th layer convolution kernel size, then performs a convolution operation on each i-th layer sub-block input feature group and a convolution parameter set to generate an i-th layer sub-block output feature, and combines the i-th layer sub-block output features corresponding to the i-th layer sub-block input feature groups to form the i-th layer output feature block data.
Therefore, the block-wise inference system for memory optimization of the convolutional neural network uses calculation modes with different characteristics in different directions, so that block-wise inference can still greatly reduce the external memory bandwidth requirement without increasing too much computation or block register capacity.
Other examples of the foregoing embodiments are as follows: when i is equal to 1, the i-th layer recalculation input feature block data is equal to each input block data; and when i is equal to the convolution depth, the i-th layer output feature block data is equal to the output block data.
Other examples of the foregoing embodiments are as follows: the i-th layer recalculated input feature block data comprises i-th layer recalculated input feature block size and i-th layer recalculated input feature block channel number, and the i-th layer output feature block data comprises i-th layer output feature block size and i-th layer output feature block channel number. The size of the ith layer output feature block is larger than that of the ith layer recalculated input feature block, and the number of channels of the ith layer recalculated input feature block is equal to that of the ith layer output feature block.
Other examples of the foregoing embodiments are as follows: the block scanning direction is perpendicular to the scan line-feed direction, the block width is greater than the block height, and the extending direction of the block height is parallel to the block scanning direction.
Other examples of the foregoing embodiments are as follows: the convolution depth, the block width and the block height are positive integers, the i-th layer convolution kernel size is k_Wi × k_Hi, the i-th layer reuse features have a reuse feature number along the block scanning direction, and the reuse feature number is equal to k_Hi − 1.
Other examples of the foregoing embodiments are as follows: the block width is denoted as B_W, the convolution depth is denoted as D, and the block height is denoted as B_H. The input block size is equal to B_W × B_H. The output block data has an output block size equal to (B_W − 2D) × B_H. The i-th layer recalculation input feature block data has an i-th layer recalculation input feature block size equal to (B_W − 2i + 2) × B_H. The i-th layer reuse input feature block data has an i-th layer reuse input feature block size equal to (B_W − 2i + 2) × (B_H + 2). The i-th layer output feature block data has an i-th layer output feature block size equal to (B_W − 2i) × B_H. The convolution depth is less than half of the block width.
Other examples of the foregoing embodiments are as follows: when at least one of the input features of an i-th layer sub-block input feature group is located in an outer region of the i-th layer reuse input feature block data, the input features of the i-th layer sub-block input feature group include a plurality of outer block features and a plurality of first inner block features; the outer block features represent already-computed features, and the first inner block features represent not-yet-computed features. Furthermore, when the input features of an i-th layer sub-block input feature group are all located in an inner region of the i-th layer reuse input feature block data, the input features of the i-th layer sub-block input feature group include only a plurality of second inner block features. The i-th layer reuse input feature block data is arranged in the order of the outer region followed by the inner region along the block scanning direction.
Other examples of the foregoing embodiments are as follows: the outer block features are stored in a block register, and the block register has a temporary storage space. The temporary storage space is obtained from the width of the i-th layer recalculation input feature block data, the convolution depth, the layer number, the channel number and the i-th layer convolution kernel size. The width of the i-th layer recalculation input feature block data is denoted as B_Wi, the convolution depth is denoted as D, the layer number is denoted as i, the channel number is denoted as C, and the i-th layer convolution kernel size is k_Wi × k_Hi. The temporary storage space is denoted as LBS and conforms to the following equation:

LBS = Σ_{i=1}^{D} B_Wi × (k_Hi − 1) × C
drawings
FIG. 1 is a flow diagram illustrating a block-wise inference method of memory optimization for convolutional neural networks of a first embodiment of the present invention;
FIG. 2 is a schematic diagram showing the segmentation step of FIG. 1;
FIG. 3 is a schematic perspective view of input block data and output block data of a multi-layer convolution operation of the block inference step of FIG. 1;
FIG. 4 is a schematic diagram illustrating a first direction data selection step of FIG. 1;
FIG. 5 is a diagram illustrating a second direction data selection step of FIG. 1;
FIG. 6 is a diagram illustrating the layer 1 reuse of input feature block data of FIG. 3;
FIG. 7 is a schematic diagram showing channel shuffle according to a second embodiment of the present invention;
FIG. 8 is a block diagram illustrating a memory-optimized block-wise inference system for convolutional neural networks of a third embodiment of the present invention;
FIG. 9 is a flowchart showing a multi-layer convolution operation with 3 × 3 filters according to a fourth embodiment of the present invention; and
FIG. 10 is a diagram showing the simulation comparison results of recalculation, reuse, and recalculation-and-reuse of the present invention.
Description of reference numerals:
100: block type inference method for memory optimization of convolution neural network
S02: parameter setting step
S04: step of dividing
S06: block inference procedure
S062: first direction data selection step
S064: second direction data selection step
S066: convolution operation step
S08: temporary storage step
110: output image
200: memory optimized block-based inference system for convolutional neural networks
212: inference parameter set
214: convolution parameter set
220: block register
230: arithmetic processing unit
232: convolution engine
B_W, W1, W2, W3: block width
B_H, H1, H2, H3: block height
C1: i-th layer reuse input feature block channel number
C2: i-th layer intermediate feature block channel number
C3: i-th layer output feature block channel number
D: convolution depth
Dmax: maximum supported convolution depth
D1: scan line-feed direction
D2: block scanning direction
FC: recalculation
FU: reuse
FCFU: recalculation and reuse
IB: input block data
IR: inner region
k−1: number of reused features
L1: layer 1
L1FC: layer 1 recalculation features
L1FC_I: layer 1 recalculation input feature block data
L1FU: layer 1 reuse features
L1FU_I: layer 1 reuse input feature block data
L1_O: layer 1 output feature block data
L2: layer 2
L2FC: layer 2 recalculation features
L2FC_I: layer 2 recalculation input feature block data
L2FU: layer 2 reuse features
L2FU_I: layer 2 reuse input feature block data
L2_O: layer 2 output feature block data
L3: layer 3
L3FC: layer 3 recalculation features
L3FC_I: layer 3 recalculation input feature block data
L3FU: layer 3 reuse features
L3FU_I: layer 3 reuse input feature block data
L3_O: layer 3 output feature block data
LD: layer D
LiFU_I: i-th layer reuse input feature block data
Li_M: i-th layer intermediate feature block data
Li_O: i-th layer output feature block data
NTR: normalized throughput rate
OB: output block data
OR: outer region
S: block register size limit
SBG1, SBG11, SBG12: layer 1 sub-block input feature groups
SBG2: layer 2 sub-block input feature group
SBG3: layer 3 sub-block input feature group
Detailed Description
Various embodiments of the present invention will be described below with reference to the accompanying drawings. For the purpose of clarity, numerous implementation details are set forth in the following description. It should be understood, however, that these implementation details are not to be interpreted as limiting the invention. That is, in some embodiments of the invention, these implementation details are not necessary. In addition, some conventional structures and elements are shown in simplified schematic form in the drawings for the sake of simplifying the drawings; and repeated elements will likely be referred to using the same reference numerals.
In addition, when an element (or a unit or a module, etc.) is "connected" to another element, it can mean that the element is directly connected to the other element, or that the element is indirectly connected to the other element, i.e., another element is interposed between the two. When an element is explicitly described as being "directly connected" to another element, no other element is interposed between them. The terms first, second, third and the like are used only for describing different elements and do not limit the elements themselves, so a first element may also be called a second element. Moreover, the combinations of elements/units/circuits herein are not commonly known, conventional or existing combinations in this field, and whether such a combination could be easily accomplished by a person skilled in the art cannot be determined merely from whether the individual elements/units/circuits are themselves existing.
Referring to fig. 1, fig. 1 is a flowchart illustrating a block-based inference method 100 for memory optimization of a convolutional neural network according to a first embodiment of the present invention. The block inference method 100 for memory optimization of convolutional neural network is used for processing an input image to generate an output image, and includes a parameter setting step S02, a segmentation step S04, a block inference step S06, and a temporary storage step S08.
In the parameter setting step S02, an inference parameter set is set, which includes a convolution depth (depth), a block width, a block height, and multi-layer convolution kernel sizes (kernel sizes). The number of layers of the multi-layer convolution kernel sizes is equal to the convolution depth.
In the dividing step S04, the arithmetic processing unit is driven to divide the input image into a plurality of input block data according to the convolution depth, the block width, the block height and the multi-layer convolution kernel sizes, wherein each input block data has an input block size.
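As a concrete illustration of this dividing step, the following sketch (our own simplification, assuming 3 × 3 kernels; boundary padding and ragged edge blocks are ignored) splits an image into B_H × B_W input blocks whose column strips overlap by 2D pixels along the scan line-feed direction:

```python
import numpy as np

def split_into_blocks(image: np.ndarray, b_w: int, b_h: int, depth: int):
    """Yield input blocks: down each column strip (block scanning direction D2),
    then step right to the next strip (scan line-feed direction D1). Adjacent
    strips overlap by 2*depth columns, so each block can yield a
    (b_w - 2*depth)-wide output strip after `depth` 3x3 convolution layers."""
    height, width = image.shape[:2]
    stride_w = b_w - 2 * depth
    for x in range(0, width - 2 * depth, stride_w):  # D1: next column strip
        for y in range(0, height, b_h):              # D2: down the strip
            yield image[y:y + b_h, x:x + b_w]
```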
The block inference step S06 is to drive the arithmetic processing unit to perform a multi-layer convolution operation on each input block data to generate output block data, wherein the multi-layer convolution operation includes a first direction data selection step S062, a second direction data selection step S064, and a convolution operation step S066. In the first direction data selecting step S062, a plurality of i-th layer recalculation features are selected along the scan line feed direction according to the position of the output block data, and then an i-th layer recalculation input feature block data is selected according to the position of the output block data and the i-th layer recalculation features, where i is one of a plurality of positive integers from 1 to the convolution depth. In addition, the second direction data selecting step S064 selects a plurality of i-th layer recycling features along the block scanning direction according to the i-th layer recalculation input feature block data, and combines the i-th layer recalculation input feature block data and the i-th layer recycling features to generate an i-th layer recycling input feature block data. In addition, in the convolution operation step S066, a plurality of i-th layer sub-block input feature groups are selected from the i-th layer repeated use input feature block data according to the i-th layer convolution kernel size, then convolution operation is performed on each i-th layer sub-block input feature group and the convolution parameter group to generate i-th layer sub-block output features, and the i-th layer sub-block output features corresponding to the i-th layer sub-block input feature groups are combined to form i-th layer output feature block data. The convolution parameter set includes a weight parameter (weight parameter) and a bias parameter (bias parameter).
In the temporary storage step S08, a Block buffer bank (Block buffer bank) is used to temporarily store the i-th layer output feature Block data and the i-th layer recycling features.
Therefore, the block inference method 100 for memory optimization of convolutional neural network of the present invention uses different feature calculation modes in different directions, so that the block inference can still greatly reduce the bandwidth requirement of the external memory without increasing too much calculation amount and block register. The details of the above steps will be explained below by means of more detailed examples.
Referring to fig. 1 to 6, fig. 2 is a schematic diagram illustrating the dividing step S04 of fig. 1; fig. 3 is a schematic perspective view illustrating the input block data IB and the output block data OB of the multi-layer convolution operation of the block inference step S06 of fig. 1; fig. 4 is a schematic view illustrating the first direction data selection step S062 of fig. 1; fig. 5 is a schematic view illustrating the second direction data selection step S064 of fig. 1; and fig. 6 is a diagram illustrating the layer 1 reuse input feature block data L1FU_I of fig. 3. As shown, in this embodiment, the first direction data selection step S062, the second direction data selection step S064 and the convolution operation step S066 are performed for each layer (i.e., i of the i-th layer is 1 to D). The convolution depth D, the block width B_W and the block height B_H are all positive integers. The i-th layer convolution kernel size is k_Wi × k_Hi, where k_Wi and k_Hi are both positive integers. The scan line-feed direction D1 is horizontal, and the block scanning direction D2 is vertical; in other words, the block scanning direction D2 is perpendicular to the scan line-feed direction D1. The block width B_W is greater than the block height B_H, and the extending direction of the block height B_H is parallel to the block scanning direction D2. The input block size is equal to B_W × B_H. The output block data OB has an output block size equal to (B_W − 2D) × B_H. The i-th layer recalculation input feature block data has an i-th layer recalculation input feature block size equal to (B_W − 2i + 2) × B_H. The i-th layer reuse input feature block data has an i-th layer reuse input feature block size equal to (B_W − 2i + 2) × (B_H + 2). The i-th layer output feature block data has an i-th layer output feature block size equal to (B_W − 2i) × B_H. The i-th layer output feature block data represents the output features of the i-th layer after the convolution operation is performed, and is used for the recalculation of the next layer (the (i+1)-th layer) of the same block. The convolution depth D is less than half of the block width B_W. Furthermore, the i-th layer reuse features have a reuse feature number along the block scanning direction D2, and the reuse feature number is equal to k_Hi − 1 (i.e., k − 1). The i-th layer reuse features are reused by the same layer (the i-th layer) for the next block. When i is equal to 1, the i-th layer recalculation input feature block data is equal to each input block data IB; when i is equal to the convolution depth D, the i-th layer output feature block data is equal to the output block data OB.
In fig. 3 to 6, the convolution depth D is 3, the block width B_W is 10, and the block height B_H is 4. The i-th layer convolution kernel size is 3 × 3, i.e., k_Wi = k_Hi = k = 3. A convolution depth D of 3 represents a 3-layer convolution operation, so the multi-layer convolution operation includes a layer 1 convolution operation, a layer 2 convolution operation and a layer 3 convolution operation (i.e., i is 1, 2 and 3).
The layer 1 convolution operation (i = 1) includes the first direction data selection step S062, the second direction data selection step S064 and the convolution operation step S066. The first direction data selection step S062 selects 6 layer 1 recalculation features L1FC (i.e., (D − i + 1) × (k − 1) features) along the scan line-feed direction D1 according to the position of the output block data OB (i.e., the layer 3 output feature block data L3_O), and then selects layer 1 recalculation input feature block data L1FC_I according to the position of the output block data OB and the layer 1 recalculation features L1FC. The layer 1 recalculation input feature block data L1FC_I is equal to the input block data IB, and the input block size of the input block data IB is equal to the layer 1 recalculation input feature block size of L1FC_I, namely (B_W − 2i + 2) × B_H = (10 − 2 + 2) × 4 = 10 × 4, as shown in layer 1 (L1) of fig. 3, layer 1 (L1) of fig. 4 and fig. 6. Furthermore, the second direction data selection step S064 selects 2 layer 1 reuse features L1FU along the block scanning direction D2 according to the layer 1 recalculation input feature block data L1FC_I, and combines the layer 1 recalculation input feature block data L1FC_I and the layer 1 reuse features L1FU to generate layer 1 reuse input feature block data L1FU_I. The layer 1 reuse input feature block size of L1FU_I is equal to (B_W − 2i + 2) × (B_H + 2) = (10 − 2 + 2) × (4 + 2) = 10 × 6, as shown in layer 1 (L1) of fig. 3, layer 1 (L1) of fig. 5 and fig. 6. In addition, the convolution operation step S066 selects a plurality of layer 1 sub-block input feature groups SBG1 (each of 3 × 3 features) from the layer 1 reuse input feature block data L1FU_I according to the i-th layer convolution kernel size (i.e., 3 × 3), then performs the convolution operation on each layer 1 sub-block input feature group SBG1 and the convolution parameter set to generate a layer 1 sub-block output feature, and combines the layer 1 sub-block output features corresponding to the layer 1 sub-block input feature groups SBG1 to form layer 1 output feature block data L1_O. The layer 1 output feature block size of L1_O is equal to (B_W − 2i) × B_H = (10 − 2) × 4 = 8 × 4, as shown in layer 1 (L1) of fig. 3 and fig. 5.
The layer 2 convolution operation (i = 2) includes the first direction data selection step S062, the second direction data selection step S064 and the convolution operation step S066. The first direction data selection step S062 selects 4 layer 2 recalculation features L2FC (i.e., (D − i + 1) × (k − 1) features) along the scan line-feed direction D1 according to the position of the output block data OB (i.e., the layer 3 output feature block data L3_O), and then selects layer 2 recalculation input feature block data L2FC_I according to the position of the output block data OB and the layer 2 recalculation features L2FC. The layer 2 recalculation input feature block data L2FC_I is equal to the layer 1 output feature block data L1_O. The layer 2 recalculation input feature block size of L2FC_I is equal to (B_W − 2i + 2) × B_H = (10 − 4 + 2) × 4 = 8 × 4, as shown in layer 2 (L2) of fig. 3 and fig. 4. Furthermore, the second direction data selection step S064 selects 2 layer 2 reuse features L2FU along the block scanning direction D2 according to the layer 2 recalculation input feature block data L2FC_I, and combines the layer 2 recalculation input feature block data L2FC_I and the layer 2 reuse features L2FU to generate layer 2 reuse input feature block data L2FU_I. The layer 2 reuse input feature block size of L2FU_I is equal to (B_W − 2i + 2) × (B_H + 2) = (10 − 4 + 2) × (4 + 2) = 8 × 6, as shown in layer 2 (L2) of fig. 3 and fig. 5. In addition, the convolution operation step S066 selects a plurality of layer 2 sub-block input feature groups SBG2 (each of 3 × 3 features) from the layer 2 reuse input feature block data L2FU_I according to the i-th layer convolution kernel size (i.e., 3 × 3), then performs the convolution operation on each layer 2 sub-block input feature group SBG2 and the convolution parameter set to generate a layer 2 sub-block output feature, and combines the layer 2 sub-block output features corresponding to the layer 2 sub-block input feature groups SBG2 to form layer 2 output feature block data L2_O. The layer 2 output feature block size of L2_O is equal to (B_W − 2i) × B_H = (10 − 4) × 4 = 6 × 4, as shown in layer 2 (L2) of fig. 3 and fig. 5.
The layer 3 convolution operation (i = 3) includes the first direction data selection step S062, the second direction data selection step S064 and the convolution operation step S066. The first direction data selection step S062 selects 2 layer 3 recalculation features L3FC (i.e., (D − i + 1) × (k − 1) features) along the scan line-feed direction D1 according to the position of the output block data OB (i.e., the layer 3 output feature block data L3_O), and then selects layer 3 recalculation input feature block data L3FC_I according to the position of the output block data OB and the layer 3 recalculation features L3FC. The layer 3 recalculation input feature block data L3FC_I is equal to the layer 2 output feature block data L2_O. The layer 3 recalculation input feature block size of L3FC_I is equal to (B_W − 2i + 2) × B_H = (10 − 6 + 2) × 4 = 6 × 4, as shown in layer 3 (L3) of fig. 3 and fig. 4. Furthermore, the second direction data selection step S064 selects 2 layer 3 reuse features L3FU along the block scanning direction D2 according to the layer 3 recalculation input feature block data L3FC_I, and combines the layer 3 recalculation input feature block data L3FC_I and the layer 3 reuse features L3FU to generate layer 3 reuse input feature block data L3FU_I. The layer 3 reuse input feature block size of L3FU_I is equal to (B_W − 2i + 2) × (B_H + 2) = (10 − 6 + 2) × (4 + 2) = 6 × 6, as shown in layer 3 (L3) of fig. 3 and fig. 5. In addition, the convolution operation step S066 selects a plurality of layer 3 sub-block input feature groups SBG3 (each of 3 × 3 features) from the layer 3 reuse input feature block data L3FU_I according to the i-th layer convolution kernel size (i.e., 3 × 3), then performs the convolution operation on each layer 3 sub-block input feature group SBG3 and the convolution parameter set to generate a layer 3 sub-block output feature, and combines the layer 3 sub-block output features corresponding to the layer 3 sub-block input feature groups SBG3 to form layer 3 output feature block data L3_O. The layer 3 output feature block data L3_O is equal to the output block data OB. The layer 3 output feature block size of L3_O is equal to (B_W − 2i) × B_H = (10 − 6) × 4 = 4 × 4, and the output block size of the output block data OB is equal to (B_W − 2D) × B_H = (10 − 6) × 4 = 4 × 4, as shown in layer 3 (L3) of fig. 3 and fig. 5.
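The size chain of this three-layer example can be verified with a short numpy sketch (ours, not the patent's reference implementation) using single-channel data and valid 3 × 3 convolutions; shapes are printed as width × height:

```python
import numpy as np

def conv2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution; the output shrinks by (kernel size - 1) per dimension."""
    kh, kw = k.shape
    h, w = x.shape
    return sum(k[a, b] * x[a:h - kh + 1 + a, b:w - kw + 1 + b]
               for a in range(kh) for b in range(kw))

kernel = np.ones((3, 3))
x = np.random.rand(6, 10)  # layer 1 reuse input: B_H + 2 = 6 rows, B_W = 10 columns
for layer in (1, 2, 3):
    y = conv2d_valid(x, kernel)
    print(f"layer {layer}: in {x.shape[::-1]} -> out {y.shape[::-1]}")
    if layer < 3:
        reused = np.random.rand(2, y.shape[1])  # 2 rows that would come from the block register
        x = np.vstack([reused, y])              # next layer's reuse input feature block
# layer 1: in (10, 6) -> out (8, 4)
# layer 2: in (8, 6) -> out (6, 4)
# layer 3: in (6, 6) -> out (4, 4)
```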
In the memory-optimized block-wise inference method 100 of the convolutional neural network of the present invention, when at least one of the input features of an i-th layer sub-block input feature group is located in the outer region of the i-th layer reuse input feature block data, the input features of the i-th layer sub-block input feature group include a plurality of outer block features and a plurality of first inner block features. The outer block features represent features already computed for the previous block, and the first inner block features represent features of the current block that have not yet been computed. In addition, when the input features of an i-th layer sub-block input feature group are all located in the inner region of the i-th layer reuse input feature block data, the input features of the i-th layer sub-block input feature group include only a plurality of second inner block features, which represent not-yet-computed features of the current block. The i-th layer reuse input feature block data is arranged in the order of the outer region followed by the inner region along the block scanning direction D2. For example, referring to fig. 6, when 6 of the 9 input features of the layer 1 sub-block input feature group SBG11 are located in the outer region OR of the layer 1 reuse input feature block data L1FU_I, the 9 input features of the layer 1 sub-block input feature group SBG11 include 6 outer block features and 3 inner block features. The outer block features represent computed features and are located in the outer region OR, while the inner block features represent not-yet-computed features and are located in the inner region IR. In addition, when the 9 input features of the layer 1 sub-block input feature group SBG12 are all located in the inner region IR of the layer 1 reuse input feature block data L1FU_I, the 9 input features of the layer 1 sub-block input feature group SBG12 include only 9 inner block features; that is, all 9 input features are inner block features. Moreover, the layer 1 reuse input feature block data L1FU_I is arranged in the order of the outer region OR followed by the inner region IR along the block scanning direction D2.
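For illustration only, a small helper (our assumption, consistent with the SBG11/SBG12 example above for k = 3) counts how many rows of a 3 × 3 sub-block window fall in the outer region OR:

```python
# Hypothetical helper: rows 0 .. k-2 of an i-th layer reuse input feature block
# form the outer region OR (already-computed features); the rest form IR.
def rows_in_outer_region(window_top_row: int, k_h: int = 3) -> int:
    """Rows of a k_h-tall window starting at window_top_row that lie in OR."""
    return max(0, (k_h - 1) - window_top_row)

print(rows_in_outer_region(0))  # 2 -> 2 rows x 3 cols = 6 outer features, as in SBG11
print(rows_in_outer_region(2))  # 0 -> the window lies entirely in IR, as in SBG12
```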
It should be noted that in the temporary storage step S08, the bottom k_Hi − 1 rows of the i-th layer recalculation input feature block data LiFC_I are stored into the block register for the next block, becoming the i-th layer reuse features LiFU of the next block. For example, after the layer 1 convolution operation of the block inference step S06 is performed, the temporary storage step S08 is performed, in which the bottom k_Hi − 1 rows of the layer 1 recalculation input feature block data L1FC_I are stored into the block register for the next block, i.e., they become the layer 1 reuse features L1FU of the next block. After the layer 2 convolution operation of the block inference step S06 is performed, the temporary storage step S08 stores the bottom k_Hi − 1 rows of the layer 2 recalculation input feature block data L2FC_I into the block register for the next block, i.e., they become the layer 2 reuse features L2FU of the next block. After the layer 3 convolution operation of the block inference step S06 is performed, the temporary storage step S08 stores the bottom k_Hi − 1 rows of the layer 3 recalculation input feature block data L3FC_I into the block register for the next block, i.e., they become the layer 3 reuse features L3FU of the next block. Therefore, the computation amount can be greatly reduced.
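A minimal sketch of this temporary storage step is given below; the names block_register, stash_reuse_rows and fetch_reuse_rows are hypothetical, with the block register modeled as a per-layer dict:

```python
import numpy as np

def stash_reuse_rows(block_register: dict, layer: int,
                     recalc_input: np.ndarray, k_h: int = 3) -> None:
    """Keep the bottom k_h - 1 rows of this layer's recalculation input block."""
    block_register[layer] = recalc_input[-(k_h - 1):, :].copy()

def fetch_reuse_rows(block_register: dict, layer: int) -> np.ndarray:
    """Return the rows stashed by the previous block in the same column strip."""
    return block_register[layer]
```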
Referring to fig. 1 to 7, fig. 7 is a schematic diagram illustrating channel shuffle according to a second embodiment of the present invention. The inference process of the present invention is applicable to the channel shuffle operation. The i-th layer reuse input feature block data LiFU_I has an i-th layer reuse input feature block size W1 × H1 and an i-th layer reuse input feature block channel number C1. The i-th layer intermediate feature block data Li_M has an i-th layer intermediate feature block size W2 × H2 and an i-th layer intermediate feature block channel number C2. The i-th layer output feature block data Li_O has an i-th layer output feature block size W3 × H3 and an i-th layer output feature block channel number C3. The i-th layer output feature block size W3 × H3 is larger than the i-th layer reuse input feature block size W1 × H1, and the i-th layer reuse input feature block size W1 × H1 is larger than the i-th layer intermediate feature block size W2 × H2, where W1, W2 and W3 are block widths and H1, H2 and H3 are block heights. In addition, the i-th layer reuse input feature block channel number C1 is equal to the i-th layer output feature block channel number C3, and the i-th layer intermediate feature block channel number C2 is greater than the i-th layer reuse input feature block channel number C1. For example, the i-th layer reuse input feature block size W1 × H1, the i-th layer intermediate feature block size W2 × H2 and the i-th layer output feature block size W3 × H3 may be 10 × 10, 8 × 8 and 16 × 16, respectively, and the i-th layer reuse input feature block channel number C1, the i-th layer intermediate feature block channel number C2 and the i-th layer output feature block channel number C3 may be 32, 128 and 32, respectively, but the invention is not limited thereto.
Therefore, the present invention can realize specific multi-layer convolution operations: when block-wise inference is performed, the already-computed features are reused along the block forward direction (i.e., the block scanning direction D2), and a recalculation mode is adopted in the other direction (i.e., the scan line-feed direction D1), so that block-wise inference can still greatly reduce the external memory bandwidth requirement without increasing too much computation or block register capacity.
Referring to fig. 1, fig. 2, fig. 8 and fig. 9, fig. 8 is a block diagram illustrating a block-wise inference system 200 for memory optimization of a convolutional neural network according to a third embodiment of the present invention; and fig. 9 is a flowchart illustrating a multi-layer convolution operation with 3 × 3 filters according to a fourth embodiment of the present invention. As shown, the memory-optimized block-wise inference system 200 of the convolutional neural network is used for processing an input image to generate an output image 110, and includes a block register 220 and an arithmetic processing unit 230. The input block data IB, the inference parameter set 212 and the convolution parameter set 214 are input to the arithmetic processing unit 230, and the output block data OB is output to form the output image 110. The block register 220 is used to access the i-th layer output feature block data and the i-th layer reuse features, and these two types of data are temporarily stored at different locations in the block register 220. In addition, the arithmetic processing unit 230 is electrically connected to the block register 220; the arithmetic processing unit 230 receives the input image and is configured to implement the block-wise inference method 100 for memory optimization of the convolutional neural network of fig. 1. The arithmetic processing unit 230 includes a convolution engine 232 (Convolution Engine), and the convolution engine 232 is used for performing the convolution operations. The arithmetic processing unit 230 can be a microprocessor, a central processing unit or an image processor, but the invention is not limited thereto. L1, L2 and LD represent the 1st, 2nd and D-th layers, respectively, and the layers L1 through LD are all operated by the convolution engine 232 of the arithmetic processing unit 230. In addition, the block register 220 can store the outer block features, and the block register 220 has a temporary storage space, which can be calculated from the width B_Wi of the i-th layer recalculation input feature block data, the convolution depth D, the layer number i, the channel number C and the i-th layer convolution kernel size k_Wi × k_Hi. The temporary storage space is denoted LBS (Line Buffer Size) and conforms to the following formula (1):
LBS = Σ_{i=1}^{D} B_Wi × (k_Hi − 1) × C    (1)
for example, if the first direction data selection step S062, the second direction data selection step S064 and the convolution operation step S066 are performed for each layer (i.e., i of the i-th layer is 1 to D), and k_Wi = k_Hi = k = 3, the temporary storage space conforms to the following formula (2):
LBS = Σ_{i=1}^{D} 2 × C × B_Wi = 2 × C × D × (B_W − D + 1)    (2)
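As a usage note for the line_buffer_size sketch given earlier, the closed form 2 × C × D × (B_W − D + 1) in formula (2) is our own simplification (an assumption, obtained by summing B_Wi = B_W − 2i + 2 over i = 1..D) and can be checked numerically:

```python
# Hypothetical check with B_W = 10, D = 3, C = 32, k = 3:
# 2*C*B_Wi summed over i = 1..3 gives 640 + 512 + 384 = 1536,
# matching the closed form 2*C*D*(B_W - D + 1) = 2*32*3*8 = 1536.
assert line_buffer_size(b_w=10, depth=3, channels=32, k_h=3) == 2 * 32 * 3 * (10 - 3 + 1)
```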
therefore, the block type inference system 200 for memory optimization of convolutional neural network of the present invention uses different feature calculation modes in different directions, so that the block type inference can still greatly reduce the bandwidth requirement of the external memory on the input block data IB and the output block data OB without increasing too much calculation amount and the block register 220.
Referring to fig. 1 and 10, fig. 10 is a schematic diagram showing the simulation comparison results of recalculation (FC), reuse (FU) and recalculation-and-reuse (FCFU) of the present invention. The parameter setting conditions are that the product value A is set to 64², the size of the output image 110 is 960 × 540, and k_Wi = k_Hi = k. The product value A is the minimum value of the block width B_W multiplied by the block height B_H. The multi-layer convolution operation of the present invention has a normalized throughput rate (NTR) obtained from the convolution depth D and the normalized computation rate (NCR), where the NCR is calculated from the block width B_W, the block height B_H, the convolution depth D and a variable h. The normalized throughput rate NTR and the normalized computation rate NCR of the present invention conform to the following formulas (3) and (4), respectively:
[Formula (3): normalized throughput rate NTR — equation image in the original]
[Formula (4): normalized computation rate NCR — equation image in the original]
as can be seen from FIG. 10, if there is a block register size limit S for the block register 220, the maximum supported convolution depth D that the FU can support is reusedmaxThe shallowest of the three; in contrast, recalculating FC can support a wide range of model convolution depths, but requires a high computational complexity, resulting in a significant reduction in the normalized throughput NTR. The recalculated and recycled FCFU of the present invention not only supports a wider range of model convolution depths than recycled FUs, but also provides a better normalized throughput NTR than recalculated FCs.
As can be seen from the above embodiments, the present invention has the following advantages. First, the block-wise inference method for memory optimization of the convolutional neural network uses calculation modes with different characteristics in different directions, so that block-wise inference can still greatly reduce the external memory bandwidth requirement without increasing too much computation or block register capacity. Second, the block-wise inference system for memory optimization of the convolutional neural network likewise uses calculation modes with different characteristics in different directions to achieve the same reduction of the external memory bandwidth requirement. Third, the recalculation-and-reuse of the present invention not only supports a wider range of model convolution depths than reuse, but also provides a better normalized throughput rate than recalculation.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims (16)

1. A memory optimized block-wise inference method of a convolutional neural network for processing an input image, the memory optimized block-wise inference method of the convolutional neural network comprising the steps of:
a parameter setting step, setting an inference parameter group, wherein the inference parameter group comprises a convolution depth, a block width, a block height and a multilayer convolution kernel size;
a dividing step of driving an arithmetic processing unit to divide the input image into a plurality of input block data according to the convolution depth, the block width, the block height and the size of the multi-layer convolution kernel, wherein each input block data has an input block size;
a block inference step of driving the arithmetic processing unit to perform a multi-layer convolution operation on each input block data to generate an output block data, wherein the multi-layer convolution operation includes:
a first direction data selection step, selecting a plurality of i-th layer recalculation characteristics along a scanning line-feed direction according to a position of the output block data, and then selecting an i-th layer recalculation input characteristic block data according to the position of the output block data and the i-th layer recalculation characteristics, wherein i is one of a plurality of positive integers from 1 to the convolution depth;
a second direction data selection step of selecting a plurality of i-th layer reuse characteristics along a block scanning direction according to the i-th layer recalculated input characteristic block data, and combining the i-th layer recalculated input characteristic block data and the i-th layer reuse characteristics to generate i-th layer reuse input characteristic block data; and
a convolution operation step, selecting a plurality of i-th layer sub-block input feature groups from the i-th layer repeated utilization input feature block data according to the size of an i-th layer convolution kernel, then executing convolution operation on each i-th layer sub-block input feature group and a convolution parameter group to generate an i-th layer sub-block output feature, and combining the i-th layer sub-block output features corresponding to the i-th layer sub-block input feature groups to form i-th layer output feature block data; and
a temporary storage step, driving a block temporary storage to temporarily store the ith layer output characteristic block data and the ith layer recycling characteristics.
2. The memory-optimized, block-wise inference method of convolutional neural networks of claim 1,
when i is equal to 1, the i-th layer recalculated input feature block data is equal to each input block data; and
when i is equal to the convolution depth, the i-th layer output feature block data is equal to the output block data.
3. The method of claim 1, wherein the i-th layer recalculated input feature block data has an i-th layer recalculated input feature block size and an i-th layer recalculated input feature block channel number, the i-th layer output feature block data has an i-th layer output feature block size and an i-th layer output feature block channel number, the i-th layer output feature block size is larger than the i-th layer recalculated input feature block size, and the i-th layer recalculated input feature block channel number is equal to the i-th layer output feature block channel number.
4. The method of claim 1, wherein the block scan direction is perpendicular to the scan line-feed direction, the block width is greater than the block height, and an extension direction of the block height is parallel to the block scan direction.
5. The method of claim 1, wherein the convolution depth, the block width, and the block height are positive integers, the i-th layer convolution kernel size is k_Wi × k_Hi, the i-th layer reuse features have a reuse feature number along the block scanning direction, and the reuse feature number is equal to k_Hi − 1.
6. The memory-optimized block-wise inference method of the convolutional neural network of claim 1, wherein the block width is denoted as B_W, the convolution depth is denoted as D, and the block height is denoted as B_H;
the input block size is equal to B_W × B_H;
the output block data has an output block size equal to (B_W − 2D) × B_H;
the i-th layer recalculated input feature block data has an i-th layer recalculated input feature block size equal to (B_W − 2i + 2) × B_H;
the i-th layer reuse input feature block data has an i-th layer reuse input feature block size equal to (B_W − 2i + 2) × (B_H + 2);
the i-th layer output feature block data has an i-th layer output feature block size equal to (B_W − 2i) × B_H; and
the convolution depth is less than half of the block width.
7. The memory-optimized, block-wise inference method of convolutional neural networks of claim 1,
when at least one of the input features of the i-th layer sub-block input feature group is located in an outer region of the i-th layer recycling input feature block data, the input features of the i-th layer sub-block input feature group comprise a plurality of outer block features and a plurality of first inner block features, the outer block features represent operated features, and the first inner block features represent non-operated features;
when the input features of one of the i-th layer sub-block input feature groups are all located in an inner region of the i-th layer recycling input feature block data, the input features of the i-th layer sub-block input feature group only comprise a plurality of second inner region block features; and
the ith layer reuses input feature block data in the arrangement sequence of the outer area and the inner area along the block scanning direction.
8. The method of claim 7, wherein the plurality of outer block features are stored in a block register, the block register has a temporary storage space obtained from the i-th layer recalculated input feature block data width, the convolution depth, a layer number, a channel number and the i-th layer convolution kernel size, the i-th layer recalculated input feature block data width is denoted B_Wi, the convolution depth is denoted D, the layer number is denoted i, the channel number is denoted C, and the i-th layer convolution kernel size is k_Wi × k_Hi, the temporary storage space is denoted LBS and conforms to the following equation:
LBS = C × Σ_{i=1}^{D} (k_Hi - 1) × B_Wi
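The sketch below evaluates this scratch-space bound. It is hedged: the original equation image did not survive extraction, so the code assumes the reconstruction LBS = C × Σ (k_Hi - 1) × B_Wi given above, with B_Wi = B_W - 2i + 2 taken from claim 6; the function name and example values are illustrative.

```python
# Hedged sketch of the claim-8 scratch space, assuming
# LBS = C * sum_{i=1..D} (k_Hi - 1) * B_Wi with B_Wi = B_W - 2i + 2.
def scratch_space(B_W: int, C: int, kernel_sizes: list) -> int:
    """Per layer, keep (k_Hi - 1) reused rows of width B_Wi across C channels."""
    total = 0
    for i, (k_W, k_H) in enumerate(kernel_sizes, start=1):
        B_Wi = B_W - 2 * i + 2  # i-th layer recalculated input block width (claim 6)
        total += (k_H - 1) * B_Wi * C
    return total

# Example: a depth-4 network of 3x3 kernels, 64-wide blocks, 32 channels:
# 2 * (64 + 62 + 60 + 58) * 32 = 15616 stored features.
print(scratch_space(64, 32, [(3, 3)] * 4))
```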
9. A block-based inference system for memory optimization of a convolutional neural network, for processing an input image, the block-based inference system comprising:
a block register for accessing an i-th layer output feature block data and a plurality of i-th layer reuse features; and
an arithmetic processing unit electrically connected to the block register, the arithmetic processing unit receiving the input image and configured to perform operations comprising:
a parameter setting step of setting an inference parameter group, wherein the inference parameter group comprises a convolution depth, a block width, a block height and a plurality of multi-layer convolution kernel sizes;
a dividing step of dividing the input image into a plurality of input block data according to the convolution depth, the block width, the block height and the multi-layer convolution kernel sizes, wherein each of the input block data has an input block size; and
a block inference step of performing a multi-layer convolution operation on each of the input block data to generate an output block data, wherein the multi-layer convolution operation comprises:
a first-direction data selection step of selecting a plurality of i-th layer recalculated features along a scanning line-feed direction according to a position of the output block data, and then selecting an i-th layer recalculated input feature block data according to the position of the output block data and the i-th layer recalculated features, wherein i is a positive integer from 1 to the convolution depth;
a second-direction data selection step of selecting the i-th layer reuse features along a block scanning direction according to the i-th layer recalculated input feature block data, and combining the i-th layer recalculated input feature block data and the i-th layer reuse features to generate an i-th layer reuse input feature block data; and
a convolution operation step of selecting a plurality of i-th layer sub-block input feature groups from the i-th layer reuse input feature block data according to an i-th layer convolution kernel size, then performing a convolution operation on each i-th layer sub-block input feature group and a convolution parameter group to generate an i-th layer sub-block output feature, and combining the i-th layer sub-block output features corresponding to the i-th layer sub-block input feature groups into the i-th layer output feature block data.
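To make the two data-selection steps concrete, here is a minimal single-channel Python sketch of the claimed flow: recomputation along the scanning line-feed direction shows up as the 'valid' convolution shrinking the block width, while reuse along the block scanning direction shows up as prepending the stored rows. All names and shapes are assumptions for illustration; this is a sketch of the idea, not the patented implementation.

```python
# Hedged single-channel sketch of the block inference step, assuming 3x3 kernels.
# Axis 0 is the block width (line-feed direction, recomputed features);
# axis 1 is the block height (block scanning direction, reused features).
import numpy as np
from scipy.signal import convolve2d

def block_inference(block, reuse_rows, kernels):
    """block: (B_W, B_H) layer-1 recalculated input; reuse_rows[i]: the
    k_H - 1 = 2 rows kept from the previous block for layer i + 1."""
    x = block
    new_reuse = []
    for i, k in enumerate(kernels):
        # Second-direction data selection: prepend the reused rows to form
        # the reuse input feature block data for this layer.
        x_reused = np.concatenate([reuse_rows[i], x], axis=1)
        # Keep the last k_H - 1 rows of this layer's input for the next block.
        new_reuse.append(x_reused[:, -(k.shape[1] - 1):])
        # 'valid' convolution: the width shrinks by 2 (recomputed features);
        # the height returns to B_H (covered by the reused rows).
        x = convolve2d(x_reused, k, mode="valid")
    return x, new_reuse

# First block in a scan column: nothing to reuse yet, so seed with zeros.
B_W, B_H, D = 64, 8, 4
kernels = [np.random.randn(3, 3) for _ in range(D)]
reuse = [np.zeros((B_W - 2 * i, 2)) for i in range(D)]
out, reuse = block_inference(np.random.randn(B_W, B_H), reuse, kernels)
print(out.shape)  # (56, 8), i.e. (B_W - 2D) x B_H
```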
10. The system of claim 9, wherein:
when i is equal to 1, the i-th layer recalculated input feature block data is equal to the respective input block data; and
when i is equal to the convolution depth, the i-th layer output feature block data is equal to the output block data.
11. The system of claim 9, wherein the i-th layer recalculated input feature block data has an i-th layer recalculated input feature block size and an i-th layer recalculated input feature block channel number, the i-th layer output feature block data has an i-th layer output feature block size and an i-th layer output feature block channel number, the i-th layer output feature block size is smaller than the i-th layer recalculated input feature block size, and the i-th layer recalculated input feature block channel number is equal to the i-th layer output feature block channel number.
12. The system of claim 9, wherein the block scanning direction is perpendicular to the scanning line-feed direction, the block width is greater than the block height, and the extension direction of the block height is parallel to the block scanning direction.
13. The system of claim 9, wherein the convolution depth, the block width, and the block height are positive integers, the i-th layer convolution kernel size is k_Wi × k_Hi, and the i-th layer reuse features have a reuse feature number along the block scanning direction equal to k_Hi - 1.
14. The system of claim 9, wherein the block width is denoted B_W, the convolution depth is denoted D, and the block height is denoted B_H;
the input block size is equal to B_W × B_H;
the output block data has an output block size equal to (B_W - 2D) × B_H;
the i-th layer recalculated input feature block data has an i-th layer recalculated input feature block size equal to (B_W - 2i + 2) × B_H;
the i-th layer reuse input feature block data has an i-th layer reuse input feature block size equal to (B_W - 2i + 2) × (B_H + 2);
the i-th layer output feature block data has an i-th layer output feature block size equal to (B_W - 2i) × B_H; and
the convolution depth is less than half of the block width.
15. The system of claim 9, wherein:
when at least one of the input features of one of the i-th layer sub-block input feature groups is located in an outer region of the i-th layer reuse input feature block data, the input features of that i-th layer sub-block input feature group comprise a plurality of outer block features and a plurality of first inner block features, the outer block features representing already-computed features and the first inner block features representing not-yet-computed features;
when the input features of one of the i-th layer sub-block input feature groups are all located in an inner region of the i-th layer reuse input feature block data, the input features of that i-th layer sub-block input feature group comprise only a plurality of second inner block features; and
the i-th layer reuse input feature block data is arranged in the order of the outer region followed by the inner region along the block scanning direction.
16. The system of claim 15, wherein the plurality of outer block features are stored in the block register, the block register has a temporary storage space obtained from the i-th layer recalculated input feature block data width, the convolution depth, a layer number, a channel number and the i-th layer convolution kernel size, the i-th layer recalculated input feature block data width is denoted B_Wi, the convolution depth is denoted D, the layer number is denoted i, the channel number is denoted C, and the i-th layer convolution kernel size is k_Wi × k_Hi, the temporary storage space is denoted LBS and conforms to the following equation:
LBS = C × Σ_{i=1}^{D} (k_Hi - 1) × B_Wi
CN202010922472.8A 2019-10-08 2020-09-04 Block type inference method and system for memory optimization of convolution neural network Pending CN112633462A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962912630P 2019-10-08 2019-10-08
US62/912,630 2019-10-08

Publications (1)

Publication Number Publication Date
CN112633462A (en) 2021-04-09

Family

ID=75300104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010922472.8A Pending CN112633462A (en) 2019-10-08 2020-09-04 Block type inference method and system for memory optimization of convolution neural network

Country Status (2)

Country Link
CN (1) CN112633462A (en)
TW (1) TWI765336B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118389B (en) * 2022-01-28 2022-05-10 深圳鲲云信息科技有限公司 Neural network data processing method, device and storage medium

Patent Citations (12)

Publication number Priority date Publication date Assignee Title
WO2017015649A1 (en) * 2015-07-23 2017-01-26 Mireplica Technology, Llc Performance enhancement for two-dimensional array processor
CN106779146A (en) * 2016-11-15 2017-05-31 广州铁路职业技术学院 A kind of tourism service system for providing recommendation tourism route
CN107437110A (en) * 2017-07-11 2017-12-05 中国科学院自动化研究所 The piecemeal convolution optimization method and device of convolutional neural networks
US20180096249A1 (en) * 2016-10-04 2018-04-05 Electronics And Telecommunications Research Institute Convolutional neural network system using adaptive pruning and weight sharing and operation method thereof
US20180095632A1 (en) * 2016-10-04 2018-04-05 Sas Institute Inc. Interactive visualizations of a convolutional neural network
US20180131946A1 (en) * 2016-11-07 2018-05-10 Electronics And Telecommunications Research Institute Convolution neural network system and method for compressing synapse data of convolution neural network
KR101847874B1 (en) * 2017-06-28 2018-05-25 서경대학교 산학협력단 Image recognition method using convolution neural network and recording medium thereof
CN108415881A (en) * 2017-02-10 2018-08-17 耐能股份有限公司 The arithmetic unit and method of convolutional neural networks
WO2019015144A1 (en) * 2017-07-21 2019-01-24 北京市商汤科技开发有限公司 Image processing method and system, storage medium, and computing device
US20190147319A1 (en) * 2017-11-14 2019-05-16 Samsung Electronics Co., Ltd. Device and method for processing convolution operation using kernel
US20190147332A1 (en) * 2017-11-14 2019-05-16 Advanced Micro Devices, Inc. Memory bandwidth reduction techniques for low power convolutional neural network inference applications
US20190188240A1 (en) * 2017-12-18 2019-06-20 International Business Machines Corporation Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US10083395B2 (en) * 2015-05-21 2018-09-25 Google Llc Batch processing in a neural network processor
US10878273B2 (en) * 2017-07-06 2020-12-29 Texas Instruments Incorporated Dynamic quantization for deep neural network inference system and method
CN110135553B (en) * 2018-02-09 2021-09-03 宏达国际电子股份有限公司 Convolutional neural network adjusting method and electronic device
CN110175636A (en) * 2019-05-08 2019-08-27 深圳欧翼思特科技有限公司 A kind of Internet of Things deep neural network distribution differentiation inference system and method

Non-Patent Citations (6)

Title
CHEN PENG et al.: "Finger vein recognition based on scattering convolution network", Journal of Zhejiang University of Technology, vol. 46, no. 1, 5 February 2018, pages 56-60 *
G. LI et al.: "Block Convolution: Toward Memory-Efficient Inference of Large-Scale CNNs on FPGA", 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 23 April 2018, pages 1163-1166 *
XIAOCONG LIAN et al.: "High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, no. 8, 16 May 2019, pages 1874, XP011736689, DOI: 10.1109/TVLSI.2019.2913958 *
YAN, J. L. et al.: "GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 11, 1 November 2018, pages 2519-2529, XP011692601, DOI: 10.1109/TCAD.2018.2857258 *
XIA SONG: "Implementation and Research of an Associative Memory Based on Hopfield Neural Networks", China Master's Theses Full-text Database (Information Science and Technology), no. 2010, 15 October 2010, pages 140-54 *
LU ZHIJIAN: "Research on FPGA-Based Parallel Architectures for Convolutional Neural Networks", China Doctoral Dissertations Full-text Database (Information Science and Technology), no. 2014, 15 April 2014, pages 140-12 *

Also Published As

Publication number Publication date
TW202115624A (en) 2021-04-16
TWI765336B (en) 2022-05-21

Similar Documents

Publication Publication Date Title
US20180268234A1 (en) Object Detection And Recognition Apparatus Based On CNN Based Integrated Circuits
US10402628B2 (en) Image classification systems based on CNN based IC and light-weight classifier
US20180189595A1 (en) Implementation Of MobileNet In A CNN Based Digital Integrated Circuit
US10339445B2 (en) Implementation of ResNet in a CNN based digital integrated circuit
US20190087725A1 (en) Approximating Fully-Connected Layers With Multiple Arrays Of 3x3 Convolutional Filter Kernels In A CNN Based Integrated Circuit
US9305329B2 (en) Low memory content aware fill
US20180157940A1 (en) Convolution Layers Used Directly For Feature Extraction With A CNN Based Integrated Circuit
US5845017A (en) Digital image processing method for degraining of film images using distance weighted averaging of target pixel code values
US10387772B1 (en) Ensemble learning based image classification systems
JP6927320B2 (en) Inference device, convolution operation execution method and program
US11526723B2 (en) Apparatus and methods of obtaining multi-scale feature vector using CNN based integrated circuits
US11694069B2 (en) Methods for processing data in an efficient convolutional engine with partitioned columns of convolver units
WO2022206556A1 (en) Matrix operation method and apparatus for image data, device, and storage medium
JP2007266699A (en) Image data conversion processing apparatus and method
CN112633462A (en) Block type inference method and system for memory optimization of convolution neural network
US20190318226A1 (en) Deep Learning Image Processing Systems Using Modularly Connected CNN Based Integrated Circuits
US20210103793A1 (en) Block-based inference method for memory-efficient convolutional neural network implementation and system thereof
CN110991627A (en) Information processing apparatus, information processing method, and computer program
Vaksman et al. Patch ordering as a regularization for inverse problems in image processing
US20190347812A1 (en) Apparatus and method for image-distance transformation using bi-directional scans
CN107111878B (en) Data processing method, apparatus and system
KR102453370B1 (en) Method and Apparatus for High-Speed Low-Power Processing in Large-Scale Deep Neural Network
CN113657587B (en) Deformable convolution acceleration method and device based on FPGA
CN111832585B (en) Image processing method and device
CN114662647A (en) Processing data for layers of a neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination