CN115238877A - Data mapping method and data mapping device for improving utilization rate of storage array - Google Patents

Data mapping method and data mapping device for improving utilization rate of storage array

Info

Publication number
CN115238877A
Authority
CN
China
Prior art keywords
array
unit
unit array
convolution kernel
column
Prior art date
Legal status
Pending
Application number
CN202210875598.3A
Other languages
Chinese (zh)
Inventor
梁峰
王尚
Current Assignee
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xi'an Jiaotong University
Priority to CN202210875598.3A
Publication of CN115238877A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F 1/26 Power supply means, e.g. regulation thereof
    • G06F 1/32 Means for saving power
    • G06F 1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F 1/3234 Power saving characterised by the action undertaken
    • G06F 1/3243 Power saving in microcontroller unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G06F 17/153 Multidimensional correlation or convolution


Abstract

The invention provides a data mapping method and a data mapping device for improving the utilization rate of a storage and computation array, and relates to the technical field of chips. The method comprises the following steps: splitting a k × k convolution kernel and mapping it into k unit arrays, with each unit array mapping a 1 × k × C vector and the mapping continuing into the next unit array if the rows of a unit array are insufficient; and determining, according to the relation between the convolution kernel size and the storage and computation array size, whether the storage and computation array uses a first data mapping method or a second data mapping method. The invention effectively relieves the pressure on the input buffer and reduces the amount of data transferred, thereby reducing chip latency and power consumption. In addition, depending on the size of the convolution kernel, the method can increase the speed/area ratio by up to a factor of three, effectively utilizes the idle rows and columns in each array, and improves the utilization rate and calculation speed of the storage and computation arrays.

Description

Data mapping method and data mapping device for improving utilization rate of storage array
Technical Field
The invention relates to the technical field of chips, in particular to a data mapping method and a data mapping device for improving the utilization rate of a storage and computation array.
Background
In the computation part of a storage and computation (compute-in-memory) chip, the hierarchy of the units responsible for calculation, from top to bottom, is: tile, processing element (PE), array (XBar), and the rows and columns within each array, with one computing element (bitcell) at each row-column intersection; each computing element can perform one computation per cycle.
When a storage and computation chip is used to compute a neural network, the parameter matrix of the neural network is mapped onto the arrays according to specific rules. One of the main purposes of weight mapping is to achieve the highest possible data reuse rate: with a common 3 × 3 convolution kernel, each number in the input matrix participates in an average of 9 multiply-add operations, which increases the pressure on the input buffer and the amount of data transferred, and thereby increases chip latency and power consumption. In addition, when one layer of the neural network is accelerated with multiple copies, more processing elements and arrays are often used for pipelined accelerated computation, yet the idle rows and columns inside each array cannot be utilized effectively; the problem is especially obvious when the convolution kernel has few output channels. How to effectively utilize the idle rows and columns in each array and improve the utilization rate and calculation speed of the storage and computation array is an urgent problem to be solved.
Disclosure of Invention
In view of the above problems, the present invention has been made to provide a data mapping method and a data mapping apparatus that overcome or at least partially solve the above problems and improve the utilization rate of a computational array.
A first aspect of an embodiment of the present invention provides a data mapping method for increasing a utilization rate of a storage array, where the storage array includes: a plurality of cell arrays, the data mapping method comprising:
splitting a k multiplied by k convolution kernel and mapping it into k unit arrays, mapping a 1 multiplied by k multiplied by C vector in each unit array, and continuing the mapping into the next unit array if the number of rows of a unit array is not enough, wherein k multiplied by k represents the height multiplied by the width of the convolution kernel, and C represents the number of input channels of the convolution kernel;
determining whether the storage array uses a first data mapping method or a second data mapping method according to the relation between the size of the convolution kernel and the size of the storage array;
taking k = 3 as an example, the first data mapping method includes:
mapping a first row of a 3 × 3 convolution kernel to any unit array, wherein elements in a first column in the unit array correspond to an arrangement mode of the first row in the 3 × 3 convolution kernel after being expanded into a column vector;
respectively mapping the same weights as those in the first column of the unit array in the second column and the third column of the unit array, wherein the mapping position corresponding to the second column of the unit array is vacant downwards by C rows, and the mapping position corresponding to the third column of the unit array is vacant downwards by 2C rows;
expanding the input elements in the input buffer to a 1 × 5 × C column vector;
in the current period, the input buffer outputs 5 elements in the input matrix to the unit array and carries out convolution operation;
when the current period is finished, obtaining a complete output element at the tail end of each column in the unit array, and then performing the convolution operation of the next period;
taking k = 3 as an example, the second data mapping method includes:
mapping parameters on each input channel of a 3 × 3 convolution kernel into different unit arrays in the storage and computation array respectively, and mapping the same weight on three columns in each unit array respectively without empty rows, so that each input channel occupies ceil(C/n) unit arrays, ceil(C/n) × 3 arrays are occupied in the row dimension in total, and ceil(M × 3/n) arrays are occupied in the column dimension, wherein ceil is a rounding-up function, and n represents the size of the storage and computation array;
in the current period, the input buffer outputs 3 elements in the input matrix to three unit arrays in the storage and calculation array and carries out convolution operation to obtain partial results;
and adding partial results obtained in each period in groups to obtain a convolution result.
Optionally, the size of the convolution kernel is k × k × C × M, and the size of the storage array is n × n, where M represents the number of output channels of the convolution kernel;
determining whether the storage array uses a first data mapping method or a second data mapping method according to the relation between the size of the convolution kernel and the size of the storage array, wherein the method comprises the following steps:
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, or the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determining that the storage and computation array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, and the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determining that the storage and computation array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not less than k times the size of the unit array, calculating, based on the size of the convolution kernel and the size of the storage and computation array, the ratio of a first occupied array to a calculation speed and the ratio of a second occupied array to the calculation speed, wherein the ratio of the first occupied array to the calculation speed is the ratio of occupied arrays to calculation speed when the convolution kernel is mapped to the storage and computation array by the conventional method, and the ratio of the second occupied array to the calculation speed is the ratio of occupied arrays to calculation speed obtained by mapping the parameters on each input channel of the convolution kernel to different unit arrays in the storage and computation array respectively and mapping the same weight in k columns of each unit array without empty rows;
and if the value of the ratio of the first occupied array to the calculation speed is larger than the value of the ratio of the second occupied array to the calculation speed, determining that the storage and calculation array uses the second data mapping method.
Optionally, the formula of the ratio between the first occupancy array and the calculated speed is:
k×ceil(k×C/n)×ceil(M/n)
the formula of the ratio of the second occupied array to the calculation speed is:
ceil(C/n) × k × ceil(M × k/n)
in the above two formulas, ceil(k × C/n) indicates that the convolution kernel mapping needs to occupy ceil(k × C/n) unit arrays in the row dimension in the conventional method, ceil(M/n) indicates that it needs to occupy ceil(M/n) unit arrays in the column dimension in the conventional method, ceil(C/n) indicates that each input channel occupies ceil(C/n) unit arrays, ceil(C/n) × k indicates that ceil(C/n) × k unit arrays need to be occupied in the row dimension, and ceil(M × k/n) indicates that ceil(M × k/n) unit arrays need to be occupied in the column dimension.
Optionally, in the current cycle, the outputting, by the input buffer, 5 elements in the input matrix to the cell array and performing convolution operation includes:
the first two elements of the 5 elements are partially calculated in the previous period, the rest of the calculations are completed in the current period, the middle element is input once in the current period to complete the whole 3 calculations, the last two elements are partially calculated in the current period, and the rest of the calculations are completed by placing the last two elements at the positions of the first two elements in the next period.
Optionally, when the current period ends, a complete output element is obtained at the end of each column in the cell array, and then the convolution operation in the next period is performed, where the convolution operation includes:
when the current period is finished, obtaining a complete output element at the tail end of each column in the unit array, and then, when the next period starts, translating the value-taking area of the input buffer on the input matrix rightwards by 3 pixels relative to the value-taking area of the current period, taking 5 elements and outputting them to the unit array for convolution operation.
Optionally, the three cell arrays comprise: a first cell array, a second cell array, and a third cell array;
adding the partial result groups obtained in each period to obtain a convolution result, wherein the convolution result comprises the following steps:
adding the result obtained at the end of the first column in the first unit array, the result obtained at the end of the first column in the second unit array and the result obtained at the end of the first column in the third unit array in the current period to obtain a complete output element;
adding a result obtained at the end of a second column in the first unit array, a result obtained at the end of a third column in the first unit array and a result obtained at the end of a third column in the second unit array in the current period to a partial result obtained in the previous period respectively to obtain a complete output element;
and adding the result obtained at the end of the second column in the second unit array, the result obtained at the end of the second column in the third unit array and the result obtained at the end of the third column in the third unit array in the current period respectively to a partial result obtained in the next period to obtain a complete output element.
A second aspect of the embodiments of the present invention provides a data mapping apparatus for increasing a utilization rate of a storage array, where the data mapping apparatus includes:
a splitting and mapping module, used for splitting a k multiplied by k convolution kernel and mapping it into k unit arrays, each unit array being mapped with a 1 multiplied by k multiplied by C vector, and, if the number of rows of a unit array is not enough, continuing the mapping into the next unit array, wherein k multiplied by k represents the height multiplied by the width of the convolution kernel, and C represents the number of input channels of the convolution kernel;
a determining method module, used for determining whether the storage and computation array uses a first data mapping method or a second data mapping method according to the relation between the size of the convolution kernel and the size of the storage and computation array;
taking k = 3 as an example, the data mapping apparatus further includes: a first method module and a second method module;
wherein the first method module comprises:
the expansion mapping unit is used for mapping the first row of the 3 x 3 convolution kernel to any unit array in the storage and computation array, and elements of a first column in the unit array correspond to an arrangement mode of the first row in the 3 x 3 convolution kernel after being expanded into column vectors;
the empty row mapping unit is used for respectively mapping the same weights as those of the first column in the unit array in the second column and the third column in the unit array, the mapping positions corresponding to the second column elements in the unit array are empty downwards for C rows, and the mapping positions corresponding to the third column elements in the unit array are empty downwards for 2C rows;
an expansion input unit for expanding input data in the input buffer into a 1 × 5 × C column vector;
the first convolution unit is used for outputting 5 elements in an input matrix to the unit array by the input buffer in the current period and performing convolution operation;
the value output unit is used for obtaining a complete output element at the tail end of each column in the unit array when the current period is finished, and then carrying out the convolution operation of the next period;
the second method module comprises:
a full row mapping unit, configured to map parameters on each input channel of the 3 × 3 convolution kernel to different unit arrays in the storage array respectively, and map the same weight to three columns in each unit array respectively without empty rows, so that each input channel occupies ceil(C/n) unit arrays, ceil(C/n) × 3 arrays are occupied in the row dimension in total, and ceil(M × 3/n) arrays are occupied in the column dimension, where ceil is a rounding-up function, and n represents the size of the storage array;
the second convolution unit is used for outputting 3 elements in an input matrix to three unit arrays in the storage and calculation array by the input buffer in the current period and performing convolution operation to obtain a partial result;
and the adding unit is used for adding the partial results obtained in each period in groups to obtain a convolution result.
Optionally, the size of the convolution kernel is k × k × C × M, and the size of the storage array is n × n, where M represents the number of output channels of the convolution kernel;
the determination method module is specifically configured to:
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, or the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determining that the storage and computation array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, and the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determining that the storage and computation array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not less than k times the size of the unit array, calculating, based on the size of the convolution kernel and the size of the storage and computation array, the ratio of a first occupied array to a calculation speed and the ratio of a second occupied array to the calculation speed, wherein the ratio of the first occupied array to the calculation speed is the ratio of occupied arrays to calculation speed when the convolution kernel is mapped to the storage and computation array by the conventional method, and the ratio of the second occupied array to the calculation speed is the ratio of occupied arrays to calculation speed obtained by mapping the parameters on each input channel of the convolution kernel to different unit arrays in the storage and computation array respectively and mapping the same weight in three columns of each unit array without empty rows;
and if the value of the ratio of the first occupied array to the calculation speed is larger than the value of the ratio of the second occupied array to the calculation speed, determining that the storage and calculation array uses the second data mapping method.
Optionally, the first convolution unit is specifically configured to:
the first two elements of the 5 elements are partially calculated in the previous cycle, the rest of the calculations are completed in the current cycle, the middle element is input once in the current cycle to complete all 3 calculations, the last two elements are partially calculated in the current cycle, and the rest of the calculations are completed by placing the last two elements at the positions of the first two elements in the next cycle.
Optionally, the value output unit is specifically configured to:
when the current period is finished, obtaining a complete output element at the tail end of each column in the unit array, and then, when the next period starts, translating the value-taking area of the input buffer on the input matrix rightwards by 3 pixels relative to the value-taking area of the current period, taking 5 elements and outputting them to the unit array for convolution operation.
Optionally, the three cell arrays comprise: a first cell array, a second cell array, and a third cell array;
the adding unit is specifically configured to:
adding the result obtained at the tail end of the first column in the first unit array, the result obtained at the tail end of the first column in the second unit array and the result obtained at the tail end of the first column in the third unit array in the current period to obtain a complete output element;
adding a result obtained at the end of a second column in the first unit array, a result obtained at the end of a third column in the first unit array and a result obtained at the end of a third column in the second unit array in the current period to a partial result obtained in the previous period respectively to obtain a complete output element;
and adding a result obtained at the end of the second column in the second unit array, a result obtained at the end of the second column in the third unit array and a result obtained at the end of the third column in the third unit array in the current period to a partial result obtained in the next period respectively to obtain a complete output element.
The invention provides a data mapping method for improving the utilization rate of storage and computation arrays, in which a k multiplied by k convolution kernel is split and mapped into k unit arrays, each unit array is mapped with a 1 multiplied by k multiplied by C vector, the mapping continues into the next unit array if the number of rows of a unit array is not enough, and the storage and computation array is determined to use a first data mapping method or a second data mapping method according to the relation between the size of the convolution kernel and the size of the storage and computation array.
Taking a 3 × 3 convolution kernel as an example, when the convolution kernel is small, the first data mapping method is used: 80% of the input data needs to be input twice in total and the remaining 20% only once, giving an average multiplexing rate of 1.8. When the convolution kernel is large, the second data mapping method is used: all input data needs to be input only once, and the average multiplexing rate is 3. Both data mapping methods effectively relieve the pressure on the input buffer and reduce the amount of data transferred, thereby reducing chip latency and power consumption.
In addition, depending on the size of the convolution kernel, the method provided by the invention can increase the ratio of calculation speed to occupied arrays (speed/area for short) by up to a factor of three. When the second mapping method is used, roughly 3 times as many arrays are occupied while the calculation speed also increases 3-fold, so the speed/area is unchanged. When the convolution kernel has very few input or output channels, in particular when the number of occupied rows grows to 5/3 of the original and the number of occupied columns grows to 3 times the original without exceeding the size of a single array, the calculation speed increases 3-fold while the number of occupied arrays is unchanged, so the speed/area also increases 3-fold; the idle rows and columns inside each array are used effectively, and the utilization rate and calculation speed of the storage and computation arrays are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a diagram illustrating a simulation of a mapping scheme when a convolution kernel is small according to an embodiment of the present invention;
fig. 2 is a diagram illustrating a simulation of a mapping manner when a convolution kernel is large according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The inventor finds that the conventional methods for performing convolution operations with a storage and computation array suffer from high power consumption, a low speed/area ratio and similar problems. Taking M convolution kernels of size 3 × 3 × C as an example, each number in the input matrix participates in an average of 9 multiply-add operations, and the common ways of mapping the weight matrix and streaming the input data are as follows:
firstly, each 3 × 3 × C convolution kernel is unrolled into a one-dimensional column vector and placed in a single column, with different columns mapping different convolution kernels; secondly, the 3 × 3 convolution kernel is split and mapped into 9 unit arrays, each column of each unit array mapping a 1 × C vector; thirdly, the 3 × 3 convolution kernel is split and mapped into 3 unit arrays, each unit array mapping a 1 × 3 × C column vector. These three approaches still have two significant drawbacks:
1) Multiplexing of the input data inside the array cannot be realized. As mentioned above, since each input datum participates in an average of 9 multiply-add operations under a 3 × 3 convolution kernel, the above methods still need to input each datum 9 times into different unit arrays or into different rows of the same unit array. The multiplexing rate of the input data is insufficient.
2) When one layer of the neural network is accelerated with multiple copies, more processing units and arrays are often used for accelerated calculation in a pipeline, and the idle rows and columns inside each unit array cannot be utilized effectively; the problem is especially prominent when the convolution kernel has few output channels.
In view of the above problems, the inventors have creatively proposed the data mapping method for improving the utilization rate of the storage array of the present invention, and the method of the present invention will be described in detail below.
In general, a storage and computation array is composed of a plurality of unit arrays. When the data mapping method of the present invention is applied, a k × k convolution kernel is first split and mapped into k unit arrays, each unit array mapping a 1 × k × C column vector; if the number of rows of a unit array is not enough, the mapping continues into the next unit array, where k × k represents the height × width of the convolution kernel and C represents the number of input channels of the convolution kernel.
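As an illustration of this splitting step, the following Python sketch (an assumption of this description rather than part of the original disclosure; names such as split_kernel_rows and unit_arrays are made up) unrolls each kernel row of a k × k × C × M kernel into a (k·C)-row weight block and spills it across n-row unit arrays when necessary.

```python
import numpy as np

def split_kernel_rows(kernel, n):
    """Split a k x k x C x M kernel by kernel row: row r is unrolled into a
    (k*C) x M weight block intended for unit array r, and spills over into
    further n-row unit arrays when k*C > n.  Purely illustrative: a 'unit
    array' here is just a numpy block of at most n rows."""
    k, k2, C, M = kernel.shape
    assert k == k2, "square kernel assumed"
    unit_arrays = []
    for r in range(k):
        col_block = kernel[r].reshape(k * C, M)   # the 1 x k x C vector per output channel
        for start in range(0, k * C, n):          # continue into the next array if rows run out
            unit_arrays.append(col_block[start:start + n, :])
    return unit_arrays

# Example: 3x3 kernel, C = 4 input channels, M = 2 output channels, 16-row arrays.
blocks = split_kernel_rows(np.arange(3 * 3 * 4 * 2.0).reshape(3, 3, 4, 2), n=16)
print([b.shape for b in blocks])                  # three (12, 2) blocks, one per kernel row
```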
Then, whether the storage and computation array uses the first data mapping method or the second data mapping method is determined according to the relation between the size of the convolution kernel and the size of the storage and computation array. Specifically, the choice can be made as follows:
assuming that the size of the convolution kernel is k (height of the convolution kernel) × k (width of the convolution kernel) × C (number of input channels of the convolution kernel) × M (number of output channels of the convolution kernel), the size of the computing array is n (number of rows of the computing array) × n (number of columns of the computing array).
When the convolution kernel is split and mapped to any unit array, if the number of occupied rows is not more than k/(2k-1) of the size of the unit array, or the number of occupied columns is less than 1/k of the size of the unit array, the storage and computation array is determined to use the first data mapping method.
If the number of occupied rows is not more than k/(2k-1) of the size of the unit array but the number of occupied columns is not less than 1/k of the size of the unit array, the speed/area of this data mapping method can be increased by a factor of between 1 and k/2 compared with the existing mapping method; if the number of occupied columns is less than 1/k of the size of the unit array but the number of occupied rows is more than k/(2k-1) of the size of the unit array, the speed/area can be increased by a factor of between k/2 and k compared with the existing mapping method.
When the convolution kernel is split and mapped to any unit array, if the number of occupied rows is not more than k/(2k-1) of the size of the unit array and the number of occupied columns is less than 1/k of the size of the unit array, the storage and computation array is determined to use the first data mapping method; in this case, the full k-fold increase in speed/area can be achieved.
The foregoing handles the case where the convolution kernel is small. When the convolution kernel is large, it needs to occupy multiple unit arrays to map all of its data completely. In this case, when the convolution kernel is split and mapped to any unit array, the number of occupied rows is not less than k times the size of the unit array, and, based on the size of the convolution kernel and the size of the storage and computation array, the ratio of the first occupied array to the calculation speed and the ratio of the second occupied array to the calculation speed are calculated, that is, the first speed/area and the second speed/area are obtained.
The ratio of the first occupied array to the calculation speed is: mapping the convolution kernel to the ratio of the occupied array to the calculation speed when storing and calculating the array by adopting a traditional method; the ratio of the second occupancy array to the calculation speed is: and respectively mapping the parameters on each input channel of the convolution kernel to different unit arrays in the storage and calculation array, and respectively mapping the same weight to k columns in each unit array without a row-empty method to obtain the ratio of the occupied array to the calculation speed. The meaning of the empty row is explained below and will not be described in detail.
And after the ratio of the first occupation array to the calculation speed and the ratio of the second occupation array to the calculation speed are obtained, if the value of the ratio of the first occupation array to the calculation speed is larger than the value of the ratio of the second occupation array to the calculation speed, determining that the storage and calculation array uses a second data mapping method.
The formula of the ratio between the first occupancy array and the calculated speed may be:
k×ceil(k×C/n)×ceil(M/n)
the formula for the ratio of the second occupancy array to the calculated speed may be:
ceil(C/n) × k × ceil(M × k/n)
In the above two formulas, ceil(k × C/n) indicates that the convolution kernel mapping needs to occupy ceil(k × C/n) unit arrays in the row dimension in the conventional method, ceil(M/n) indicates that it needs to occupy ceil(M/n) unit arrays in the column dimension in the conventional method, ceil(C/n) indicates that each input channel occupies ceil(C/n) unit arrays in the method of the present invention, ceil(C/n) × k indicates that ceil(C/n) × k unit arrays are occupied in the row dimension in the method of the present invention, and ceil(M × k/n) indicates that ceil(M × k/n) unit arrays are occupied in the column dimension in the method of the present invention.
If the value calculated by k × ceil(k × C/n) × ceil(M/n) is greater than the value calculated by ceil(C/n) × k × ceil(M × k/n), the second data mapping method of the present invention is used.
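A minimal sketch of this selection rule, implementing only the two formulas above (the function and variable names, as well as the example sizes, are assumptions made for illustration):

```python
import math

def area_speed_ratios(k, C, M, n):
    """Return the (conventional, proposed) array-to-speed ratios from the two formulas above."""
    conventional = k * math.ceil(k * C / n) * math.ceil(M / n)
    proposed = math.ceil(C / n) * k * math.ceil(M * k / n)
    return conventional, proposed

def choose_mapping(k, C, M, n):
    conventional, proposed = area_speed_ratios(k, C, M, n)
    return "second data mapping method" if conventional > proposed else "conventional mapping"

# Example with assumed sizes: 3x3 kernel, C = 128 input channels, M = 64 output channels,
# 256 x 256 unit arrays.
print(area_speed_ratios(3, 128, 64, 256))   # (6, 3) -> the proposed mapping wins
print(choose_mapping(3, 128, 64, 256))      # "second data mapping method"
```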
The following describes the contents of the first data mapping method and the second data mapping method in the embodiment of the present invention, taking a 3 × 3 convolution kernel as an example.
In combination with the simulation diagram of the mapping mode for a smaller convolution kernel shown in Fig. 1, the first data mapping method includes: mapping the first row of the 3 × 3 convolution kernel to any unit array, wherein the elements of the first column in the unit array are arranged as the first row of the 3 × 3 convolution kernel expanded into a column vector. For example, as shown in Fig. 1, the 3 × 3 convolution kernel is illustrated by X1 to Xc, Y1 to Yc and Z1 to Zc, and the input matrix by the matrix data INPUT01 and INPUT02, where INPUT01 comprises IN01~IN0c, IN11~IN1c, IN21~IN2c and IN31~IN3c.
In the first cycle, the input buffer inputs the corresponding matrix data INPUT01 into the storage and computation array, and the first element OUT01 of the output matrix is calculated. In the conventional mapping method, the input buffer would input INPUT02 in the next cycle and obtain the second element OUT02 of the output matrix at the end of the first column of the array. However, as shown in the upper left corner of Fig. 1, two thirds of the input data of two adjacent cycles are repeated. To make effective use of the data already input in the first cycle, the same weights as those in the first column are mapped in the second and third columns of the array, but the mapping positions are shifted downwards in sequence, by C rows each time. That is, the position of X1 in the second column is aligned with the position of Y1 in the first column, and the position of X1 in the third column is aligned with the position of Y1 in the second column and with the position of Z1 in the first column.
Expanding the input elements of the input buffer into a 1 × 5 × C column vector then yields three elements of the output matrix, OUT01, OUT02 and OUT03, in one cycle, so the calculation speed is tripled. Of the input data, one fifth is used in three calculations, two fifths in two calculations, and the remaining two fifths in one calculation, giving an average data multiplexing rate of 1.8. The cost is that the number of occupied rows in the array grows to 5/3 of the original, the number of occupied columns grows to 3 times the original, and the amount of data input by the input buffer per cycle grows to 5/3 of the original.
That is, of the 5 input elements, the first two have already been partially computed in the previous cycle and the current cycle completes the rest of their computations; the middle element needs to be input only once in this cycle to complete all 3 of its calculations; the last two elements complete part of their computations in this cycle, and the remaining computations are completed when these two elements occupy the positions of the first two elements in the next cycle. Under this data mapping method, a complete output element is obtained at the end of each column after every cycle, and the value-taking area of the input buffer on the input matrix is then shifted 3 pixels to the right. Thus, under this data mapping method, the input buffer inputs 5/3 times as much data per cycle, but the number of calculation cycles becomes one third of the original.
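The shifted-column weight layout and the resulting input reuse can be modelled with a few lines of numpy; the sketch below is an illustration under assumed sizes (C = 4, random weights X, Y, Z for one kernel row and a random 1 × 5 × C input window), not the patent's implementation:

```python
import numpy as np

C = 4                                        # assumed number of input channels
rng = np.random.default_rng(0)
X, Y, Z = rng.standard_normal((3, C))        # one kernel row (three positions, C channels each)
w_col = np.concatenate([X, Y, Z])            # the 1 x 3 x C vector unrolled into a 3C column

# First mapping method: a 5C-row, 3-column weight block; every column holds the
# same 3C weights, shifted down by 0, C and 2C rows respectively.
W = np.zeros((5 * C, 3))
for col, shift in enumerate((0, C, 2 * C)):
    W[shift:shift + 3 * C, col] = w_col

p = rng.standard_normal((5, C))              # one cycle of input: a 1 x 5 x C window (5 pixels)
outs = p.reshape(-1) @ W                     # three column-end results in a single cycle

# Reference: each column computes the dot product of the kernel row with three
# adjacent pixels (stride 1), i.e. three neighbouring outputs per cycle.
ref = [p[j] @ X + p[j + 1] @ Y + p[j + 2] @ Z for j in range(3)]
print(np.allclose(outs, ref))                # True
```

Here each input pixel is reused by up to three columns within the same cycle, which is the source of the 1.8 average multiplexing rate described above.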
In combination with the simulation diagram of the mapping mode for a larger convolution kernel shown in Fig. 2, the second data mapping method includes: mapping the parameters on each input channel of the 3 × 3 convolution kernel to different unit arrays in the storage and computation array respectively, with three columns in each unit array mapped with the same weight and without empty rows. For example, as shown in Fig. 2, the 3 × 3 convolution kernel is illustrated by X1 to Xc, Y1 to Yc and Z1 to Zc, and the input matrix by the matrix data INPUTm1, where INPUTm1 comprises INm1~INmc, IN(m+1)1~IN(m+1)c and IN(m+2)1~IN(m+2)c.
And respectively mapping the parameters on each input channel into 3 unit arrays, wherein three columns in each unit array are respectively mapped with the same weight without empty rows. The parameters mapped by each column of each of the three cell arrays are different from those mapped by the other cell arrays. Such as: the first column mapping parameters in the upper first cell array in the memory array are X1-Xc, the first column mapping parameters in the middle second cell array are Y1-Yc, and the first column mapping parameters in the lower third cell array are Z1-Zc. And partial results are output at the tail end of each column of the unit array, and partial results obtained in each period are grouped and added to obtain a convolution result. In this case, the average multiplexing rate of data reaches 3, and no additional rows are occupied compared to the first data mapping, and the input buffer does not need to input additional data per cycle.
The result OUT01 obtained at the tail end of the first column in the first unit array, the result OUT04 obtained at the tail end of the first column in the second unit array and the result OUT07 obtained at the tail end of the first column in the third unit array in the current period are added to obtain a complete output element.
And adding the result OUT02 obtained at the end of the second column in the first unit array, the result OUT03 obtained at the end of the third column in the first unit array and the result OUT06 obtained at the end of the third column in the second unit array in the current period to the partial result obtained in the previous period respectively to obtain a complete output element.
And adding the result OUT05 obtained at the end of the second column in the second unit array in the current period, the result OUT08 obtained at the end of the second column in the third unit array in the current period and the result OUT09 obtained at the end of the third column in the third unit array to the partial result obtained in the next period respectively to obtain a complete output element. Therefore, all calculation can be completed by inputting each element in the input matrix once, the input data of the input buffer per period is unchanged, and the calculation period number is changed to one third of the original calculation period number.
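The grouping of partial results across neighbouring cycles can be sketched as follows. The exact column-to-output wiring is defined by Fig. 2, which is not reproduced here, so the staggered layout below is only an assumed reconstruction that matches the behaviour described in the text: three new pixels are input per cycle (each input exactly once), nine column-end results are produced, and they combine, partly within the cycle and partly with the neighbouring cycles, into three complete outputs per cycle.

```python
import numpy as np

C, P = 4, 12                                 # assumed channel count and number of input pixels
rng = np.random.default_rng(1)
X, Y, Z = rng.standard_normal((3, C))        # one kernel row, per input channel
pix = rng.standard_normal((P, C))            # a row of input pixels

# Reference: 1-D convolution of this kernel row over the pixel row.
ref = [pix[j] @ X + pix[j + 1] @ Y + pix[j + 2] @ Z for j in range(P - 2)]

# Assumed wiring: array a holds one weight vector (X, Y or Z) in all three
# columns, staggered by C rows, so column c of array a multiplies pixel
# (c + a) mod 3 of the current 3-pixel window and contributes to the output
# with index 3t + (c + a) mod 3 - a.
partial = {}                                 # output index -> accumulated partial sum
for t in range(P // 3):                      # one cycle: 3 new pixels, each input once
    window = pix[3 * t: 3 * t + 3]
    for a, w in enumerate((X, Y, Z)):
        for c in range(3):
            s = (c + a) % 3                  # which pixel of the window this column sees
            j = 3 * t + s - a                # output this partial product belongs to
            if 0 <= j <= P - 3:
                partial[j] = partial.get(j, 0.0) + window[s] @ w

print(np.allclose([partial[j] for j in range(P - 2)], ref))   # True
```

In each cycle, three of the nine column results form one complete output immediately, while the remaining results either complete outputs started in the previous cycle or are held as partial sums for the next one, which mirrors the grouping of additions described above.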
Taking a 3 × 3 convolution kernel as an example, when the convolution kernel is small, the first data mapping method is used: 80% of the input data needs to be input twice in total and the remaining 20% only once, giving an average multiplexing rate of 1.8. When the convolution kernel is large, the second data mapping method is used: all input data needs to be input only once, and the average multiplexing rate is 3. Both data mapping methods effectively relieve the pressure on the input buffer and reduce the amount of data transferred, thereby reducing chip latency and power consumption.
In addition, depending on the size of the convolution kernel, the method provided by the invention can increase the ratio of calculation speed to occupied arrays (speed/area for short) by up to a factor of three. When the second mapping method is used, roughly 3 times as many arrays are occupied while the calculation speed also increases 3-fold, so the speed/area is unchanged. When the convolution kernel has very few input or output channels, in particular when the number of occupied rows grows to 5/3 of the original and the number of occupied columns grows to 3 times the original without exceeding the size of a single array, the calculation speed increases 3-fold while the number of occupied arrays is unchanged, so the speed/area also increases 3-fold; the idle rows and columns inside each array are used effectively, and the utilization rate and calculation speed of the storage and computation arrays are improved.
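As a quick numeric check of this claim (the kernel and array sizes below are assumed purely for illustration):

```python
# Speed/area check for the small-channel case: one 256 x 256 unit array,
# a 3 x 3 kernel with C = 32 input channels and M = 16 output channels.
k, C, M, n = 3, 32, 16, 256
rows_before, cols_before = k * C, M               # conventional footprint: 96 rows x 16 columns
rows_after, cols_after = (2 * k - 1) * C, k * M   # first mapping method: 160 rows (5/3x) x 48 columns (3x)
assert rows_after <= n and cols_after <= n        # still fits inside a single unit array
speedup, extra_arrays = k, 1                      # 3 outputs per cycle, same number of arrays
print(speedup / extra_arrays)                     # speed/area improves 3x
```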
Based on the above data mapping method for improving the utilization rate of the storage and computation array, an embodiment of the present invention further provides a data mapping apparatus for improving the utilization rate of the storage and computation array, where the data mapping apparatus includes:
a splitting and mapping module, configured to split a k multiplied by k convolution kernel and map it into k unit arrays, each unit array being mapped with a 1 multiplied by k multiplied by C vector, and, if the number of rows of a unit array is not enough, to continue the mapping into the next unit array, wherein k multiplied by k represents the height multiplied by the width of the convolution kernel, and C represents the number of input channels of the convolution kernel;
a determining method module, configured to determine whether the computational array uses a first data mapping method or a second data mapping method according to the relationship between the convolution kernel size and the computational array size;
taking k = 3 as an example, the data mapping apparatus further includes: a first method module and a second method module;
wherein the first method module comprises:
the expansion mapping unit is used for mapping the first row of the 3 x 3 convolution kernel to any unit array in the storage array, and elements in the first column in the unit array correspond to an arrangement mode of the first row in the 3 x 3 convolution kernel after being expanded into column vectors;
the empty row mapping unit is used for respectively mapping the same weights as those of the first column in the unit array in the second column and the third column in the unit array, a C row is vacated downwards from the mapping position corresponding to the second column element in the unit array, and a 2C row is vacated downwards from the mapping position corresponding to the third column element in the unit array;
an expansion input unit for expanding input data in the input buffer into a 1 × 5 × C column vector;
the first convolution unit is used for outputting 5 elements in an input matrix to the unit array by the input buffer in the current period and performing convolution operation;
the value output unit is used for obtaining a complete output element at the tail end of each column in the unit array when the current period is finished, and then carrying out the convolution operation of the next period;
the second method module comprises:
a full row mapping unit, configured to map parameters on each input channel of the 3 × 3 convolution kernel to different unit arrays in the storage array respectively, and map the same weight to three columns in each unit array respectively without empty rows, so that each input channel occupies ceil(C/n) unit arrays, ceil(C/n) × 3 arrays are occupied in the row dimension in total, and ceil(M × 3/n) arrays are occupied in the column dimension, where ceil is a rounding-up function, and n represents the size of the storage array;
the second convolution unit is used for outputting 3 elements in the input matrix to three unit arrays in the storage and calculation array by the input buffer in the current period and performing convolution operation to obtain a partial result;
and the adding unit is used for adding the partial results obtained in each period in groups to obtain a convolution result.
Optionally, the size of the convolution kernel is k × k × C × M, and the size of the storage array is n × n, where M represents the number of output channels of the convolution kernel;
the determination method module is specifically configured to:
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, or the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determining that the storage and computation array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, and the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determining that the storage and computation array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not less than k times the size of the unit array, calculating, based on the size of the convolution kernel and the size of the storage and computation array, the ratio of a first occupied array to a calculation speed and the ratio of a second occupied array to the calculation speed, wherein the ratio of the first occupied array to the calculation speed is the ratio of occupied arrays to calculation speed when the convolution kernel is mapped to the storage and computation array by the conventional method, and the ratio of the second occupied array to the calculation speed is the ratio of occupied arrays to calculation speed obtained by mapping the parameters on each input channel of the convolution kernel to different unit arrays in the storage and computation array respectively and mapping the same weight in three columns of each unit array without empty rows;
and if the value of the ratio of the first occupied array to the calculation speed is larger than the value of the ratio of the second occupied array to the calculation speed, determining that the storage and calculation array uses the second data mapping method.
Optionally, the first convolution unit is specifically configured to:
the first two elements of the 5 elements are partially calculated in the previous period, the rest of the calculations are completed in the current period, the middle element is input once in the current period to complete the whole 3 calculations, the last two elements are partially calculated in the current period, and the rest of the calculations are completed by placing the last two elements at the positions of the first two elements in the next period.
Optionally, the value output unit is specifically configured to:
when the current period is finished, a complete output element is obtained at the tail end of each column in the unit array; then, when the next period starts, the value-taking area of the input buffer on the input matrix is translated rightwards by 3 pixels relative to the value-taking area of the current period, and 5 elements are taken and output to the unit array for convolution operation.
Optionally, the three cell arrays include: a first cell array, a second cell array, a third cell array;
the adding unit is specifically configured to:
adding the result obtained at the tail end of the first column in the first unit array, the result obtained at the tail end of the first column in the second unit array and the result obtained at the tail end of the first column in the third unit array in the current period to obtain a complete output element;
adding a result obtained at the end of a second column in the first unit array, a result obtained at the end of a third column in the first unit array and a result obtained at the end of a third column in the second unit array in the current period to a partial result obtained in the previous period respectively to obtain a complete output element;
and adding a result obtained at the end of the second column in the second unit array, a result obtained at the end of the second column in the third unit array and a result obtained at the end of the third column in the third unit array in the current period to a partial result obtained in the next period respectively to obtain a complete output element.
Through the above examples, in the data mapping method for improving the utilization rate of the storage and computation array, a k × k convolution kernel is split and mapped into k unit arrays, each unit array maps one 1 × k × C vector, the mapping continues into the next unit array if the number of rows of a unit array is not enough, and the storage and computation array is determined to use the first data mapping method or the second data mapping method according to the relation between the convolution kernel size and the storage and computation array size.
Taking a 3 × 3 convolution kernel as an example, when the convolution kernel is small, the first data mapping method is used: 80% of the input data needs to be input twice in total and the remaining 20% only once, giving an average multiplexing rate of 1.8. When the convolution kernel is large, the second data mapping method is used: all input data needs to be input only once, and the average multiplexing rate is 3. Both data mapping methods effectively relieve the pressure on the input buffer and reduce the amount of data transferred, thereby reducing chip latency and power consumption.
In addition, depending on the size of the convolution kernel, the method provided by the invention can increase the ratio of calculation speed to occupied arrays (speed/area for short) by up to a factor of three. When the second mapping method is used, roughly 3 times as many arrays are occupied while the calculation speed also increases 3-fold, so the speed/area is unchanged. When the convolution kernel has very few input or output channels, in particular when the number of occupied rows grows to 5/3 of the original and the number of occupied columns grows to 3 times the original without exceeding the size of a single array, the calculation speed increases 3-fold while the number of occupied arrays is unchanged, so the speed/area also increases 3-fold; the idle rows and columns inside each array are used effectively, and the utilization rate and calculation speed of the storage and computation arrays are improved.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
While the present invention has been described with reference to the particular illustrative embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but is intended to cover various modifications, equivalent arrangements, and equivalents thereof, which may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A data mapping method for increasing the utilization rate of a storage and computation array, the storage and computation array comprising: a plurality of unit arrays, the data mapping method comprising:
splitting a k multiplied by k convolution kernel and mapping it into k unit arrays, mapping a 1 multiplied by k multiplied by C vector in each unit array, and continuing the mapping into the next unit array if the number of rows of a unit array is not enough, wherein k multiplied by k represents the height multiplied by the width of the convolution kernel, and C represents the number of input channels of the convolution kernel;
determining whether the storage and computation array uses a first data mapping method or a second data mapping method according to the relation between the size of the convolution kernel and the size of the storage and computation array;
taking k = 3 as an example, the first data mapping method includes:
mapping a first row of a 3 × 3 convolution kernel to any unit array, wherein elements in a first column in the unit array correspond to an arrangement mode of the first row in the 3 × 3 convolution kernel after being expanded into a column vector;
respectively mapping the same weights as those in the first column of the unit array in the second column and the third column of the unit array, wherein the mapping position corresponding to the second column of the unit array is vacant downwards by C rows, and the mapping position corresponding to the third column of the unit array is vacant downwards by 2C rows;
expanding the input elements in the input buffer to a 1 × 5 × C column vector;
in the current period, the input buffer outputs 5 elements in the input matrix to the unit array and carries out convolution operation;
when the current period is finished, obtaining a complete output element at the tail end of each column in the unit array, and then performing the convolution operation of the next period;
taking k = 3 as an example, the second data mapping method includes:
mapping parameters on each input channel of a 3 × 3 convolution kernel into different unit arrays in the storage array respectively, and mapping the same weight on three columns in each unit array respectively without empty rows, so that each input channel occupies ceil (C/n) unit arrays, which occupy ceil (C/n) × 3 arrays in the row dimension in total, and ceil (M × 3/n) arrays in the column dimension, wherein ceil is an upward rounding function, and n represents the size of the storage array;
in the current period, the input buffer outputs 3 elements in the input matrix to three unit arrays in the storage and calculation array and carries out convolution operation to obtain partial results;
and adding partial results obtained in each period in groups to obtain a convolution result.
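As a minimal illustration of the first data mapping method in claim 1 (a sketch, not the claimed implementation), the following Python fragment lays out one k × k × C kernel for a single output channel. It assumes that n is the side length of one n × n unit array and that column j of each unit array repeats the weights of the first column shifted down by j × C vacant rows; the function and variable names are invented for illustration.

import numpy as np

def first_method_layout(kernel, n):
    # Lay out a (k, k, C) kernel row by row into k unit arrays (claim 1, first method).
    # Assumptions: single output channel, n is the side length of one n x n unit array.
    k, _, C = kernel.shape
    if (2 * k - 1) * C > n:
        raise ValueError("kernel row does not fit one unit array with the shifted columns")
    unit_arrays = []
    for r in range(k):                       # one unit array per kernel row
        col = kernel[r].reshape(k * C)       # 1 x k x C kernel row expanded into a column vector
        ua = np.zeros((n, n))
        for j in range(k):                   # column j is vacated j*C rows downwards
            ua[j * C:j * C + k * C, j] = col
        unit_arrays.append(ua)
    return unit_arrays

# toy sizes: k = 3, C = 8, unit array 64 x 64
arrays = first_method_layout(np.random.rand(3, 3, 8), n=64)
print(len(arrays), arrays[0].shape)          # 3 unit arrays, each 64 x 64

The fit check at the top reflects that the shifted columns span (2k - 1) × C rows of a unit array, which appears to be the origin of the k/(2k-1) row threshold used in claim 2.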
2. The data mapping method of claim 1, wherein the size of the convolution kernel is k × k × C × M, and the size of the storage array is n × n, wherein M represents the number of output channels of the convolution kernel;
the determining, according to the relation between the size of the convolution kernel and the size of the storage array, whether the storage array uses the first data mapping method or the second data mapping method comprises:
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, or the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determining that the storage array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, and the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determining that the storage array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not less than k times the size of the unit array, calculating a first ratio of occupied arrays to computation speed and a second ratio of occupied arrays to computation speed based on the size of the convolution kernel and the size of the storage array, wherein the first ratio of occupied arrays to computation speed is the ratio obtained when the convolution kernel is mapped to the storage array by a conventional method, and the second ratio of occupied arrays to computation speed is the ratio obtained when the parameters on each input channel of the convolution kernel are mapped to different unit arrays in the storage array and the same weights are mapped to k columns in each unit array without vacant rows;
and if the first ratio of occupied arrays to computation speed is greater than the second ratio of occupied arrays to computation speed, determining that the storage array uses the second data mapping method.
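A compact sketch of this selection rule, with the claim-3 ratios folded in. The occupancy arguments, the reading of n as the unit-array side length, and the function name are assumptions made for illustration, not part of the claims.

import math

def choose_mapping(rows_occupied, cols_occupied, k, C, M, n):
    # Decide between the first and second data mapping methods (claims 2-3).
    # rows_occupied / cols_occupied: rows and columns one unit array takes when
    # the kernel is split-mapped onto it; n: assumed unit-array side length;
    # k, C, M: kernel height/width, input channels, output channels.
    if rows_occupied <= n * k / (2 * k - 1) or cols_occupied < n / k:
        # either threshold of claim 2 (the "or" form subsumes the "and" form)
        return "first"
    if rows_occupied >= n * k:
        # claim-3 ratios of occupied arrays to computation speed
        first_ratio = k * math.ceil(k * C / n) * math.ceil(M / n)
        second_ratio = math.ceil(C / n) * k * math.ceil(M * k / n)
        if first_ratio > second_ratio:
            return "second"
    return "first"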
3. The data mapping method of claim 2, wherein the first ratio of occupied arrays to computation speed is given by:
k × ceil(k × C/n) × ceil(M/n)
and the second ratio of occupied arrays to computation speed is given by:
ceil(C/n) × k × ceil(M × k/n)
in the above two formulas, ceil(k × C/n) indicates that the convolution kernel mapping occupies ceil(k × C/n) unit arrays in the row dimension under the conventional method, ceil(M/n) indicates that the convolution kernel mapping occupies ceil(M/n) unit arrays in the column dimension under the conventional method, ceil(C/n) indicates that each input channel occupies ceil(C/n) unit arrays, ceil(C/n) × k indicates that ceil(C/n) × k unit arrays are occupied in the row dimension, and ceil(M × k/n) indicates that ceil(M × k/n) unit arrays are occupied in the column dimension.
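As an illustrative calculation with assumed sizes k = 3, C = 200, M = 64 and n = 128: the first ratio is 3 × ceil(600/128) × ceil(64/128) = 3 × 5 × 1 = 15, and the second ratio is ceil(200/128) × 3 × ceil(192/128) = 2 × 3 × 2 = 12; since 15 > 12, the second data mapping method would be selected under claim 2.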
4. The data mapping method of claim 1, wherein the outputting, by the input buffer in the current period, 5 elements of the input matrix to the unit array and performing the convolution operation comprises:
the first two of the 5 elements were partially computed in the previous period and their remaining computations are completed in the current period; the middle element is input once in the current period and completes all 3 of its computations; the last two elements are partially computed in the current period, and their remaining computations are completed by placing them at the positions of the first two elements in the next period.
5. The data mapping method according to claim 4, wherein the obtaining, at the end of the current period, a complete output element at the end of each column in the unit array and then performing the convolution operation of the next period comprises:
at the end of the current period, obtaining a complete output element at the end of each column in the unit array; then, at the start of the next period, translating the sampling window of the input buffer on the input matrix rightwards by 3 pixels relative to its position in the current period, taking 5 elements, and outputting them to the unit array for the convolution operation.
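The sliding behaviour of claims 4 and 5 can be checked with a short sketch. It assumes one kernel row, C = 1, and a 1-D input for readability, and the function name is invented; it is not the claimed implementation.

import numpy as np

def first_method_cycles(x, w):
    # Emulate the per-period behaviour of claims 4-5 for one kernel row with C = 1.
    # Each period the buffer presents a window of 2k-1 = 5 elements; the three
    # column-shifted copies of the 3-tap weights yield three consecutive outputs,
    # and the window then slides right by 3 pixels.
    k = len(w)                                            # k = 3 here
    outputs = []
    for start in range(0, len(x) - (2 * k - 1) + 1, k):   # slide by 3 per period
        window = x[start:start + 2 * k - 1]               # 5 elements per period
        for j in range(k):                                # column j sees window[j:j+3]
            outputs.append(float(np.dot(w, window[j:j + k])))
    return outputs

x = np.arange(12.0)                  # toy 1-D input row
w = np.array([1.0, 2.0, 3.0])        # one row of a 3 x 3 kernel, C = 1
print(first_method_cycles(x, w))     # the sliding dot products at output positions 0..8

In this sketch the last two elements of one period's window reappear as the first two of the next, matching the "first two / last two" bookkeeping in claim 4: each input element ends up contributing to three output elements.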
6. The data mapping method of claim 2, wherein the three unit arrays comprise a first unit array, a second unit array, and a third unit array;
and the adding the partial results obtained in each period in groups to obtain the convolution result comprises:
adding the result obtained at the end of the first column in the first unit array, the result obtained at the end of the first column in the second unit array, and the result obtained at the end of the first column in the third unit array in the current period to obtain a complete output element;
adding each of the result obtained at the end of the second column in the first unit array, the result obtained at the end of the third column in the first unit array, and the result obtained at the end of the third column in the second unit array in the current period to a corresponding partial result obtained in the previous period to obtain a complete output element;
and adding each of the result obtained at the end of the second column in the second unit array, the result obtained at the end of the second column in the third unit array, and the result obtained at the end of the third column in the third unit array in the current period to a corresponding partial result obtained in the next period to obtain a complete output element.
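Purely as bookkeeping, and assuming nothing beyond the claim text, the grouping above can be written out as a small lookup table; the pairs are (unit array, column), 1-based as in the claim.

# Grouping of per-column results under the second data mapping method (claim 6).
# Each key states which period supplies the partial result that completes the output.
PARTIAL_SUM_GROUPS = {
    "complete within the current period":       [(1, 1), (2, 1), (3, 1)],
    "completed with a previous-period partial": [(1, 2), (1, 3), (2, 3)],
    "completed with a next-period partial":     [(2, 2), (3, 2), (3, 3)],
}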
7. A data mapping apparatus for increasing utilization of a storage array, the data mapping apparatus comprising:
a split mapping module, configured to split and map a k × k convolution kernel into k unit arrays, map a 1 × k × C vector in each unit array, and continue the mapping into the next unit array if the number of rows of a unit array is insufficient, wherein k × k represents the height × width of the convolution kernel and C represents the number of input channels of the convolution kernel;
a determination method module, configured to determine, according to the relation between the size of the convolution kernel and the size of the storage array, whether the storage array uses a first data mapping method or a second data mapping method;
taking k = 3 as an example, the data mapping apparatus further comprises: a first method module and a second method module;
wherein the first method module comprises:
an expansion mapping unit, configured to map the first row of the 3 × 3 convolution kernel to any unit array in the storage array, wherein the elements in the first column of the unit array are arranged as the first row of the 3 × 3 convolution kernel expanded into a column vector;
an empty row mapping unit, configured to map the same weights as those in the first column of the unit array into the second column and the third column of the unit array, respectively, wherein the mapping position of the second column is shifted downwards by C vacant rows and the mapping position of the third column is shifted downwards by 2C vacant rows;
an expansion input unit, configured to expand the input data in the input buffer into a 1 × 5 × C column vector;
a first convolution unit, configured to output, by the input buffer in the current period, 5 elements of the input matrix to the unit array and perform the convolution operation;
a value output unit, configured to obtain a complete output element at the end of each column in the unit array at the end of the current period, and then perform the convolution operation of the next period;
the second method module comprises:
a full row mapping unit, configured to map the parameters on each input channel of the 3 × 3 convolution kernel to different unit arrays in the storage array, respectively, and map the same weights to three columns in each unit array, respectively, without vacant rows, so that each input channel occupies ceil(C/n) unit arrays, ceil(C/n) × 3 unit arrays are occupied in total in the row dimension, and ceil(M × 3/n) unit arrays are occupied in the column dimension, wherein ceil is the rounding-up function and n represents the size of the storage array;
a second convolution unit, configured to output, by the input buffer in the current period, 3 elements of the input matrix to three unit arrays in the storage array and perform the convolution operation to obtain partial results;
and an adding unit, configured to add the partial results obtained in each period in groups to obtain a convolution result.
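For orientation only, the module structure of claim 7 can be sketched as plain Python classes; every class and method name here is invented and the method bodies are placeholders, so this is a structural outline rather than the claimed apparatus.

class FirstMethodModule:
    # Units of the first data mapping method (claim 7).
    def expansion_mapping(self, kernel_row, unit_array): ...
    def empty_row_mapping(self, unit_array, C): ...
    def expansion_input(self, input_buffer): ...
    def first_convolution(self, elements, unit_array): ...
    def value_output(self, unit_array): ...

class SecondMethodModule:
    # Units of the second data mapping method (claim 7).
    def full_row_mapping(self, kernel, storage_array): ...
    def second_convolution(self, elements, unit_arrays): ...
    def add_partial_results(self, partials): ...

class DataMappingApparatus:
    # Top-level apparatus: split mapping, method determination, and the two method modules.
    def __init__(self):
        self.first_method = FirstMethodModule()
        self.second_method = SecondMethodModule()
    def split_mapping(self, kernel, storage_array): ...
    def determine_method(self, kernel_shape, array_size): ...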
8. The data mapping apparatus according to claim 7, wherein the size of the convolution kernel is k × k × C × M, and the size of the storage array is n × n, wherein M represents the number of output channels of the convolution kernel;
the determination method module is specifically configured to:
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, or the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determine that the storage array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not more than k/(2k-1) of the size of the unit array, and the number of occupied columns of the unit array is less than 1/k of the size of the unit array, determine that the storage array uses the first data mapping method;
when the convolution kernel is split and mapped to any unit array, if the number of occupied rows of the unit array is not less than k times the size of the unit array, calculate a first ratio of occupied arrays to computation speed and a second ratio of occupied arrays to computation speed based on the size of the convolution kernel and the size of the storage array, wherein the first ratio of occupied arrays to computation speed is the ratio obtained when the convolution kernel is mapped to the storage array by a conventional method, and the second ratio of occupied arrays to computation speed is the ratio obtained when the parameters on each input channel of the convolution kernel are mapped to different unit arrays in the storage array and the same weights are mapped to three columns in each unit array without vacant rows;
and if the first ratio of occupied arrays to computation speed is greater than the second ratio of occupied arrays to computation speed, determine that the storage array uses the second data mapping method.
9. The data mapping apparatus of claim 7, wherein the first convolution unit is specifically configured to:
the first two of the 5 elements were partially computed in the previous period and their remaining computations are completed in the current period; the middle element is input once in the current period and completes all 3 of its computations; the last two elements are partially computed in the current period, and their remaining computations are completed by placing them at the positions of the first two elements in the next period.
10. The data mapping apparatus according to claim 9, wherein the value output unit is specifically configured to:
at the end of the current period, obtain a complete output element at the end of each column in the unit array; then, at the start of the next period, translate the sampling window of the input buffer on the input matrix rightwards by 3 pixels relative to its position in the current period, take 5 elements, and output them to the unit array for the convolution operation.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination