CN115509467B - Feature deformation method for calculating matching weight in memory - Google Patents


Info

Publication number
CN115509467B
CN115509467B (application CN202211472617.4A)
Authority
CN
China
Prior art keywords
data
ireg
memory
deformation
regshiftnum
Prior art date
Legal status
Active
Application number
CN202211472617.4A
Other languages
Chinese (zh)
Other versions
CN115509467A
Inventor
Wu Jun (伍骏)
Dong Guangda (董光达)
Current Assignee
Suzhou Yizhu Intelligent Technology Co ltd
Original Assignee
Suzhou Yizhu Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Yizhu Intelligent Technology Co ltd filed Critical Suzhou Yizhu Intelligent Technology Co ltd
Priority to CN202211472617.4A
Publication of CN115509467A
Application granted
Publication of CN115509467B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0646: Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F 3/0647: Migration mechanisms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/15: Correlation function computation including computation of convolution operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656: Data buffering arrangements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0661: Format or protocol conversion arrangements
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a feature deformation method for in-memory computation of matching weights, comprising the following steps: transferring the NCHWC'-format data to be deformed from an SRAM memory and filling it into a register group iReg, the filling addresses differing according to the convolution kernel; performing the NCWH deformation on the NCHWC'-format data in iReg; and intercepting output data from the deformed data and transmitting it to the interface of the in-memory computing unit, where it is multiplied and accumulated with the weights. The deformation gathers the valid data, solving the problem of wasted space gaps between Feature and Weight; by exploiting the data-reuse property of the convolution sliding window, reused data is cached, improving the efficiency of DMA data transfer and enhancing the performance of the convolution calculation.

Description

Feature deformation method for calculating matching weight in memory
Technical Field
The invention relates to the technical field of in-memory computing, and in particular to a feature deformation method for in-memory computation of matching weights.
Background
Convolution in a neural network multiplies and accumulates corresponding features and weights. In traditional computation, the feature and the weight are loaded into the computing unit together; data placement is flexible, and the registers caching the feature and the weight can be read and written repeatedly. In in-memory computing (CIM), however, the weights are written in advance, which places higher demands on improving the weight density of the CIM array.
Access order: c '- > W- > H- > C- > N, corresponding to a cube of n×h×w×c=1×14×14×128, cutting C into 64×2, according to NCHWC' = "0- > 1- > 2- > 3- > 4- > 5. In the manner of storing weight with NCHWC ', generally different C's need to be placed in a fixed width, and in the format of aligning specific lengths (such as 32B), if calculated C '<32B, there is a gap between different C's, which is often a majority, so that CIM space is easily wasted. In addition, when features corresponding to weight are generated for different convolution calculations (general convolution and depth convolution), the software configuration is complicated by using two separate methods.
Disclosure of Invention
To address the above defects of the prior art, the invention provides a feature deformation method for in-memory computation of matching weights, with the following technical scheme:
A feature deformation method for in-memory computation of matching weights comprises the following steps:
transferring the NCHWC'-format data to be deformed from an SRAM memory and filling it into a register group iReg, the filling addresses differing according to the convolution kernel;
performing the NCWH deformation on the NCHWC'-format data in iReg;
intercepting output data from the deformed data, broadcasting it to the interface of the in-memory computing unit, and multiplying and accumulating it with the weights.
Specifically, iReg consists of 9 register groups of 64B each.
In particular, the convolution kernel is 1×1, 3×3, 5×5, or 7×7.
Specifically, the correspondence among convolution kernel, number of supported channels, and iReg filling address is:
convolution kernel 1×1: maximum of 288 input channels; regShiftNum is typically 0; the iReg fill position (regOffset, regIndex) is (0, 0);
convolution kernel 3×3: maximum of 32 or 64 input channels; regShiftNum is typically 24; the iReg fill positions (regOffset, regIndex) are (6, 0), (7, 0), (8, 0);
convolution kernel 5×5: maximum of 16 input channels; regShiftNum is typically 10; the iReg fill positions (regOffset, regIndex) are (5, 0), (5, 16), (5, 32), (2, 48), (6, 0);
convolution kernel 7×7: maximum of 5 or 8 input channels; regShiftNum is typically 7; the iReg fill positions (regOffset, regIndex) are (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0).
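The per-kernel parameters above can be collected into a lookup table; a sketch (the dictionary name and key format are illustrative, and for kernels with two channel options only the larger maximum is kept):

```python
# Per-kernel parameters as stated in the text: maximum input channels,
# typical regShiftNum, and iReg fill positions (regOffset, regIndex).
KERNEL_CONFIG = {
    (1, 1): {"max_channels": 288, "reg_shift_num": 0,
             "fill_positions": [(0, 0)]},
    (3, 3): {"max_channels": 64, "reg_shift_num": 24,
             "fill_positions": [(6, 0), (7, 0), (8, 0)]},
    (5, 5): {"max_channels": 16, "reg_shift_num": 10,
             "fill_positions": [(5, 0), (5, 16), (5, 32), (2, 48), (6, 0)]},
    (7, 7): {"max_channels": 8, "reg_shift_num": 7,
             "fill_positions": [(5, 16), (5, 24), (5, 32), (5, 40),
                                (5, 48), (5, 56), (6, 0)]},
}

assert KERNEL_CONFIG[(3, 3)]["reg_shift_num"] == 24
assert len(KERNEL_CONFIG[(7, 7)]["fill_positions"]) == 7
```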
Specifically, the NCWH deformation comprises the following steps:
iReg shifts regShiftNum × 8B from high addresses toward low addresses, where regShiftNum is the number of register shifts;
from the 576 register bytes, data is extracted at an interval equal to the maximum number of supported input channels, repeated as many times as the convolution kernel size, forming 64 groups of data in two-dimensional-array form;
the extracted data of each group is combined in order, and the 64 groups are combined into a one-dimensional array, completing the format deformation.
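Assuming iReg holds kernel-sized blocks of channel-width bytes with C' varying fastest (as in the filling step), the interval extraction above amounts to a blocks-by-channels transpose; a minimal sketch (function and parameter names are illustrative):

```python
def ncwh_deform(ireg: bytes, num_blocks: int, channels: int) -> bytes:
    """Gather byte j of each of num_blocks channel-wide blocks so that the
    bytes belonging to one channel become contiguous (a num_blocks x
    channels transpose), mirroring the interval extraction in the text."""
    assert len(ireg) >= num_blocks * channels
    out = bytearray()
    for j in range(channels):            # one output group per channel
        for k in range(num_blocks):      # one byte per kernel tap
            out.append(ireg[k * channels + j])
    return bytes(out)

# kernel = 3 example: 9 taps x 64 channels = 576 bytes in,
# 64 groups of 9B out.
src = bytes(i % 251 for i in range(576))
dst = ncwh_deform(src, 9, 64)
assert len(dst) == 576
assert dst[0:9] == bytes(src[k * 64] for k in range(9))
```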
Specifically, regShiftNum equals the number of non-multiplexed (non-reused) bytes divided by 8.
Specifically, the output data is located at the front end of the deformed data, where:
for a 32-channel general convolution, the output data comprises 288B;
for a 64-channel depthwise convolution, every 9B of data is extended to 288B, forming 64 groups of 288B, and the 64 groups are distributed onto the interfaces of the corresponding in-memory computing units.
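For the depthwise case, the text does not specify how each 9B group is extended to 288B; the sketch below assumes each group is placed at the front of a zero-filled 288B lane purely for illustration (function name and placement are assumptions):

```python
def broadcast_depthwise(groups_9b: bytes, num_groups: int = 64) -> list:
    """Extend each 9B group into its own 288B lane (zero-padded tail;
    the exact placement within the 288B is an assumption of this sketch)."""
    lanes = []
    for g in range(num_groups):
        nine = groups_9b[g * 9:(g + 1) * 9]
        lanes.append(nine + bytes(288 - len(nine)))
    return lanes

# 64 groups x 9B of deformed data -> 64 lanes of 288B each.
deformed = bytes(i % 256 for i in range(64 * 9))
lanes = broadcast_depthwise(deformed)
assert len(lanes) == 64 and all(len(lane) == 288 for lane in lanes)
assert lanes[0][:9] == deformed[:9]
```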
The invention has the following beneficial effects:
(1) The deformation gathers valid data, solving the problem of wasted space gaps between Feature and Weight.
(2) By exploiting the data-reuse property of the convolution sliding window, reused data is cached, improving the efficiency of DMA data transfer and enhancing the performance of the convolution calculation.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a flow chart of a general convolution embodiment of the present invention;
FIG. 3 is a schematic flow diagram of a deep convolution embodiment of the present invention;
fig. 4 is a schematic diagram of the NCHWC' access sequence.
Detailed Description
For a clearer understanding of technical features, objects, and effects of the present invention, a specific embodiment of the present invention will be described with reference to the accompanying drawings.
A feature deformation method for in-memory computation of matching weights comprises the following steps:
transferring the NCHWC'-format data to be deformed from an SRAM memory and filling it into a register group iReg, the filling addresses differing according to the convolution kernel;
performing the NCWH deformation on the NCHWC'-format data in iReg;
intercepting output data from the deformed data and transmitting it to the interface of the in-memory computing unit, where it is multiplied and accumulated with the weights.
Specifically, iReg consists of 9 register groups of 64B each (576B in total).
In particular, the convolution kernels include 1*1, 3*3, 5*5, or 7*7.
Specifically, the correspondence among the convolution kernel, the number of supported channels, and the iReg filling address is:
Convolution kernel | Max input channels | regShiftNum | iReg fill positions (regOffset, regIndex)
1×1 | 288 | 0 | (0, 0)
3×3 | 32 or 64 | 24 | (6, 0), (7, 0), (8, 0)
5×5 | 16 | 10 | (5, 0), (5, 16), (5, 32), (2, 48), (6, 0)
7×7 | 5 or 8 | 7 | (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0)
Specifically, the NCWH deformation comprises the following steps:
iReg shifts regShiftNum × 8B from high addresses toward low addresses;
from the 576 register bytes, data is extracted at an interval equal to the maximum number of supported input channels, repeated as many times as the convolution kernel size, forming 64 groups of data in two-dimensional-array form;
the extracted data of each group is combined in order, and the 64 groups are combined into a one-dimensional array, completing the format deformation.
Specifically, regShiftNum equals the number of non-multiplexed (non-reused) bytes divided by 8.
Specifically, the output data is located at the front end of the deformed data, where:
for a 32-channel general convolution, the output data comprises 288B;
for a 64-channel depthwise convolution, every 9B of data is extended to 288B, forming 64 groups of 288B, and the 64 groups are distributed onto the interfaces of the corresponding in-memory computing units.
Example 1:
As shown in fig. 1, the feature deformation method for in-memory computation of matching weights provided by the invention comprises the following steps:
1. The feature is filled into iReg. The iReg filling differs for different kernels; as shown in fig. 2, for kernel = 3 each filling goes to three positions, namely (6, 0), (7, 0), (8, 0), where the coordinates mean (group address of iReg, offset address within the iReg group).
2. iReg shifts regShiftNum × 8B from high addresses toward low addresses. Each cycle, features are filled to the same iReg positions. For kernel = 3, regShiftNum = non-multiplexed data (Byte) / 8 = 64 × 3 / 8 = 24. When stride = 1, the first traversal in the W direction of the feature needs 2 cycles (3 cycles of filling) before deformation can start; after that, 64 × 3B of data is filled in each cycle, 64 × 3B is shifted out from the low end, and each cycle the deformation can produce data participating in the calculation.
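The regShiftNum arithmetic in this step can be checked directly:

```python
def reg_shift_num(non_reused_bytes: int) -> int:
    """regShiftNum = non-multiplexed (non-reused) data in bytes, divided
    by 8 (the shift granularity is 8B per the text)."""
    return non_reused_bytes // 8

# kernel = 3, 64 channels: 64 * 3 = 192B leave the window each step.
assert reg_shift_num(64 * 3) == 24
```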
The iReg filling for the different kernels is shown in the following table:
Convolution kernel | Max input channels | regShiftNum | iReg fill positions (regOffset, regIndex)
1×1 | 288 | 0 | (0, 0)
3×3 | 32 or 64 | 24 | (6, 0), (7, 0), (8, 0)
5×5 | 16 | 10 | (5, 0), (5, 16), (5, 32), (2, 48), (6, 0)
7×7 | 5 or 8 | 7 | (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0)
3. As shown for kernel = 3 in fig. 2, the deformation moves the bytes at the same offset of each 64B group together, forming 64 groups of 9B data. If kernel = 5, the bytes at the same offset of each 16B group are moved together, forming 16 groups of 25B data.
4. For general convolution, the front 288B is intercepted and broadcast to the in-memory computing unit interface; for depthwise convolution, every 9B of data is extended to 288B (as shown in fig. 3), forming 64 groups of 288B, and the 64 groups are distributed onto the corresponding in-memory computing unit interfaces.
The foregoing has shown and described the basic principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and descriptions merely illustrate the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (5)

1. A feature deformation method for in-memory computation of matching weights, characterized by comprising the following steps:
transferring the NCHWC'-format data to be deformed from an SRAM memory and filling it into a register group iReg, the filling addresses differing according to the convolution kernel;
performing the NCWH deformation on the NCHWC'-format data in iReg; the NCWH deformation comprises the following steps:
iReg shifts regShiftNum × 8B from high addresses toward low addresses, where regShiftNum is the number of register shifts;
from the 576 register bytes, data is extracted at an interval equal to the maximum number of supported input channels, repeated as many times as the convolution kernel size, forming 64 groups of data in two-dimensional-array form;
the extracted data of each group is combined in order, and the 64 groups are combined into a one-dimensional array, completing the format deformation;
intercepting output data from the deformed data, broadcasting it to the in-memory computing unit interface, and multiplying and accumulating it with the weights; the output data is located at the front end of the deformed data, wherein:
for a 32-channel general convolution, the output data comprises 288B;
for a 64-channel depthwise convolution, every 9B of data is extended to 288B, forming 64 groups of 288B, and the 64 groups are distributed onto the interfaces of the corresponding in-memory computing units.
2. The feature deformation method for in-memory computation of matching weights according to claim 1, wherein iReg consists of 9 register groups of 64B each.
3. The feature deformation method for in-memory computation of matching weights according to claim 1, wherein the convolution kernel is 1×1, 3×3, 5×5, or 7×7.
4. The feature deformation method for in-memory computation of matching weights according to claim 3, wherein the correspondence among convolution kernel, number of supported channels, and iReg filling address is:
convolution kernel 1×1: maximum of 288 input channels; regShiftNum is typically 0; the iReg fill position (regOffset, regIndex) is (0, 0);
convolution kernel 3×3: maximum of 32 or 64 input channels; regShiftNum is typically 24; the iReg fill positions (regOffset, regIndex) are (6, 0), (7, 0), (8, 0);
convolution kernel 5×5: maximum of 16 input channels; regShiftNum is typically 10; the iReg fill positions (regOffset, regIndex) are (5, 0), (5, 16), (5, 32), (2, 48), (6, 0);
convolution kernel 7×7: maximum of 5 or 8 input channels; regShiftNum is typically 7; the iReg fill positions (regOffset, regIndex) are (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0).
5. The feature deformation method for in-memory computation of matching weights according to claim 1, wherein regShiftNum equals the number of non-multiplexed (non-reused) bytes divided by 8.
CN202211472617.4A 2022-11-23 2022-11-23 Feature deformation method for calculating matching weight in memory Active CN115509467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211472617.4A CN115509467B (en) 2022-11-23 2022-11-23 Feature deformation method for calculating matching weight in memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211472617.4A CN115509467B (en) 2022-11-23 2022-11-23 Feature deformation method for calculating matching weight in memory

Publications (2)

Publication Number Publication Date
CN115509467A CN115509467A (en) 2022-12-23
CN115509467B true CN115509467B (en) 2023-04-28

Family

ID=84514282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211472617.4A Active CN115509467B (en) 2022-11-23 2022-11-23 Feature deformation method for calculating matching weight in memory

Country Status (1)

Country Link
CN (1) CN115509467B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10223333B2 (en) * 2014-08-29 2019-03-05 Nvidia Corporation Performing multi-convolution operations in a parallel processing system
CN111401510A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111984189B (en) * 2020-07-22 2022-05-17 深圳云天励飞技术股份有限公司 Neural network computing device, data reading method, data storage method and related equipment
US20220129744A1 (en) * 2020-10-26 2022-04-28 Arm Limited Method for permuting dimensions of a multi-dimensional tensor
CN113672855A (en) * 2021-08-25 2021-11-19 恒烁半导体(合肥)股份有限公司 Memory operation method, device and application thereof
CN114995823A (en) * 2022-06-07 2022-09-02 重庆大学 Deep learning compiler optimization method for special accelerator for CNN

Also Published As

Publication number Publication date
CN115509467A (en) 2022-12-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200120 building C, No. 888, Huanhu West 2nd Road, Lingang New Area, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant after: Suzhou Yizhu Intelligent Technology Co.,Ltd.

Address before: 200120 building C, No. 888, Huanhu West 2nd Road, Lingang New Area, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Shanghai Yizhu Intelligent Technology Co.,Ltd.

GR01 Patent grant