CN115509467B - Feature deformation method for calculating matching weight in memory - Google Patents
- Publication number: CN115509467B
- Application number: CN202211472617.4A
- Authority: CN (China)
- Prior art keywords: data, iReg, memory, deformation, regShiftNum
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0661—Format or protocol conversion arrangements
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a feature deformation method for matching weights in in-memory computing, comprising the following steps: carrying NCHWC' format data to be deformed from an SRAM memory and filling it into a register group iReg, where the filling addresses differ according to the convolution kernel; performing NCWH deformation on the NCHWC' format data in iReg; and intercepting output data from the deformed data and transmitting it to the interface of the in-memory computing unit, where it is multiplied and accumulated with the weights. The deformation gathers the valid data, solving the problem of wasted gap space between Feature and Weight. Combined with the data-reuse characteristic of the convolution sliding window, reused data is cached, which improves the efficiency of DMA data transport and enhances the performance of convolution calculation.
Description
Technical Field
The invention relates to the technical field of in-memory computing, and in particular to a feature deformation method for matching weights in in-memory computing.
Background
Convolution in a neural network multiplies and accumulates corresponding feature and weight values. In traditional computing, the features and weights are loaded into the compute unit at the same time; data placement is flexible, and the registers caching the features and weights can be read and written repeatedly. In in-memory computing (CIM), however, the weights are written in advance, which places higher demands on improving the weight density of the CIM array.
Access order: C' -> W -> H -> C -> N, corresponding to a cube of N×H×W×C = 1×14×14×128 with C cut into 64×2, traversed in the NCHWC' order 0 -> 1 -> 2 -> 3 -> 4 -> 5. When weights are stored in NCHWC' format, different C' slices generally need to be placed at a fixed width, aligned to a specific length (such as 32 B). If the computed C' < 32 B — which is usually the case — a gap is left between different C' slices, so CIM space is easily wasted. In addition, when generating the features that correspond to the weights for different convolution types (general convolution and depthwise convolution), using two separate methods makes the software configuration complicated.
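As a rough illustration of the gap waste described above, the bytes lost per C' slice under a fixed alignment can be computed as follows (a minimal sketch; the 32 B alignment is the example value from the text, and the function name is illustrative):

```python
def cim_gap_waste(c_prime_bytes: int, align: int = 32) -> int:
    """Bytes of CIM space wasted per C' slice when every slice is
    padded out to a fixed alignment (32 B in the example above)."""
    return (align - c_prime_bytes % align) % align
```

For example, a 9 B slice padded to 32 B wastes 23 B per slice, which is why the deformation that gathers valid data matters.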
Disclosure of Invention
To address the defects in the prior art, the invention provides a feature deformation method for matching weights in in-memory computing. The specific technical scheme is as follows:
The feature deformation method for matching weights in in-memory computing comprises the following steps:
carrying NCHWC' format data to be deformed from an SRAM memory and filling it into a register group iReg, where the filling addresses differ according to the convolution kernel;
performing NCWH deformation on the NCHWC' format data in iReg;
intercepting output data from the deformed data, broadcasting it to the in-memory computing unit interface, and multiplying and accumulating it with the weights.
Specifically, iReg consists of 9 register groups of 64 B each.
Specifically, the convolution kernel is 1×1, 3×3, 5×5, or 7×7.
Specifically, the correspondence among the convolution kernel, the number of supported channels, and the iReg filling address is:
the convolution kernel is 1×1: the maximum number of supported input channels is 288, regShiftNum is typically 0, and the iReg fill position (regOffset, regIndex) is (0, 0);
the convolution kernel is 3×3: the maximum number of supported input channels is 32 or 64, regShiftNum is typically 24, and the iReg fill positions (regOffset, regIndex) are (6, 0), (7, 0), (8, 0);
the convolution kernel is 5×5: the maximum number of supported input channels is 16, regShiftNum is typically 10, and the iReg fill positions (regOffset, regIndex) are (5, 0), (5, 16), (5, 32), (2, 48), (6, 0);
the convolution kernel is 7×7: the maximum number of supported input channels is 5 or 8, regShiftNum is typically 7, and the iReg fill positions (regOffset, regIndex) are (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0).
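The correspondence above can be captured in a small lookup table. The sketch below is illustrative: the field names (`max_channels`, `reg_shift_num`, `fill_positions`) are assumptions, while the values are taken directly from the text:

```python
# Kernel size -> iReg fill parameters, per the correspondence above.
IREG_FILL_PARAMS = {
    1: {"max_channels": (288,), "reg_shift_num": 0,
        "fill_positions": [(0, 0)]},
    3: {"max_channels": (32, 64), "reg_shift_num": 24,
        "fill_positions": [(6, 0), (7, 0), (8, 0)]},
    5: {"max_channels": (16,), "reg_shift_num": 10,
        "fill_positions": [(5, 0), (5, 16), (5, 32), (2, 48), (6, 0)]},
    7: {"max_channels": (5, 8), "reg_shift_num": 7,
        "fill_positions": [(5, 16), (5, 24), (5, 32), (5, 40),
                           (5, 48), (5, 56), (6, 0)]},
}

def fill_params(kernel_size: int) -> dict:
    """Return the iReg fill parameters for a square convolution kernel."""
    return IREG_FILL_PARAMS[kernel_size]
```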
Specifically, the NCWH deformation includes the following steps:
iReg shifts regShiftNum × 8 B from high address to low address, where regShiftNum is the number of register shifts;
extracting data from the 576 registers at intervals equal to the maximum number of supported input channels, as many times as the size of the convolution kernel, forming 64 groups of data in a two-dimensional array;
sequentially combining the extracted data of each group and merging the 64 groups into a one-dimensional array, completing the format deformation.
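The deformation steps above can be sketched as follows. The byte-level indexing is an assumption inferred from the kernel = 3 embodiment (9 register groups of 64 B; 64 groups of 9 B gathered at a 64 B stride), and zeros stand in for the feature bytes refilled after the shift:

```python
IREG_BYTES = 9 * 64  # 9 register groups of 64 B each

def shift_ireg(ireg: list, reg_shift_num: int) -> list:
    """Step 1: drop reg_shift_num * 8 B of non-reused data from the
    low end; zeros stand in for the new feature bytes filled at the top."""
    n = reg_shift_num * 8
    return ireg[n:] + [0] * n

def ncwh_extract(ireg: list, num_channels: int, kernel_area: int) -> list:
    """Steps 2-3: gather bytes at num_channels-byte strides, kernel_area
    times per channel, then concatenate the per-channel groups into a
    one-dimensional array."""
    assert len(ireg) == IREG_BYTES
    groups = [[ireg[k * num_channels + c] for k in range(kernel_area)]
              for c in range(num_channels)]
    return [b for g in groups for b in g]
```

For a 3×3 kernel with 64 channels, `ncwh_extract(ireg, 64, 9)` yields 64 groups of 9 B each — the same byte offset of every 64 B register group gathered together.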
Specifically, regShiftNum is equal to the number of bytes of non-reused data divided by 8.
Specifically, the output data is located at the front end of the deformed data, where:
for a 32-channel general convolution, the output data comprises 288 B;
for a 64-channel depthwise convolution, every 9 B of data is extended to 288 B, forming 64 sets of 288 B data, and the 64 sets are distributed to the interfaces of the corresponding in-memory computing units.
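A minimal sketch of the depthwise output step, under one stated assumption: the text specifies only the 9 B -> 288 B extension, so zero-extending each channel's 9 B block at the low end of its 288 B word is an assumption here:

```python
def extend_to_288(block9: list) -> list:
    """Zero-extend one depthwise channel's 9 B block to 288 B.
    Low-end placement of the 9 B within the 288 B word is an assumption."""
    assert len(block9) == 9
    return block9 + [0] * (288 - 9)

def depthwise_outputs(deformed: list) -> list:
    """Form 64 sets of 288 B from the 64 consecutive 9 B groups at the
    front of the deformed data, one set per in-memory computing unit
    interface."""
    return [extend_to_288(deformed[9 * c: 9 * c + 9]) for c in range(64)]
```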
The invention has the beneficial effects that:
(1) The deformation gathers valid data, solving the problem of wasted gap space between Feature and Weight.
(2) Combined with the data-reuse characteristic of the convolution sliding window, reused data is cached, improving the efficiency of DMA data transport and enhancing the performance of convolution calculation.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a flow chart of a general convolution embodiment of the present invention;
FIG. 3 is a schematic flow diagram of a deep convolution embodiment of the present invention;
FIG. 4 is a schematic diagram of the NCHWC' access sequence.
Detailed Description
For a clearer understanding of the technical features, objects, and effects of the present invention, specific embodiments are described below with reference to the accompanying drawings.
The feature deformation method for matching weights in in-memory computing comprises the following steps:
carrying NCHWC' format data to be deformed from an SRAM memory and filling it into the register group iReg, where the filling addresses differ according to the convolution kernel;
performing NCWH deformation on the NCHWC' format data in iReg;
intercepting output data from the deformed data and transmitting it to the interface of the in-memory computing unit, where it is multiplied and accumulated with the weights.
Specifically, iReg consists of 9 register groups of 64 B each.
Specifically, the convolution kernel is 1×1, 3×3, 5×5, or 7×7.
Specifically, the correspondence among the convolution kernel, the number of supported channels, and the iReg filling address is as given above.
specifically, the NCWH deformation includes the following steps:
iReg shifts regShiftNum × 8 B from high address to low address;
extracting data from the 576 registers at intervals equal to the maximum number of supported input channels, as many times as the size of the convolution kernel, forming 64 groups of data in a two-dimensional array;
sequentially combining the extracted data of each group and merging the 64 groups into a one-dimensional array, completing the format deformation.
Specifically, regShiftNum is equal to the number of bytes of non-reused data divided by 8.
Specifically, the output data is located at the front end of the deformed data, where:
for a 32-channel general convolution, the output data comprises 288 B;
for a 64-channel depthwise convolution, every 9 B of data is extended to 288 B, forming 64 sets of 288 B data, and the 64 sets are distributed to the interfaces of the corresponding in-memory computing units.
Example 1:
As shown in FIG. 1, the feature deformation method for matching weights in in-memory computing provided by the invention comprises the following steps:
1. The feature is filled into iReg. The iReg fill positions vary for different kernels. As shown in FIG. 2, for kernel = 3 there are three fill positions, namely (6, 0), (7, 0), and (8, 0), where the coordinates mean (group address of iReg, offset address within the iReg group).
2. iReg shifts regShiftNum × 8 B from high address to low address. Each cycle fills features into the same position of iReg. In the case of kernel = 3, regShiftNum = non-reused data (bytes) / 8 = 64 × 3 / 8 = 24. When stride = 1 and the W direction of the feature starts for the first time, 2 extra cycles are needed (3 cycles of filling) before deformation can begin; thereafter, 64 × 3 B of data is filled each cycle and 64 × 3 B of data is shifted out from the low end, so every cycle can deform and generate data that participates in the calculation.
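The regShiftNum arithmetic in step 2 generalizes across the kernel sizes listed earlier (64 × 3 / 8 = 24 for 3×3, 16 × 5 / 8 = 10 for 5×5, 8 × 7 / 8 = 7 for 7×7). A sketch of that rule, assuming 1 B per element and special-casing 1×1 to 0 as the text states:

```python
def reg_shift_num(num_channels: int, kernel_size: int) -> int:
    """regShiftNum = non-reused bytes / 8. At stride 1, one column of
    kernel_size rows x num_channels channels (1 B each, an assumption)
    leaves the sliding window each step."""
    if kernel_size == 1:
        return 0  # the text gives 0 for 1x1 kernels (no sliding-window reuse)
    return num_channels * kernel_size // 8
```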
The iReg filling for the different kernels is shown in the table below:

Kernel | Max input channels | regShiftNum | Fill positions (regOffset, regIndex)
---|---|---|---
1×1 | 288 | 0 | (0, 0)
3×3 | 32 or 64 | 24 | (6, 0), (7, 0), (8, 0)
5×5 | 16 | 10 | (5, 0), (5, 16), (5, 32), (2, 48), (6, 0)
7×7 | 5 or 8 | 7 | (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0)
3. As shown for kernel = 3 in FIG. 2, the deformation moves the bytes at the same offset of each group (64 B) together (9 B at a time), forming 64 pieces of 9 B data. If kernel = 5, moving the same offset of each group (16 B) together forms 16 pieces of 25 B data.
4. For general convolution, 288 B is intercepted from the front and broadcast to the in-memory computing unit interface; for depthwise convolution, every 9 B of data is extended to 288 B (as shown in FIG. 3), forming 64 sets of 288 B data, which are distributed to the corresponding in-memory computing unit interfaces.
The foregoing has shown and described the basic principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope, which are defined by the appended claims.
Claims (5)
1. A feature deformation method for matching weights in in-memory computing, characterized by comprising the following steps:
carrying NCHWC' format data to be deformed from an SRAM memory and filling it into a register group iReg, where the filling addresses differ according to the convolution kernel;
performing NCWH deformation on the NCHWC' format data in iReg; the NCWH deformation comprises the following steps:
iReg shifts regShiftNum × 8 B from high address to low address, where regShiftNum is the number of register shifts;
extracting data from the 576 registers at intervals equal to the maximum number of supported input channels, as many times as the size of the convolution kernel, forming 64 groups of data in a two-dimensional array;
sequentially combining the extracted data of each group and merging the 64 groups into a one-dimensional array, completing the format deformation;
intercepting output data from the deformed data, broadcasting it to the in-memory computing unit interface, and multiplying and accumulating it with the weights; the output data is located at the front end of the deformed data, wherein:
for a 32-channel general convolution, the output data comprises 288 B;
for a 64-channel depthwise convolution, every 9 B of data is extended to 288 B, forming 64 sets of 288 B data, and the 64 sets are distributed to the interfaces of the corresponding in-memory computing units.
2. The method of claim 1, wherein iReg consists of 9 register groups of 64 B each.
3. The feature deformation method for matching weights in in-memory computing of claim 1, wherein the convolution kernel is 1×1, 3×3, 5×5, or 7×7.
4. The feature deformation method for matching weights in in-memory computing according to claim 3, wherein the correspondence among the convolution kernel, the number of supported channels, and the iReg filling address is:
the convolution kernel is 1×1: the maximum number of supported input channels is 288, regShiftNum is typically 0, and the iReg fill position (regOffset, regIndex) is (0, 0);
the convolution kernel is 3×3: the maximum number of supported input channels is 32 or 64, regShiftNum is typically 24, and the iReg fill positions (regOffset, regIndex) are (6, 0), (7, 0), (8, 0);
the convolution kernel is 5×5: the maximum number of supported input channels is 16, regShiftNum is typically 10, and the iReg fill positions (regOffset, regIndex) are (5, 0), (5, 16), (5, 32), (2, 48), (6, 0);
the convolution kernel is 7×7: the maximum number of supported input channels is 5 or 8, regShiftNum is typically 7, and the iReg fill positions (regOffset, regIndex) are (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0).
5. The method of claim 1, wherein regShiftNum is equal to the number of bytes of non-reused data divided by 8.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211472617.4A (CN115509467B) | 2022-11-23 | 2022-11-23 | Feature deformation method for calculating matching weight in memory
Publications (2)

Publication Number | Publication Date
---|---
CN115509467A | 2022-12-23
CN115509467B | 2023-04-28
Family
ID=84514282
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202211472617.4A (CN115509467B, Active) | Feature deformation method for calculating matching weight in memory | 2022-11-23 | 2022-11-23
Country Status (1)

Country | Link
---|---
CN | CN115509467B (en)
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | CB02 | Change of applicant information | Address: Building C, No. 888, Huanhu West 2nd Road, Lingang New Area, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120. Applicant after: Suzhou Yizhu Intelligent Technology Co.,Ltd. Applicant before: Shanghai Yizhu Intelligent Technology Co.,Ltd. (same address).
 | GR01 | Patent grant |