CN115509467B - Feature deformation method for calculating matching weight in memory - Google Patents


Info

Publication number
CN115509467B
CN115509467B (application CN202211472617.4A)
Authority
CN
China
Prior art keywords
data
ireg
memory
deformation
regshiftnum
Prior art date
Legal status
Active
Application number
CN202211472617.4A
Other languages
Chinese (zh)
Other versions
CN115509467A
Inventor
Wu Jun (伍骏)
Dong Guangda (董光达)
Current Assignee
Suzhou Yizhu Intelligent Technology Co ltd
Original Assignee
Suzhou Yizhu Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Yizhu Intelligent Technology Co ltd filed Critical Suzhou Yizhu Intelligent Technology Co ltd
Priority to CN202211472617.4A
Publication of CN115509467A
Application granted
Publication of CN115509467B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0646: Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F 3/0647: Migration mechanisms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/15: Correlation function computation including computation of convolution operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656: Data buffering arrangements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0661: Format or protocol conversion arrangements
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a feature deformation method for in-memory computation of matching weights, comprising the following steps: transferring the NCHWC'-format data to be deformed from an SRAM memory and filling it into a register group iReg, the filling addresses differing according to the convolution kernel; performing the NCWH deformation on the NCHWC'-format data in iReg; and intercepting output data from the deformed data and transmitting it to the interface of the in-memory computing unit, where it is multiplied and accumulated with the weights. The deformation gathers the valid data, solving the problem of wasted space gaps between Feature and Weight; by exploiting the data-reuse property of the convolution sliding window, reused data is cached, improving the efficiency of DMA data transfer and enhancing the performance of the convolution calculation.

Description

Feature deformation method for calculating matching weight in memory
Technical Field
The invention relates to the technical field of in-memory computing, and in particular to a feature deformation method for in-memory computation of matching weights.
Background
Convolution in a neural network multiplies and accumulates corresponding features and weights. In traditional computation, the feature and the weight are loaded into the computing unit together; data placement is flexible, and the registers caching the feature and the weight can be read and written repeatedly. In in-memory computing (CIM), however, the weights are written in advance, which places higher demands on improving the weight density of the CIM array.
Access order: c '- > W- > H- > C- > N, corresponding to a cube of n×h×w×c=1×14×14×128, cutting C into 64×2, according to NCHWC' = "0- > 1- > 2- > 3- > 4- > 5. In the manner of storing weight with NCHWC ', generally different C's need to be placed in a fixed width, and in the format of aligning specific lengths (such as 32B), if calculated C '<32B, there is a gap between different C's, which is often a majority, so that CIM space is easily wasted. In addition, when features corresponding to weight are generated for different convolution calculations (general convolution and depth convolution), the software configuration is complicated by using two separate methods.
Disclosure of Invention
To address the above defects of the prior art, the invention provides a feature deformation method for in-memory computation of matching weights, with the following technical scheme:
A feature deformation method for in-memory computation of matching weights comprises the following steps:
transferring the NCHWC'-format data to be deformed from an SRAM memory and filling it into a register group iReg, the filling addresses differing according to the convolution kernel;
performing the NCWH deformation on the NCHWC'-format data in iReg;
intercepting output data from the deformed data, broadcasting it to the interface of the in-memory computing unit, and multiplying and accumulating it with the weights.
Specifically, iReg consists of 9 register groups of 64B each.
In particular, the convolution kernel is 1×1, 3×3, 5×5, or 7×7.
Specifically, the correspondence among convolution kernel, number of supported channels, and iReg filling address is:
convolution kernel 1×1: maximum of 288 input channels; regShiftNum is typically 0; the iReg fill position (regOffset, regIndex) is (0, 0);
convolution kernel 3×3: maximum of 32 or 64 input channels; regShiftNum is typically 24; the iReg fill positions (regOffset, regIndex) are (6, 0), (7, 0), (8, 0);
convolution kernel 5×5: maximum of 16 input channels; regShiftNum is typically 10; the iReg fill positions (regOffset, regIndex) are (5, 0), (5, 16), (5, 32), (2, 48), (6, 0);
convolution kernel 7×7: maximum of 5 or 8 input channels; regShiftNum is typically 7; the iReg fill positions (regOffset, regIndex) are (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0).
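The per-kernel parameters above can be collected into a lookup table; a sketch (the dictionary name and key format are illustrative, and for kernels with two channel options only the larger maximum is kept):

```python
# Per-kernel parameters as stated in the text: maximum input channels,
# typical regShiftNum, and iReg fill positions (regOffset, regIndex).
KERNEL_CONFIG = {
    (1, 1): {"max_channels": 288, "reg_shift_num": 0,
             "fill_positions": [(0, 0)]},
    (3, 3): {"max_channels": 64, "reg_shift_num": 24,
             "fill_positions": [(6, 0), (7, 0), (8, 0)]},
    (5, 5): {"max_channels": 16, "reg_shift_num": 10,
             "fill_positions": [(5, 0), (5, 16), (5, 32), (2, 48), (6, 0)]},
    (7, 7): {"max_channels": 8, "reg_shift_num": 7,
             "fill_positions": [(5, 16), (5, 24), (5, 32), (5, 40),
                                (5, 48), (5, 56), (6, 0)]},
}

assert KERNEL_CONFIG[(3, 3)]["reg_shift_num"] == 24
assert len(KERNEL_CONFIG[(7, 7)]["fill_positions"]) == 7
```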
Specifically, the NCWH deformation comprises the following steps:
iReg shifts regShiftNum × 8B from high addresses toward low addresses, where regShiftNum is the number of register shifts;
from the 576 register bytes, data is extracted at an interval equal to the maximum number of supported input channels, repeated as many times as the convolution kernel size, forming 64 groups of data in two-dimensional-array form;
the extracted data of each group is combined in order, and the 64 groups are combined into a one-dimensional array, completing the format deformation.
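Assuming iReg holds kernel-sized blocks of channel-width bytes with C' varying fastest (as in the filling step), the interval extraction above amounts to a blocks-by-channels transpose; a minimal sketch (function and parameter names are illustrative):

```python
def ncwh_deform(ireg: bytes, num_blocks: int, channels: int) -> bytes:
    """Gather byte j of each of num_blocks channel-wide blocks so that the
    bytes belonging to one channel become contiguous (a num_blocks x
    channels transpose), mirroring the interval extraction in the text."""
    assert len(ireg) >= num_blocks * channels
    out = bytearray()
    for j in range(channels):            # one output group per channel
        for k in range(num_blocks):      # one byte per kernel tap
            out.append(ireg[k * channels + j])
    return bytes(out)

# kernel = 3 example: 9 taps x 64 channels = 576 bytes in,
# 64 groups of 9B out.
src = bytes(i % 251 for i in range(576))
dst = ncwh_deform(src, 9, 64)
assert len(dst) == 576
assert dst[0:9] == bytes(src[k * 64] for k in range(9))
```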
Specifically, regShiftNum equals the number of non-multiplexed (non-reused) bytes divided by 8.
Specifically, the output data is located at the front end of the deformed data, where:
for a 32-channel general convolution, the output data comprises 288B;
for a 64-channel depthwise convolution, every 9B of data is extended to 288B, forming 64 groups of 288B, and the 64 groups are distributed onto the interfaces of the corresponding in-memory computing units.
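For the depthwise case, the text does not specify how each 9B group is extended to 288B; the sketch below assumes each group is placed at the front of a zero-filled 288B lane purely for illustration (function name and placement are assumptions):

```python
def broadcast_depthwise(groups_9b: bytes, num_groups: int = 64) -> list:
    """Extend each 9B group into its own 288B lane (zero-padded tail;
    the exact placement within the 288B is an assumption of this sketch)."""
    lanes = []
    for g in range(num_groups):
        nine = groups_9b[g * 9:(g + 1) * 9]
        lanes.append(nine + bytes(288 - len(nine)))
    return lanes

# 64 groups x 9B of deformed data -> 64 lanes of 288B each.
deformed = bytes(i % 256 for i in range(64 * 9))
lanes = broadcast_depthwise(deformed)
assert len(lanes) == 64 and all(len(lane) == 288 for lane in lanes)
assert lanes[0][:9] == deformed[:9]
```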
The invention has the following beneficial effects:
(1) The deformation gathers valid data, solving the problem of wasted space gaps between Feature and Weight.
(2) By exploiting the data-reuse property of the convolution sliding window, reused data is cached, improving the efficiency of DMA data transfer and enhancing the performance of the convolution calculation.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a flow chart of a general convolution embodiment of the present invention;
FIG. 3 is a schematic flow diagram of a deep convolution embodiment of the present invention;
fig. 4 is a schematic diagram of the NCHWC' access sequence.
Detailed Description
For a clearer understanding of technical features, objects, and effects of the present invention, a specific embodiment of the present invention will be described with reference to the accompanying drawings.
A feature deformation method for in-memory computation of matching weights comprises the following steps:
transferring the NCHWC'-format data to be deformed from an SRAM memory and filling it into a register group iReg, the filling addresses differing according to the convolution kernel;
performing the NCWH deformation on the NCHWC'-format data in iReg;
intercepting output data from the deformed data and transmitting it to the interface of the in-memory computing unit, where it is multiplied and accumulated with the weights.
Specifically, iReg consists of 9 register groups of 64B each (576B in total).
In particular, the convolution kernels include 1*1, 3*3, 5*5, or 7*7.
Specifically, the correspondence among the convolution kernel, the number of supported channels, and the iReg filling address is:
Convolution kernel | Max input channels | regShiftNum | iReg fill positions (regOffset, regIndex)
1×1 | 288 | 0 | (0, 0)
3×3 | 32 or 64 | 24 | (6, 0), (7, 0), (8, 0)
5×5 | 16 | 10 | (5, 0), (5, 16), (5, 32), (2, 48), (6, 0)
7×7 | 5 or 8 | 7 | (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0)
Specifically, the NCWH deformation comprises the following steps:
iReg shifts regShiftNum × 8B from high addresses toward low addresses;
from the 576 register bytes, data is extracted at an interval equal to the maximum number of supported input channels, repeated as many times as the convolution kernel size, forming 64 groups of data in two-dimensional-array form;
the extracted data of each group is combined in order, and the 64 groups are combined into a one-dimensional array, completing the format deformation.
Specifically, regShiftNum equals the number of non-multiplexed (non-reused) bytes divided by 8.
Specifically, the output data is located at the front end of the deformed data, where:
for a 32-channel general convolution, the output data comprises 288B;
for a 64-channel depthwise convolution, every 9B of data is extended to 288B, forming 64 groups of 288B, and the 64 groups are distributed onto the interfaces of the corresponding in-memory computing units.
Example 1:
As shown in fig. 1, the feature deformation method for in-memory computation of matching weights provided by the invention comprises the following steps:
1. The feature is filled into iReg. The iReg filling differs for different kernels; as shown in fig. 2, for kernel = 3 each filling goes to three positions, namely (6, 0), (7, 0), (8, 0), where the coordinates mean (group address of iReg, offset address within the iReg group).
2. iReg shifts regShiftNum × 8B from high addresses toward low addresses. Each cycle, features are filled to the same iReg positions. For kernel = 3, regShiftNum = non-multiplexed data (Byte) / 8 = 64 × 3 / 8 = 24. When stride = 1, the first traversal in the W direction of the feature needs 2 cycles (3 cycles of filling) before deformation can start; after that, 64 × 3B of data is filled in each cycle, 64 × 3B is shifted out from the low end, and each cycle the deformation can produce data participating in the calculation.
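The regShiftNum arithmetic in this step can be checked directly:

```python
def reg_shift_num(non_reused_bytes: int) -> int:
    """regShiftNum = non-multiplexed (non-reused) data in bytes, divided
    by 8 (the shift granularity is 8B per the text)."""
    return non_reused_bytes // 8

# kernel = 3, 64 channels: 64 * 3 = 192B leave the window each step.
assert reg_shift_num(64 * 3) == 24
```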
The iReg filling for the different kernels is shown in the following table:
Convolution kernel | Max input channels | regShiftNum | iReg fill positions (regOffset, regIndex)
1×1 | 288 | 0 | (0, 0)
3×3 | 32 or 64 | 24 | (6, 0), (7, 0), (8, 0)
5×5 | 16 | 10 | (5, 0), (5, 16), (5, 32), (2, 48), (6, 0)
7×7 | 5 or 8 | 7 | (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0)
3. As shown for kernel = 3 in fig. 2, the deformation moves the bytes at the same offset of each 64B group together, forming 64 groups of 9B data. If kernel = 5, the bytes at the same offset of each 16B group are moved together, forming 16 groups of 25B data.
4. For general convolution, the front 288B is intercepted and broadcast to the in-memory computing unit interface; for depthwise convolution, every 9B of data is extended to 288B (as shown in fig. 3), forming 64 groups of 288B, and the 64 groups are distributed onto the corresponding in-memory computing unit interfaces.
The foregoing has shown and described the basic principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and descriptions merely illustrate the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (5)

1. A feature deformation method for in-memory computation of matching weights, characterized by comprising the following steps:
transferring the NCHWC'-format data to be deformed from an SRAM memory and filling it into a register group iReg, the filling addresses differing according to the convolution kernel;
performing the NCWH deformation on the NCHWC'-format data in iReg; the NCWH deformation comprises the following steps:
iReg shifts regShiftNum × 8B from high addresses toward low addresses, where regShiftNum is the number of register shifts;
from the 576 register bytes, data is extracted at an interval equal to the maximum number of supported input channels, repeated as many times as the convolution kernel size, forming 64 groups of data in two-dimensional-array form;
the extracted data of each group is combined in order, and the 64 groups are combined into a one-dimensional array, completing the format deformation;
intercepting output data from the deformed data, broadcasting it to the in-memory computing unit interface, and multiplying and accumulating it with the weights; the output data is located at the front end of the deformed data, wherein:
for a 32-channel general convolution, the output data comprises 288B;
for a 64-channel depthwise convolution, every 9B of data is extended to 288B, forming 64 groups of 288B, and the 64 groups are distributed onto the interfaces of the corresponding in-memory computing units.
2. The feature deformation method for in-memory computation of matching weights according to claim 1, wherein iReg consists of 9 register groups of 64B each.
3. The feature deformation method for in-memory computation of matching weights according to claim 1, wherein the convolution kernel is 1×1, 3×3, 5×5, or 7×7.
4. The feature deformation method for in-memory computation of matching weights according to claim 3, wherein the correspondence among convolution kernel, number of supported channels, and iReg filling address is:
convolution kernel 1×1: maximum of 288 input channels; regShiftNum is typically 0; the iReg fill position (regOffset, regIndex) is (0, 0);
convolution kernel 3×3: maximum of 32 or 64 input channels; regShiftNum is typically 24; the iReg fill positions (regOffset, regIndex) are (6, 0), (7, 0), (8, 0);
convolution kernel 5×5: maximum of 16 input channels; regShiftNum is typically 10; the iReg fill positions (regOffset, regIndex) are (5, 0), (5, 16), (5, 32), (2, 48), (6, 0);
convolution kernel 7×7: maximum of 5 or 8 input channels; regShiftNum is typically 7; the iReg fill positions (regOffset, regIndex) are (5, 16), (5, 24), (5, 32), (5, 40), (5, 48), (5, 56), (6, 0).
5. The feature deformation method for in-memory computation of matching weights according to claim 1, wherein regShiftNum equals the number of non-multiplexed (non-reused) bytes divided by 8.
CN202211472617.4A 2022-11-23 2022-11-23 Feature deformation method for calculating matching weight in memory Active CN115509467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211472617.4A CN115509467B (en) 2022-11-23 2022-11-23 Feature deformation method for calculating matching weight in memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211472617.4A CN115509467B (en) 2022-11-23 2022-11-23 Feature deformation method for calculating matching weight in memory

Publications (2)

Publication Number Publication Date
CN115509467A CN115509467A (en) 2022-12-23
CN115509467B true CN115509467B (en) 2023-04-28

Family

ID=84514282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211472617.4A Active CN115509467B (en) 2022-11-23 2022-11-23 Feature deformation method for calculating matching weight in memory

Country Status (1)

Country Link
CN (1) CN115509467B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10223333B2 (en) * 2014-08-29 2019-03-05 Nvidia Corporation Performing multi-convolution operations in a parallel processing system
CN111401510A (en) * 2019-09-24 2020-07-10 上海寒武纪信息科技有限公司 Data processing method and device, computer equipment and storage medium
CN111984189B (en) * 2020-07-22 2022-05-17 深圳云天励飞技术股份有限公司 Neural network computing device, data reading method, data storage method and related equipment
US20220129744A1 (en) * 2020-10-26 2022-04-28 Arm Limited Method for permuting dimensions of a multi-dimensional tensor
CN113672855A (en) * 2021-08-25 2021-11-19 恒烁半导体(合肥)股份有限公司 Memory operation method, device and application thereof
CN114995823A (en) * 2022-06-07 2022-09-02 重庆大学 Deep learning compiler optimization method for special accelerator for CNN

Also Published As

Publication number Publication date
CN115509467A (en) 2022-12-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200120 building C, No. 888, Huanhu West 2nd Road, Lingang New Area, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant after: Suzhou Yizhu Intelligent Technology Co.,Ltd.

Address before: 200120 building C, No. 888, Huanhu West 2nd Road, Lingang New Area, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Shanghai Yizhu Intelligent Technology Co.,Ltd.

GR01 Patent grant