WO2022265044A1 - Arithmetic processing device - Google Patents
Arithmetic processing device (演算処理装置)
- Publication number
- WO2022265044A1 (PCT/JP2022/023990; JP2022023990W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pooling
- data
- convolution operation
- operation result
- result data
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3287—Power saving characterised by the action undertaken by switching off individual functional units in the computer system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C5/00—Details of stores covered by group G11C11/00
- G11C5/14—Power supply arrangements, e.g. power down, chip selection or deselection, layout of wirings or power grids, or multiple supply levels
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5011—Pool
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates to an arithmetic processing device.
- Arithmetic processing devices that perform image recognition, etc. using convolutional neural networks, that is, neural networks with convolutional layers, are known, and are expected to be applied to robot control and car driving control.
- Convolutional computation processing and pooling processing are performed in convolutional neural networks such as those used for image recognition.
- In the convolution operation processing, an enormous number of sum-of-products operations are performed, weighting the data of the input layer and the intermediate layers with the weight data of convolution filters and adding the results.
- In the pooling processing, for example, the maximum value is extracted or the average value is calculated from a plurality of convolution operation results obtained in the convolution operation processing.
- Power gating, which cuts off the power supply to arithmetic circuits such as processor cores to suppress leakage current, is known as a technique for reducing power consumption.
- the present invention has been made in view of the above circumstances, and an object of the present invention is to provide an arithmetic processing device capable of further reducing power consumption.
- An arithmetic processing device of the present invention comprises: a convolution operation unit that sequentially outputs convolution operation result data; a pooling processing unit that has a pooling operation circuit and a nonvolatile pooling storage circuit, in which the pooling storage circuit holds the convolution operation result data or the operation result of the pooling operation circuit as held data and, each time convolution operation result data is input from the convolution operation unit, the pooling operation circuit uses the held data to calculate and output pooling data obtained by performing pooling processing on a pooling area; and a power gating section that cuts off the power supply to the pooling storage circuit while waiting for input of the convolution operation result data from the convolution operation unit.
- Another arithmetic processing device of the present invention includes a convolution operation unit that, for each row of a channel in which a plurality of convolution operation result data are two-dimensionally arranged, sequentially outputs the convolution operation result data in the row direction of the channel, and a pooling processing unit that has a pooling operation circuit and a nonvolatile pooling storage circuit and outputs, as pooling data, the convolution operation result data having the maximum value in each pooling area obtained by dividing the plurality of convolution operation result data into blocks of 2 rows and 2 columns of the channel.
- The pooling storage circuit has buffers connected in Y+2 stages, where Y (an even number equal to or greater than 2) is the number of columns of the channel. Each time convolution operation result data from the convolution operation unit is input to the first-stage buffer, the first-stage buffer holds and outputs the input convolution operation result data, and each of the second and subsequent buffers holds and outputs the convolution operation result data output from the preceding buffer.
- The pooling operation circuit receives a data group consisting of the convolution operation result data from the buffers of the first, second, (Y+1)-th and (Y+2)-th stages, and has a comparator that compares the convolution operation result data of the data group and a selector that, based on the comparison result of the comparator, selects and outputs the convolution operation result data having the maximum value in the data group. The pooling processing unit outputs, as pooling data, the convolution operation result data output from the selector when the convolution operation result data of the data group form the combination of the convolution operation result data in one pooling area.
- According to the present invention, the power gating section cuts off the power supply to the nonvolatile pooling storage circuit while waiting for input of the convolution operation result data, so the power consumption of the arithmetic processing device can be reduced.
- In addition, the number of buffers that hold the convolution operation result data can be made smaller, namely (number of columns + 2), than the number of element data of the channel (number of columns × number of rows), which further reduces power consumption.
- FIG. 4 is an explanatory diagram showing an example of connected layers of a convolutional neural network
- FIG. 10 is an explanatory diagram showing the relationship between the movement of the position of the convolutional area and the pooling area
- 3 is a block diagram showing the configuration of an arithmetic unit
- FIG. 3 is a block diagram showing the configuration of a convolution arithmetic circuit
- FIG. 10 is an explanatory diagram showing the state of convolution operation processing when performing an operation on the first element data of the next hierarchy by channel parallelism;
- FIG. 10 is an explanatory diagram showing the state of convolution operation processing when performing an operation on the first element data of the next hierarchy by channel parallelism;
- FIG. 11 is an explanatory diagram showing the state of convolution operation processing when performing operation on the second element data of the next layer by channel parallelism;
- FIG. 10 is an explanatory diagram showing a period in which the PG switch is turned off;
- FIG. 4 is a block diagram showing a configuration example of a pooling processing unit that performs average value pooling processing;
- FIG. 4 is a block diagram showing a configuration example of a pooling processing unit that performs weighted average value pooling processing;
- FIG. 10 is a block diagram showing a pooling processing unit configured by buffers in which register units are connected in multiple stages;
- the arithmetic processing unit 10 performs arithmetic processing based on a convolutional neural network.
- The arithmetic processing device 10 includes an arithmetic unit 11 that performs convolution operation processing using convolution filters and pooling processing on channels (also referred to as feature planes), a memory unit 12, a power gating control unit 14, and a controller 15 that controls these units.
- The arithmetic unit 11 is provided with k (k is an integer equal to or greater than 2) computation units 17 arranged in parallel, each of which performs convolution operation processing and pooling processing, as will be described later in detail.
- a plurality of layers are connected in the convolutional neural network on which the arithmetic processing device 10 is based. Each layer has one or more channels.
- the first layer is the input layer, which is an image consisting of RGB channels, for example.
- In this example, the first to fourth layers are connected.
- The first layer consists of three channels ch1-1 to ch1-3, the second layer consists of four channels ch2-1 to ch2-4, the third layer consists of three channels ch3-1 to ch3-3, and the fourth layer consists of three channels ch4-1 to ch4-3.
- The first and second layers are layers subjected to convolution operation processing: channels ch2-1 to ch2-4 of the second layer are generated from channels ch1-1 to ch1-3 of the first layer by convolution operation processing, and channels ch3-1 to ch3-3 of the third layer are generated from channels ch2-1 to ch2-4 of the second layer.
- The third layer is a layer targeted for pooling processing, and channels ch4-1 to ch4-3 of the fourth layer are generated from channels ch3-1 to ch3-3 of the third layer by pooling processing.
- Each layer can have one or more channels.
- In convolution operation processing, the number of channels may increase or decrease between the preceding and succeeding layers, or it may remain unchanged; in pooling processing, the number of channels is the same in the preceding and succeeding layers.
- The number of layers is not limited to four; it may be three, or five or more.
- the calculation unit 11 generates the (n+1)th layer by performing convolution calculation processing or pooling processing on the channel of the nth layer, where n is an integer equal to or greater than 1.
- Generating a layer means generating each channel that constitutes the layer, and generating a channel means calculating each element datum that constitutes the channel.
- the n-th layer may be called the previous layer with respect to the n+1-th layer, and the n+1-th layer may be called the next layer with respect to the n-th layer. Therefore, the channel of the next layer is generated by convolution operation processing and pooling processing for the channel of the previous layer.
- a channel consists of multiple element data arranged two-dimensionally.
- A two-dimensional array of element data is an array in the data structure: each element datum is positioned by two variables (row and column in this explanation), and positional information that identifies the positional relationship between element data is attached to each element datum. The same applies to the weight data described later.
- The size of each channel, that is, the number of element data in the row and column directions, is arbitrary and not particularly limited. Although two-dimensional channels are described in this example, one-dimensional or three-dimensional channels may be used.
- element data is calculated by convolution calculation.
- An element datum calculated by the convolution operation is the value obtained by applying a convolution filter to the element data in the convolution area of each channel of the previous layer and adding together the results obtained for the convolution areas at the same position of the respective channels.
- the application of the convolution filter is to obtain the sum-of-products operation result of the element data in the convolution area and the weight data of the convolution filter.
- a convolution filter is a two-dimensional array of weight data that serves as a weight for element data.
- one convolution filter is composed of 3 ⁇ 3 (3 rows by 3 columns) weight data.
- Each weight data of the convolution filter is set to a value according to the purpose of the convolution filter.
- a convolution filter corresponding to the combination of the previous layer channel and the next layer channel is used.
- the convolution region defines the range over which the convolution filter is applied on the channel, and has the same array size as the convolution filter (3 rows by 3 columns in this example).
- The weight data of the convolution filter and the element data of the convolution area are multiplied at corresponding positions.
- The convolution area is moved so as to scan the entire channel, shifting its position by one element datum at a time, and the element data calculation described above is performed each time the convolution area is moved.
- a convolution operation is performed on each channel of the next layer using all the channels of the previous layer. Also, a convolution operation is performed using a convolution filter corresponding to the combination of the channels of the previous layer and the channels of the next layer.
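- As a concrete illustration of the element data calculation described above, the following sketch (hypothetical NumPy code, not taken from the embodiment; the function name and argument layout are assumptions) computes one element datum of a next-layer channel by applying a 3x3 convolution filter to the convolution area of every previous-layer channel and summing the results.

```python
import numpy as np

def element_datum(prev_channels, filters, row, col):
    """Compute one element datum of a next-layer channel (illustrative sketch).

    prev_channels: list of 2-D NumPy arrays, the channels of the previous layer
    filters:       list of 3x3 weight arrays, one per previous-layer channel,
                   i.e. the filters for the combinations with this next-layer channel
    (row, col):    top-left position of the 3x3 convolution area
    """
    total = 0.0
    for channel, weights in zip(prev_channels, filters):
        region = channel[row:row + 3, col:col + 3]   # convolution area (3 rows x 3 columns)
        total += np.sum(region * weights)            # sum of products for this channel
    return total
```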
- Any number of channels of the previous layer may be used to generate one channel of the next layer; a single channel of the previous layer may also be used to generate one channel of the next layer.
- all or part of a plurality of convolution filters used in one layer may be a common weight array.
- one convolution filter with the common weight array may be prepared and used when calculating a plurality of channels.
- In the pooling processing, each channel of the next layer is generated with a size reduced in the row and column directions relative to each channel of the previous layer.
- In this example, maximum value pooling processing is performed, which extracts the maximum value from a pooling area of 2 rows and 2 columns. Each channel is therefore divided into a plurality of pooling areas of 2 rows and 2 columns that do not overlap each other, and for each of these pooling areas, the element datum having the maximum value in the area is output as the result of the pooling processing.
- the size of the pooling area is not limited to 2 rows and 2 columns.
- One of p and q may be an integer of 1 or more, the other may be an integer of 2 or more, and the pooling area may be p rows and q columns. Further, instead of the maximum value pooling process, as will be described later, an average value pooling process of outputting the average value of the element data in the pooling area may be performed. Further convolution processing can be performed on the hierarchy consisting of the channels reduced by the pooling processing. The pooling areas can be divided so that they partially overlap each other. In this case, the pooling process can be performed so as to generate the next-layer channels having the same size in the row direction and column direction as the previous layer channels.
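- A minimal sketch of the 2-row by 2-column maximum value pooling described above, in plain Python (an illustration, not code from the embodiment; it assumes the channel's row and column counts are even and that the pooling areas do not overlap):

```python
def max_pool_2x2(channel):
    """Non-overlapping 2x2 maximum value pooling (row and column counts assumed even)."""
    rows, cols = len(channel), len(channel[0])
    pooled = []
    for r in range(0, rows, 2):
        pooled_row = []
        for c in range(0, cols, 2):
            # pooling area of 2 rows and 2 columns: output its maximum element datum
            pooled_row.append(max(channel[r][c], channel[r][c + 1],
                                  channel[r + 1][c], channel[r + 1][c + 1]))
        pooled.append(pooled_row)
    return pooled
```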
- the convolution area Ra in this example has 3 rows and 3 columns.
- In the channel ChA of the previous layer, the convolution area Ra is moved from the position shown in FIG. 3(A) to a position shifted by one element datum in the row direction as shown in FIG. 3(B), then to a position shifted by one element datum in the column direction from the position of FIG. 3(A) as shown in FIG. 3(C), and then to a position shifted by one element datum in the row direction from the position of FIG. 3(C) as shown in FIG. 3(D).
- the element data in one pooling area Rb in the channel ChB of the next layer are continuously calculated.
- the order of movement of the convolutional regions Ra is not limited to the above order as long as the element data in one pooling region Rb is calculated continuously.
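- The movement order can also be expressed as a position generator: for each 2-row by 2-column pooling area, the four corresponding convolution-area positions are visited consecutively before moving on (an illustrative Python sketch of one order that satisfies the condition above; the function name and arguments are assumptions):

```python
def convolution_positions(rows, cols):
    """Yield top-left positions of the convolution area so that the four positions
    belonging to one 2x2 pooling area are produced consecutively.
    rows, cols: size of the next-layer channel, both assumed even."""
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            yield (r, c)          # 1st element datum of the pooling area (FIG. 3(A))
            yield (r, c + 1)      # 2nd: shifted one element in the row direction (FIG. 3(B))
            yield (r + 1, c)      # 3rd: shifted one element in the column direction (FIG. 3(C))
            yield (r + 1, c + 1)  # 4th: shifted again in the row direction (FIG. 3(D))
```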
- The memory unit 12 stores the weight data of the convolution filters and the element data of each channel of the layer to which the convolution operation processing is applied, that is, the previous layer, and the convolution operation result data and pooling data, that is, the element data of each channel of the next layer, are written to it. For a layer targeted for pooling processing, the element data obtained by the convolution operation are handed over to the pooling processing inside the computation unit 17, and so are not written to the memory unit 12.
- the power gating control unit 14 controls power supply in each arithmetic unit 17 under the control of the controller 15, that is, controls power gating, as will be described later in detail.
- the computation unit 17 has a convolution computation section 21 that performs convolution computation, a pooling processing section 22 that performs pooling (extraction of the maximum value in this example), and an activation function processing section 23 .
- the arithmetic unit 17 is provided with a bit number adjusting circuit (not shown) for converting the data length of the element data output from the convolution arithmetic unit 21 into a predetermined data length.
- the convolution operation unit 21 performs a convolution operation to obtain element data.
- the convolution calculation unit 21 calculates one element data by one convolution calculation.
- The convolution operation unit 21 sequentially switches between the channels of the previous layer, and for each channel of the previous layer, the nine element data in the convolution area and the nine weight data of the convolution filter are input to it.
- the element data from the convolution calculation unit 21 is input to the activation function processing unit 23 and converted using the activation function.
- the activation function include a step function, a sigmoid function, a normalized linear function (ReLU: Rectified Linear Unit), a leaky normalized linear function (Leaky ReLU), a hyperbolic tangent function, and the like.
- Element data passed through the activation function processing unit 23 is sent to the memory unit 12 and the pooling processing unit 22 as element data of the next layer.
- the pooling processing unit 22 performs the pooling processing described above and outputs the element data that has the maximum value within the pooling area. Power supply to the pooling processing unit 22 is controlled by the power gating control unit 14 .
- Hereinafter, the element data obtained by the convolution operation of the convolution operation unit 21 (including data passed through the activation function processing unit 23) are specifically referred to as convolution operation result data, and the element data obtained by the pooling processing of the pooling processing unit 22 are sometimes referred to as pooling data.
- The convolution operation unit 21 is composed of multipliers 24 equal in number to the weight data of the convolution filter (nine in this example), a multiplexer 25, an adder 26, a register 27, and so on.
- Each multiplier 24 receives an element datum and a weight datum, and outputs the multiplication result obtained by multiplying them.
- the multiplexer 25 selects and outputs the multiplication results from each multiplier 24 one by one.
- Register 27 holds the addition result of adder 26 .
- the adder 26 adds the multiplication result from the multiplexer 25 and the data held in the register 27 each time one multiplication result is output from the multiplexer 25 and causes the register 27 to hold the addition result.
- the element data of each channel of the previous layer and the weight data of the convolution filter are input to the convolution operation unit 21, and finally the addition result held in the register 27 is output as the convolution operation result data (element data).
- the configuration of the convolution calculation unit 21 is not limited to this.
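- Functionally, the multiplier, multiplexer, adder and register arrangement just described behaves like the following accumulation loop (an illustrative behavioral sketch in Python; it does not model timing, bit widths, or the actual circuit):

```python
def convolution_unit_step(element_data, weight_data, accumulator=0.0):
    """Behavioral model of the convolution operation unit 21: nine multipliers
    feed a multiplexer, and the adder accumulates each selected product into
    the register, whose final value is the sum of products."""
    products = [e * w for e, w in zip(element_data, weight_data)]  # the nine multipliers 24
    register_27 = accumulator
    for product in products:       # multiplexer 25 passes the products to the adder one by one
        register_27 += product     # adder 26 adds the product to the value held in register 27
    return register_27
```

- Carrying the returned accumulator into the next call corresponds to adding the sums of products over successive channels of the previous layer, which is how the channel-parallel processing described below uses the register 27.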
- the pooling processing unit 22 has a pooling arithmetic circuit 31 and a register 32 as a non-volatile storage circuit for pooling.
- the pooling arithmetic circuit 31 cooperates with the register 32 to perform extraction processing for extracting the element data having the maximum value within the pooling area.
- This pooling arithmetic circuit 31 is composed of a comparator 33 and a multiplexer 34 .
- the register 32 is composed of, for example, a plurality of nonvolatile flip-flops (NV-FF) using magnetic tunnel junction (MTJ) elements.
- A nonvolatile flip-flop using a magnetic tunnel junction element occupies a smaller area on the substrate than other nonvolatile flip-flops, which is advantageous for convolutional neural networks that require high-density integration, and its power consumption is low, which is advantageous for reducing power consumption.
- Since the register 32 is nonvolatile, it retains data even when the power supply is cut off, and when power is supplied again, the data retained at the time of power-off can be read and output.
- This register 32 holds the element data selected by the multiplexer 34 as holding data.
- the register 32 is reset each time the output of the maximum value of the pooling area is completed, and the content held therein is set to the initial value (value "0"). Note that the configuration of the pooling storage circuit is not limited to the above.
- The element data from the convolution operation unit 21, after passing through the activation function processing unit 23, and the element data held in the register 32 are input to the comparator 33 and the multiplexer 34 that constitute the pooling operation circuit 31.
- the comparator 33 compares the two input element data and outputs a selection signal to the multiplexer 34 to select the element data with the larger value.
- the multiplexer 34 functions as a selector and selects and outputs one of the input element data based on the selection signal.
- As a result, whichever of the element datum from the convolution operation unit 21 and the element datum held in the register 32 has the larger value is output from the multiplexer 34, and the output element datum is held in the register 32 as the new held data.
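- The compare-and-hold behavior of the comparator 33, the multiplexer 34 and the register 32 can be sketched as follows (illustrative Python, not the circuit itself; the reset value 0 presumes non-negative element data, e.g. after a ReLU activation):

```python
def update_held_data(held_datum, new_element_datum):
    """Comparator 33 compares the two inputs; multiplexer 34 selects the larger
    value, which is written back to register 32 as the new held data."""
    return max(held_datum, new_element_datum)

def max_pool_streaming(element_data_of_area):
    """Feed the element data of one pooling area one by one, as the convolution
    operation unit outputs them, and return the pooling datum (the maximum)."""
    register_32 = 0                        # register 32 reset to its initial value
    for datum in element_data_of_area:
        register_32 = update_held_data(register_32, datum)
    return register_32
```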
- a drive voltage (VDD) is applied to the register 32 via the PG switch 35 .
- the PG switch 35 constitutes a power gating section together with the power gating control section 14 .
- the PG switch 35 is composed of a MOS transistor or the like, and is controlled to be turned on/off by the power gating control section 14 .
- When the PG switch 35 is turned on, the register 32 receives power and becomes capable of writing and outputting (reading) data.
- When the PG switch 35 is turned off, no drive voltage is applied to the register 32, that is, the power supply is cut off, making it impossible to write and output data. In this way, power gating can be performed on the register 32.
- the PG switch 35 is provided for each pooling processing unit 22, but one PG switch 35 common to each pooling processing unit 22 may be provided.
- The power gating control unit 14 turns on the PG switch 35 at least while element data are being written to or output from the register 32, and turns off the PG switch 35 during other periods to reduce power consumption.
- Specifically, during the pooling processing period, the PG switch 35 is turned off while the pooling processing unit 22 is waiting for input of element data from the convolution operation unit 21, that is, while the pooling processing unit 22 is not performing processing, and the PG switch 35 is turned on otherwise.
- The period in which the pooling processing unit 22 performs processing runs from the timing at which an element datum is output from the convolution operation unit 21, that is, input to the pooling processing unit 22, until the new held data generated by the processing of the pooling operation circuit 31 is held in the register 32, or until the output of the pooling data is completed.
- the PG switch 35 is turned off except during the pooling processing period.
- each element data of the pooling region sequentially calculated by the convolution operation unit 21 as described above is input to the pooling processing unit 22 each time the element data is calculated.
- The pooling processing period starts when the convolution operation processing for generating the channels of the layer targeted for pooling processing is started, or when the first element datum of the layer targeted for pooling processing is input to the pooling processing unit 22.
- the end of the pooling processing period is when the final element data of the channel generated by the pooling processing has been output from the pooling processing unit 22 .
- Alternatively, the pooling processing period may be defined for each pooling area: it starts when the convolution operation for calculating the element data of the pooling area is started or when the first element datum of the pooling area is input to the pooling processing unit 22, and it ends when the output of the element datum having the maximum value of the pooling area, held in the register 32, is completed.
- The output of the element datum held in the register 32 is completed when the circuit that is to acquire the element datum output from the register 32 has acquired it; in this example, the output of the element datum is completed when the memory unit 12 latches it.
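- The power gating timing described above can be summarized by the following control sketch (a hypothetical Python model; the event labels are illustrative, and it is the nonvolatility of the register 32 that lets the held value survive the off periods):

```python
def pooling_area_with_power_gating(element_data):
    """Process the element data of one pooling area while recording when the
    PG switch 35 is on. Register 32 keeps its value across the off periods
    because it is nonvolatile."""
    register_32 = 0                 # initial value after reset
    events = []                     # (what happens, PG switch on?)
    for i, datum in enumerate(element_data):
        events.append(("waiting for input", False))      # power to register 32 cut off
        events.append(("compare and write", True))       # switch turned on when the datum arrives
        register_32 = max(register_32, datum)
        if i == len(element_data) - 1:
            events.append(("output pooling datum", True))  # kept on until the output completes
    events.append(("power off", False))
    return register_32, events
```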
- The arithmetic processing device 10 uses the k computation units 17 provided in the arithmetic unit 11 to perform the convolution operation processing in a mode called channel parallelism, in which one element datum is calculated in parallel for each of k channels of the next layer. Further, when the layer generated by the convolution operation processing is subject to pooling processing, the arithmetic processing device 10 moves the convolution area in the k channels so that, each time one element datum is calculated for each of the k channels of the next layer, the plurality of element data in one pooling area are calculated consecutively as described above. When the next layer generated by the convolution operation processing is not subject to pooling processing, the element data may be calculated in an order other than the above.
- The operation of the above configuration will be described for the case where the (n+1)-th layer is generated by convolution operation processing on the n-th layer, and the (n+2)-th layer is generated by pooling processing on the (n+1)-th layer.
- Assume that the n-th layer consists of channels ChA1, ChA2, ..., and that the first to k-th channels ChB1, ChB2, ... of the (n+1)-th layer are generated from them.
- each arithmetic unit 17 of the arithmetic unit 11 performs an arithmetic operation to apply a convolution filter to the convolution area Ra of the first channel ChA1 of the n-th hierarchy.
- Nine element data of the convolution area Ra of the channel ChA1 are read out from the memory section 12 and input to the convolution calculation section 21 of each calculation unit 17 .
- Nine weight data of one convolution filter are input to each convolution operation unit 21; the weight data of the convolution filters FA1B1, FA1B2, ..., corresponding to the combinations of the first channel ChA1 of the previous layer with the first to k-th channels ChB1, ChB2, ... of the next layer, are input to the respective convolution operation units 21.
- Each convolution operation unit 21 multiplies the input element data of the convolution area Ra of channel ChA1 by the corresponding weight data of the convolution filter, and the sum of the multiplication results, that is, the sum of products, is stored in the register 27 (FIG. 7(A)).
- each calculation unit 17 performs a calculation to apply a convolution filter to the convolution area Ra at the same position as the first channel ChA1 in the second channel ChA2 of the n-th layer.
- Nine element data of the convolution area Ra in channel ChA2 are input to the convolution operation unit 21 of each computation unit 17, together with the weight data of the convolution filters corresponding to the combinations of the second channel ChA2 with the first to k-th channels ChB1, ChB2, ....
- Each computation unit 17 corresponds to one channel of the next layer, and the corresponding channel of the next layer does not change until, for example, the calculation of all the element data of the k channels is completed. For this reason, for example, the computation unit 17 to which the weight data of the convolution filter FA1B2 corresponding to the second channel ChB2 were input during the computation targeting the first channel ChA1 of the previous layer receives, during the computation targeting the second channel ChA2, the weight data of the convolution filter FA2B2, which also corresponds to the second channel ChB2.
- As a result, the register 27 of each convolution operation unit 21 stores the value obtained by adding the sum of products obtained by applying the convolution filter to the convolution area Ra of channel ChA2 to the previously stored sum of products obtained by applying the convolution filter to the convolution area Ra of channel ChA1 (FIG. 7(B)).
- Subsequently, the convolution operation unit 21 of each computation unit 17 sequentially performs computations that apply the convolution filter to the convolution area Ra at the same position as for the first channel ChA1, for each of the third and subsequent channels of the n-th layer.
- When these computations are completed, the register 27 of each convolution operation unit 21 stores the sum of the products obtained by applying the convolution filter to the convolution area Ra of every channel of the previous layer, that is, the first element datum of each of the first to k-th channels of the next layer.
- the first element data (convolution operation result data) thus obtained is output from the convolution operation unit 21 .
- Next, the convolution operation unit 21 of each computation unit 17 shifts the convolution area Ra by one element datum in the row direction and applies the convolution filters FA1B1, FA1B2, ... to the convolution area Ra of the first channel ChA1 of the previous layer. Thereafter, in the same procedure, computations that apply the convolution filter to the convolution area Ra are performed sequentially for each of the second and subsequent channels of the previous layer, the second element datum is calculated for each of the first to k-th channels, and it is output from the convolution operation unit 21.
- Next, the convolution area Ra is shifted by one element datum in the column direction from the first position, and the third element datum is calculated and output by the convolution operation unit 21 by the same procedure as above.
- the convolution area Ra is shifted by one piece of element data in the row direction, and the fourth element data is calculated and output by the convolution operation unit 21 by the same procedure as above. In this way, the four element data of the pooling area to be pooled are calculated continuously.
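- Putting the channel-parallel procedure together, the following sketch computes, for one position of the convolution area, one element datum for each of the k next-layer channels by accumulating over every previous-layer channel (illustrative NumPy code; the layout of `filters[a][b]`, holding the 3x3 filter for the combination of previous-layer channel a and next-layer channel b, is an assumption):

```python
import numpy as np

def channel_parallel_step(prev_channels, filters, k, row, col):
    """One channel-parallel step: each of the k computation units accumulates, in its
    register 27, the sums of products over every previous-layer channel for the same
    convolution-area position, yielding one element datum per next-layer channel."""
    registers_27 = np.zeros(k)                       # one accumulator per computation unit
    for a, channel in enumerate(prev_channels):      # channels ChA1, ChA2, ... in turn
        region = channel[row:row + 3, col:col + 3]   # the same convolution area Ra for all units
        for b in range(k):                           # the k computation units work in parallel
            registers_27[b] += np.sum(region * filters[a][b])
    return registers_27                              # k convolution operation result data
```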
- the pooling processing unit 22 is in a state of waiting for input of element data during the period T1 when the convolution operation unit 21 is performing the convolution operation.
- the power gating control unit 14 turns off the PG switch 35 and cuts off the power supply to the registers 32 of the pooling processing units 22 .
- the PG switch 35 is turned on by the power gating control unit 14 at the timing when the first element data (convolution operation result data) is output from the convolution operation unit 21 .
- power is supplied to the register 32 of each pooling processing unit 22 and data can be written.
- In each pooling processing unit 22, the input element datum and the data held in the register 32 are compared by the comparator 33, and the multiplexer 34 is controlled based on the comparison result. Since the register 32 was reset to the initial value (value "0") when the convolution operation was started, the input element datum is selected by the multiplexer 34 and written to the register 32.
- the pooling processing unit 22 waits for input of the second element data calculated by the convolution operation unit 21 in the period T2.
- the power gating control unit 14 turns off the PG switch 35 and cuts off the power supply to the register 32 of each pooling processing unit 22 .
- When the second element datum is output from the convolution operation unit 21, the power gating control unit 14 turns on the PG switch 35 to supply power to the register 32.
- Since the register 32 is nonvolatile, when the power supply is restarted it outputs the data it held before the power was cut off. The second element datum output from the convolution operation unit 21 is therefore compared by the comparator 33 with the element datum held in the register 32, and the multiplexer 34 is controlled based on the comparison result.
- That is, the comparator 33 compares the first element datum and the second element datum, the multiplexer 34 selects whichever of them has the larger value, and the selected element datum is written to the register 32.
- While the pooling processing unit 22 waits in the period T3 for the third element datum being calculated by the convolution operation unit 21, the PG switch 35 is turned off and the power supply to the register 32 of each pooling processing unit 22 is cut off.
- When the third element datum is output, the power gating control unit 14 turns on the PG switch 35 to supply power to the register 32. The third element datum and the element datum held in the register 32 are then compared by the comparator 33, the multiplexer 34 selects whichever has the larger value, and the selected element datum is written to the register 32.
- Similarly, while the pooling processing unit 22 waits in the period T4 for the fourth element datum being calculated by the convolution operation unit 21, the PG switch 35 is turned off and the power supply to the register 32 of each pooling processing unit 22 is cut off.
- When the fourth element datum is output, the PG switch 35 is turned on and power is supplied to the register 32. The fourth element datum is then compared with the element datum held in the register 32, and whichever has the larger value is selected and written to the register 32.
- As a result, the register 32 holds the element datum having the largest value among the first to fourth element data in the pooling area of the (n+1)-th layer, and the element datum held in the register 32 is output from the pooling processing unit 22 as an element datum (pooling datum) of the (n+2)-th layer.
- After this output is completed, the PG switch 35 is turned off and the power supply to the register 32 is cut off.
- After outputting the fourth element datum, the convolution operation unit 21 further moves the convolution area Ra and calculates the element data for the next pooling area in the same procedure as above.
- Each time one of the first to fourth element data of the new pooling area is output, the pooling processing unit 22 compares it with the data held in the register 32, so that the element datum having the largest value in the new pooling area comes to be held in the register 32 and is then output as an element datum of the (n+2)-th layer. While the pooling processing unit 22 is waiting for the input of element data, the PG switch 35 is turned off and the power supply to the register 32 is cut off, as described above.
- In this way, while the pooling processing unit 22 is waiting for the input of element data during the pooling processing, the arithmetic processing device 10 performs power gating so as to cut off the power supply to the register 32. The leakage current of the register 32 is therefore suppressed while waiting for the input of element data, and the power consumption of the arithmetic processing device 10 is kept small.
- The ratio of the power consumption (operating power consumption) of the register 32 when it is configured with nonvolatile flip-flops using magnetic tunnel junction elements (hereinafter, the nonvolatile configuration) to the power consumption (operating power consumption plus standby power consumption) when it is configured with ordinary flip-flops (hereinafter, the normal configuration) can be made, for example, about 0.22.
- This figure assumes that the operating power consumption of the nonvolatile configuration is 10 times that of the normal configuration, that the ratio of standby power consumption to operating power consumption in the normal configuration is 30:110, and that the ratio of the number of operating cycles to the number of standby cycles of the register 32 in the arithmetic processing device 10 is 0.006.
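- Under these assumptions, and neglecting the standby power of the nonvolatile configuration because its power supply is gated off while waiting, the quoted figures combine as follows (a worked check of the 0.22 value, not a formula given in the source):

```latex
\frac{P_{\mathrm{nonvolatile}}}{P_{\mathrm{normal}}}
  = \frac{0.006 \times (10 \times 110)}{0.006 \times 110 + 30}
  = \frac{6.6}{30.66} \approx 0.22
```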
- FIG. 10 shows an example in which the pooling processing unit 22 is configured to perform average value pooling processing for outputting the average value of the element data in the pooling area.
- the pooling processing unit 22 is composed of a pooling arithmetic circuit 31 and a register 42.
- In this case, the pooling operation circuit 31 is composed of an adder 43 and a 2-bit shifter 44, and cooperates with the register 42 to calculate the average value of the element data of the pooling area.
- the adder 43 adds the data held in the register 42 and the input element data, which is the convolution operation result data.
- the register 42 is a nonvolatile storage circuit for pooling and holds the addition result of the adder 43 .
- The 2-bit shifter 44 is a bit shift circuit provided as a divider: by shifting the result of adding up to the final element datum of the pooling area to the right by 2 bits, it calculates the quotient obtained by dividing by the number of element data in the pooling area (four in this example), and the pooling processing unit 22 outputs this quotient as an element datum (pooling datum).
- The register 42 is nonvolatile like the register 32 (see FIG. 6), and is power gated by turning the PG switch 35 on and off. Therefore, during the pooling processing period, the PG switch 35 is turned off and the power supply is cut off while the pooling processing unit 22 is waiting for input of element data from the convolution operation unit 21, that is, while the pooling processing unit 22 is not performing processing. Thereby, the power consumption of the arithmetic processing device 10 is reduced.
- A multiplier 45 that multiplies by a predetermined weight may be provided in the stage preceding the adder 43, so that the element data input as convolution operation results are weighted according to their positions.
- the weight can be, for example, a two-dimensional Gaussian weight.
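- A behavioral sketch of this average value pooling, with the optional weighting by the multiplier 45, in Python (illustrative only; integer element data and integer weights are assumed so that the 2-bit right shift remains a valid divide-by-four):

```python
def average_pool_area(element_data, weights=None):
    """Average value pooling for one 2x2 pooling area (four element data).
    weights: optional per-position weights applied by the multiplier 45."""
    register_42 = 0                          # nonvolatile register 42 holding the running sum
    for i, datum in enumerate(element_data):
        if weights is not None:
            datum = datum * weights[i]       # multiplier 45: weight according to position
        register_42 += datum                 # adder 43 accumulates into register 42
    return register_42 >> 2                  # 2-bit shifter 44: divide the sum by four
```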
- FIG. 12 shows an example in which the pooling processing unit 22 is configured with a register unit 51, serving as the pooling storage circuit, in which buffers 51a are connected in multiple stages.
- the pooling processing unit 22 in this example includes a register unit 51, a comparator 52, and a multiplexer 53 as a selector. Element data (convolution operation result data) from the convolution operation unit 21 is input to the register unit 51 .
- the pooling area is 2 rows and 2 columns.
- The convolution operation unit 21 calculates element data row by row in order from the first row, and within each row it calculates the element data sequentially from one end of the row to the other.
- It is also assumed that the number of columns of the channels in the layer to be pooled is an even number, denoted Y below.
- Each buffer 51a is, for example, a parallel-in, parallel-out (PIPO) shift register stage: each bit of the data is input to it in parallel, the buffer holds the data input in synchronization with a clock, and it outputs the held data in parallel.
- The register unit 51 includes buffers 51a connected in (Y+2) stages, where Y is the number of columns of the channel.
- The buffers 51a are connected in multiple stages so that the output of the preceding stage is input to the buffer 51a of the following stage; that is, the element data from the convolution operation unit 21 are input to the first-stage buffer 51a via the activation function processing unit 23, and each of the second and subsequent buffers 51a receives the output of the preceding buffer 51a.
- a clock is input to each buffer 51a in synchronization with the input of element data to the buffer 51a of the first stage.
- Element data from the buffers 51a of the first stage, the second stage, the Y+1 stage, and the Y+2 stage are input to the comparator 52 and the multiplexer 53 that constitute the pooling operation circuit 31 as a data group.
- the comparator 52 compares the four input element data and outputs a selection signal for selecting and outputting the element data with the largest value.
- the multiplexer 53 selects and outputs one of the input four element data based on the selection signal.
- The pooling operation circuit 31 performs the comparison by the comparator 52 and the selection by the multiplexer 53 at the timing when the four element data output (held) by the buffers 51a of the first, second, (Y+1)-th and (Y+2)-th stages are the element data within one pooling area.
- Specifically, where m is an integer of 1 or more, in the interval after the ((2m-1)·Y)-th element datum has been input and until the (2m·Y)-th element datum is input, the comparison by the comparator 52 and the selection by the multiplexer 53 are performed each time two element data have been input.
- As a result, the multiplexer 53 outputs, as pooling data, the element data having the maximum value in each of the pooling areas of 2 rows and 2 columns into which the channel to be pooled is divided without overlap.
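- The behavior of the (Y+2)-stage buffer chain can be checked with the following simulation (an illustrative Python sketch; `channel_rows` supplies the channel's element data row by row and Y is the even number of columns):

```python
def buffer_chain_max_pool(channel_rows, Y):
    """Simulate the register unit 51: a chain of Y+2 buffers 51a fed row by row.
    Whenever the newest element datum lies in an even row and even column
    (1-indexed), stages 1, 2, Y+1 and Y+2 hold exactly one 2x2 pooling area."""
    buffers = [None] * (Y + 2)      # buffers 51a; index 0 is the first stage
    pooling_data = []
    count = 0
    for row in channel_rows:        # each row supplies Y element data in order
        for datum in row:
            buffers = [datum] + buffers[:-1]   # each stage takes over the preceding stage's output
            count += 1
            # newest datum at an even row and even column: one complete pooling area
            if (count - 1) // Y % 2 == 1 and count % 2 == 0:
                group = (buffers[0], buffers[1], buffers[Y], buffers[Y + 1])
                pooling_data.append(max(group))    # comparator 52 + multiplexer 53
    return pooling_data
```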
- the register unit 51 is configured as a nonvolatile memory circuit. That is, each buffer 51a is non-volatile. As in the other examples, it is preferable to configure the buffer 51a with a plurality of nonvolatile flip-flops (NV-FF) using magnetic tunnel junction elements.
- A drive voltage (VDD) is applied to the register unit 51 via the PG switch 35, and during the pooling processing period the PG switch 35 is turned off while the pooling processing unit 22 is waiting for input of element data from the convolution operation unit 21, thereby reducing power consumption.
- When outputting the pooling data, the PG switch 35 is kept on until the comparison of the element data by the comparator 52 and the selection by the multiplexer 53 have been performed and the output from the multiplexer 53 has been completed; while the pooling processing unit 22 does not need to operate, the PG switch 35 is turned off and the power supply to the register unit 51 is cut off.
Description
14 Power gating control unit
21 Convolution operation unit
22 Pooling processing unit
32, 42 Register
33 Comparator
34 Multiplexer
51 Register unit
51a Buffer
52 Comparator
53 Multiplexer
Claims (14)
- 1. An arithmetic processing device comprising: a convolution operation unit that sequentially outputs convolution operation result data; a pooling processing unit that has a pooling operation circuit and a nonvolatile pooling storage circuit, wherein the pooling storage circuit holds the convolution operation result data or the operation result of the pooling operation circuit as held data, and each time the convolution operation result data from the convolution operation unit is input, the pooling operation circuit uses the held data to calculate and output pooling data obtained by performing pooling processing on a pooling area; and a power gating section that cuts off power supply to the pooling storage circuit while waiting for input of the convolution operation result data from the convolution operation unit.
- 2. The arithmetic processing device according to claim 1, wherein the pooling operation circuit has a comparator that compares the convolution operation result data from the convolution operation unit with the held data, and a selector to which the convolution operation result data from the convolution operation unit and the held data are input and which, based on the comparison result of the comparator, selects and outputs whichever of the input data has the larger value; the pooling storage circuit holds the data output by the pooling operation circuit as the new held data; and the pooling processing unit outputs, as pooling data, the held data that the pooling storage circuit holds as a result of the input of each of the convolution operation result data of the pooling area to the pooling operation circuit.
- 3. The arithmetic processing device according to claim 1, wherein the pooling operation circuit has an adder that adds the convolution operation result data from the convolution operation unit and the held data, and a divider that divides the addition result of the adder by the number of the convolution operation result data in the pooling area; the pooling storage circuit holds the addition result of the adder as the new held data; and the pooling processing unit outputs, as pooling data, the data obtained by dividing, with the divider, the addition result of the adder obtained by the input of each of the convolution operation result data of the pooling area to the pooling operation circuit.
- 4. The arithmetic processing device according to claim 1, wherein the pooling operation circuit has a multiplier that weights the convolution operation result data from the convolution operation unit by multiplying it by a predetermined weight, an adder that adds the multiplication result from the multiplier and the held data, and a divider that divides the addition result of the adder by the number of the convolution operation result data in the pooling area; the pooling storage circuit holds the addition result of the adder as the new held data; and the pooling processing unit outputs, as pooling data, the data obtained by dividing, with the divider, the addition result of the adder obtained by the input of each of the convolution operation result data of the pooling area to the pooling operation circuit.
- 5. The arithmetic processing device according to claim 3 or 4, wherein the divider is a bit shift circuit that shifts data by a number of bits corresponding to the number of the convolution operation result data of the pooling area.
- 6. The arithmetic processing device according to any one of claims 1 to 5, wherein the convolution operation result data in the pooling area of p rows and q columns on a channel in which a plurality of the convolution operation result data are two-dimensionally arranged are input to the pooling processing unit.
- 7. The arithmetic processing device according to claim 6, wherein the pooling area is 2 rows and 2 columns.
- 8. The arithmetic processing device according to any one of claims 1 to 7, wherein the pooling storage circuit is configured by a nonvolatile register.
- 9. The arithmetic processing device according to claim 8, wherein the register is configured by nonvolatile flip-flops.
- 10. The arithmetic processing device according to claim 1, wherein the convolution operation unit sequentially outputs, for each row of a channel in which a plurality of the convolution operation result data are two-dimensionally arranged, the convolution operation result data in the row direction of the channel; the pooling processing unit outputs, as pooling data, the convolution operation result data having the maximum value in each pooling area obtained by dividing the plurality of convolution operation result data into blocks of 2 rows and 2 columns of the channel; the pooling storage circuit has nonvolatile buffers connected in Y+2 stages, where Y (an even number equal to or greater than 2) is the number of columns of the channel, and each time the convolution operation result data from the convolution operation unit is input to the first-stage buffer, the first-stage buffer holds and outputs the input convolution operation result data, and each of the second and subsequent buffers holds and outputs the convolution operation result data output from the preceding buffer; the pooling operation circuit receives a data group consisting of the convolution operation result data from the buffers of the first, second, (Y+1)-th and (Y+2)-th stages, and has a comparator that compares the convolution operation result data of the data group and a selector that, based on the comparison result of the comparator, selects and outputs the convolution operation result data having the maximum value in the data group; and the pooling processing unit outputs, as pooling data, the convolution operation result data output from the selector when the convolution operation result data of the data group form the combination of the convolution operation result data in one pooling area.
- 11. The arithmetic processing device according to claim 10, wherein the buffers are nonvolatile parallel-in, parallel-out shift registers.
- 12. The arithmetic processing device according to claim 11, wherein the shift registers are configured by nonvolatile flip-flops.
- 13. The arithmetic processing device according to claim 12, wherein the nonvolatile flip-flops are circuits including magnetic tunnel junction elements.
- 14. An arithmetic processing device comprising: a convolution operation unit that, for each row of a channel in which a plurality of convolution operation result data are two-dimensionally arranged, sequentially outputs the convolution operation result data in the row direction of the channel; and a pooling processing unit that has a pooling operation circuit and a nonvolatile pooling storage circuit and outputs, as pooling data, the convolution operation result data having the maximum value in each pooling area obtained by dividing the plurality of convolution operation result data into blocks of 2 rows and 2 columns of the channel, wherein the pooling storage circuit has buffers connected in Y+2 stages, where Y (an even number equal to or greater than 2) is the number of columns of the channel, and each time the convolution operation result data from the convolution operation unit is input to the first-stage buffer, the first-stage buffer holds and outputs the input convolution operation result data, and each of the second and subsequent buffers holds and outputs the convolution operation result data output from the preceding buffer; the pooling operation circuit receives a data group consisting of the convolution operation result data from the buffers of the first, second, (Y+1)-th and (Y+2)-th stages, and has a comparator that compares the convolution operation result data of the data group and a selector that, based on the comparison result of the comparator, selects and outputs the convolution operation result data having the maximum value in the data group; and the pooling processing unit outputs, as pooling data, the convolution operation result data output from the selector when the convolution operation result data of the data group form the combination of the convolution operation result data in one pooling area.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023530377A JPWO2022265044A1 (ja) | 2021-06-18 | 2022-06-15 | |
US18/286,638 US20240126616A1 (en) | 2021-06-18 | 2022-06-15 | Computation processing device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021-101996 | 2021-06-18 | ||
JP2021101996 | 2021-06-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022265044A1 true WO2022265044A1 (ja) | 2022-12-22 |
Family
ID=84527110
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/023990 WO2022265044A1 (ja) | 2021-06-18 | 2022-06-15 | 演算処理装置 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240126616A1 (ja) |
JP (1) | JPWO2022265044A1 (ja) |
WO (1) | WO2022265044A1 (ja) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018018569A (ja) * | 2016-07-14 | 2018-02-01 | Semiconductor Energy Laboratory Co., Ltd. | Semiconductor device, display system, and electronic apparatus |
WO2018189620A1 (ja) * | 2017-04-14 | 2018-10-18 | Semiconductor Energy Laboratory Co., Ltd. | Neural network circuit |
WO2019189895A1 (ja) * | 2018-03-30 | 2019-10-03 | Tohoku University | Neural network circuit device |
WO2020161845A1 (ja) * | 2019-02-06 | 2020-08-13 | Tohoku University | Clustering device and clustering method |
Also Published As
Publication number | Publication date |
---|---|
US20240126616A1 (en) | 2024-04-18 |
JPWO2022265044A1 (ja) | 2022-12-22 |
Similar Documents
Publication | Title |
---|---|
CN109034373B (zh) | Parallel processor and processing method for convolutional neural networks |
JP6945986B2 (ja) | Arithmetic circuit, control method therefor, and program |
KR20190138815A (ko) | Homomorphic processing unit (HPU) for accelerating secure computation under homomorphic encryption |
US20170236053A1 (en) | Configurable and Programmable Multi-Core Architecture with a Specialized Instruction Set for Embedded Application Based on Neural Networks |
US20200167405A1 (en) | Convolutional operation device with dimensional conversion |
US5285524A (en) | Neural network with daisy chain control |
KR20220088943A (ko) | Memristor-based neural network parallel acceleration method, processor, and device |
US11966833B2 (en) | Accelerating neural networks in hardware using interconnected crossbars |
CN209766043U (zh) | Compute-in-memory chip and memory cell array structure |
CN112151095A (zh) | Compute-in-memory chip and memory cell array structure |
US10884736B1 (en) | Method and apparatus for a low energy programmable vector processing unit for neural networks backend processing |
CN109144469B (zh) | Pipelined neural network matrix operation architecture and method |
JPH05233473A (ja) | Semiconductor memory test system |
JP7435602B2 (ja) | Arithmetic device and arithmetic system |
KR102555621B1 (ko) | In-memory computing circuit and method |
US6154809A (en) | Mathematical morphology processing method |
CN110673824B (zh) | Matrix-vector multiplication circuit and recurrent neural network hardware accelerator |
KR20210022455A (ko) | Deep neural network training apparatus and method |
CN114418080A (zh) | Compute-in-memory operation method, memristor neural network chip, and storage medium |
WO2022265044A1 (ja) | Arithmetic processing device |
US9417841B2 (en) | Reconfigurable sorter and method of sorting |
WO2020093669A1 (en) | Convolution block array for implementing neural network application and method using the same, and convolution block circuit |
CN115374395A (zh) | Hardware architecture in which an algorithm control unit schedules computation |
KR20210014897A (ko) | Matrix operator and matrix operation method for an artificial neural network |
KR20210050434A (ko) | Ultra-pipelined accelerator for machine learning inference |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22825032; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2023530377; Country of ref document: JP; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 18286638; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 22825032; Country of ref document: EP; Kind code of ref document: A1 |