CN108647782B - Method and system for reducing eDRAM refreshing energy consumption in neural network chip - Google Patents

Publication number: CN108647782B; other versions: CN108647782A (original language: Chinese (zh))
Application number: CN201810488395.2A
Authority: CN (China)
Prior art keywords: neural network, layer, network model, refresh, eDRAM
Inventors: 尹首一, 涂锋斌, 吴薇薇, 刘雷波, 魏少军
Original and current assignee: Tsinghua University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application filed by Tsinghua University; publication of application CN108647782A followed by grant and publication of CN108647782B.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C 11/00: Digital stores characterised by the use of particular electric or magnetic storage elements; storage elements therefor
    • G11C 11/21: Digital stores characterised by the use of particular electric or magnetic storage elements using electric elements
    • G11C 11/34: Digital stores characterised by the use of particular electric or magnetic storage elements using electric elements, using semiconductor devices
    • G11C 11/40: Digital stores characterised by the use of particular electric or magnetic storage elements using semiconductor devices, using transistors
    • G11C 11/401: Digital stores using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
    • G11C 11/406: Management or control of the refreshing or charge-regeneration cycles

Abstract

The invention provides a method and a system for reducing eDRAM refresh energy consumption in a neural network chip. The method comprises the following steps: training an original neural network model, and determining a target neural network model with the maximum fault tolerance together with the eDRAM data retention time corresponding to the target model; scheduling each layer of the target neural network model, and determining the computation mode and the data lifetime of each layer under the lowest computational energy consumption; and executing the target neural network model on the neural network chip according to the per-layer computation modes, and, for the storage partition of each layer, skipping the refresh if the partition stores no valid data or the layer's data lifetime is shorter than the data retention time. The invention removes unnecessary refresh operations to the greatest extent and greatly reduces the eDRAM refresh energy consumption in the neural network chip.

Description

Method and system for reducing eDRAM refreshing energy consumption in neural network chip
Technical Field
The invention belongs to the field of neural network chip acceleration, and in particular relates to a method and a system for reducing eDRAM refresh energy consumption in a neural network chip.
Background
With the advent of the artificial intelligence era, intelligent tasks such as image recognition, speech recognition and natural language processing are ubiquitous in daily life. Neural networks have achieved state-of-the-art results on such tasks and are widely used in industry; Baidu image search, Microsoft speech recognition and Google online translation, for example, are all built on neural networks. Because its computation is regular and highly parallel, the neural network is particularly suited to acceleration by a dedicated neural network chip. However, because the data volume is large and the on-chip memory capacity is limited, the computation incurs a large number of external memory accesses and therefore consumes a great deal of energy.
In novel neural network chips, eDRAM (Embedded Dynamic Random Access Memory) is used in place of traditional SRAM (Static Random Access Memory) to obtain a larger on-chip storage capacity and thereby reduce off-chip memory accesses. However, the charge in an eDRAM memory cell leaks over time, so to guarantee data correctness the prior art refreshes the memory cells periodically. This periodic refresh has the following drawbacks: 1) refresh is still performed even when it is unnecessary (when the data lifetime is shorter than the data retention time), so the refresh energy consumption is high and can offset the energy savings obtained by reducing external memory accesses; 2) the refresh period is fixed by the eDRAM process, for example the data retention time of eDRAM in a 65 nm process is 45 µs, which leads to a short refresh period and frequent refreshing.
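To put the periodic-refresh overhead in perspective, the sketch below computes how often a periodically refreshed eDRAM must be rewritten at the 45 µs retention time mentioned above; the retention figure is from the text, everything else is illustrative:

```python
# Rough estimate of the periodic-refresh rate implied by a 45 us retention
# time (65 nm process, from the text). Illustrative sketch only.
RETENTION_S = 45e-6  # eDRAM data retention time at 65 nm

def refreshes_per_second(retention_s: float) -> float:
    # Every cell must be rewritten at least once per retention window,
    # so the refresh rate is the reciprocal of the retention time.
    return 1.0 / retention_s

print(f"{refreshes_per_second(RETENTION_S):,.0f} refreshes per second")
```

At that rate every partition is rewritten more than twenty thousand times per second, which is why eliminating unnecessary refreshes matters.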
Disclosure of Invention
The invention addresses the drawbacks of prior-art eDRAM-based neural network chips, namely that their refresh energy consumption is high and weakens the energy savings obtained by reducing external memory accesses.
To solve the above technical problem, one technical solution of the present invention provides a method for reducing eDRAM refresh energy consumption in a neural network chip, comprising:
training an original neural network model, and determining a target neural network model with the maximum fault tolerance together with the eDRAM data retention time corresponding to the target neural network model;
scheduling each layer of the target neural network model, and determining the computation mode and the data lifetime of each layer under the lowest computational energy consumption;
and executing the target neural network model on the neural network chip according to the per-layer computation modes, and, for the storage partition of each layer, skipping the refresh of the partition if it stores no valid data or the layer's data lifetime is shorter than the eDRAM data retention time.
Another technical solution of the present invention provides a system for reducing eDRAM refresh energy consumption in a neural network chip, the system comprising a neural network chip and an eDRAM controller;
the neural network chip is configured to execute a target neural network model according to the computation mode of each layer under the model's lowest computational energy consumption, wherein the target neural network model has the maximum fault tolerance;
the eDRAM controller is connected with the neural network chip and is configured, while the target neural network model is executed, to skip the refresh of the storage partition of any layer that stores no valid data or whose data lifetime under the model's lowest computational energy consumption is shorter than the eDRAM data retention time corresponding to the target neural network model.
With the method and system provided by the invention for reducing eDRAM refresh energy consumption in a neural network chip, a longer data retention time is obtained by training the neural network model to enlarge its fault tolerance; a shorter data lifetime for each layer is obtained by scheduling each layer of the model and analyzing its computational energy consumption; and since every partition whose layer's data lifetime is shorter than the data retention time is never refreshed, unnecessary refresh operations are removed to the greatest extent, greatly reducing the eDRAM refresh energy consumption in the neural network chip.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for reducing eDRAM refresh power consumption in a neural network chip according to an embodiment of the present invention;
FIG. 2 is a flow chart of a process of training a primitive neural network model according to an embodiment of the present invention;
FIG. 3 is a flowchart of a process for scheduling a target neural network model according to an embodiment of the present invention;
FIG. 4 is a block diagram of an eDRAM controller according to an embodiment of the present invention;
FIG. 5 is a block diagram of a system for reducing eDRAM refresh power consumption in a neural network chip according to an embodiment of the present invention.
Detailed Description
To make the technical features and effects of the invention more apparent, the technical solution of the invention is further described below with reference to the accompanying drawings. The invention may also be described or implemented through other, different specific examples, and any equivalent changes made by those skilled in the art within the scope of the claims fall within the scope of the invention.
In this description, references to "an embodiment", "a particular embodiment", "some embodiments", "for example" and similar terms mean that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Such terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials or characteristics described may be combined in any suitable manner in one or more embodiments or examples. The order of the steps in the various embodiments is provided to illustrate the practice of the invention; it is not limiting and may be adjusted as needed.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for reducing eDRAM refresh power consumption in a neural network chip according to an embodiment of the present invention. The embodiment can remove unnecessary refreshing operation, and greatly reduces the refreshing energy consumption of the eDRAM in the neural network chip. Specifically, the method for reducing the eDRAM refresh energy consumption in the neural network chip comprises the following steps:
step 100, training an original neural network model, and determining a target neural network model with the maximum fault-tolerant capability and the data retention time of the eDRAM corresponding to the target neural network model. In detail, the step utilizes the fault tolerance capability of the neural network from the training level, and can obtain the data retention time as long as possible. The target neural network model has the maximum fault-tolerant capability under a given tolerance error, can tolerate error rollover of data on a few bits, and does not lose identification precision.
Step 200, scheduling each layer of the target neural network model, and determining the computation mode and the data lifetime of each layer under the lowest computational energy consumption. This step shortens, at the scheduling level, the lifetime of the data used in the neural network computation.
Step 300, executing the target neural network model on the neural network chip according to the per-layer computation modes, and, for the storage partition of each layer, skipping the refresh if the partition stores no valid data or the layer's data lifetime is shorter than the data retention time.
In detail, steps 100 and 200 are performed in the compilation stage (before the neural network model is loaded into the neural network chip): the original neural network model is the model that would be executed on the chip in the prior art, and the target neural network model is the model executed on the chip in the present invention. Step 300 is performed in the execution stage (after the target neural network model has been loaded into the neural network chip).
In this embodiment the refresh period can vary with the network and with the layer, and refresh operations can be omitted entirely where they are unnecessary, greatly reducing the eDRAM refresh energy consumption in the neural network chip.
In an embodiment of the present invention, as shown in fig. 2, the process of training the original neural network in the step 100 and determining the target neural network model with the maximum fault tolerance and the data retention time of the eDRAM corresponding to the target neural network model includes:
step 110, determining a corresponding relationship between a refresh period (i.e. data retention time) and an error probability according to a characteristic curve of the eDRAM. In detail, the error probability is bit-level, and the characteristic curve of the eDRAM represents a one-to-one correspondence between the refresh cycle of the eDRAM memory and the error probability.
Step 120, performing fixed-point pre-training on the original neural network model, and converting the original neural network model into a fixed-point neural network model which can be executed on a neural network chip.
Step 130, determining an error probability r from the correspondence obtained in step 110, and injecting, into each layer of the fixed-point neural network model, errors that occur with probability r. Specifically, an error in the present invention is a bit-level bit flip, and errors occurring with probability r are injected into each layer of the fixed-point neural network model by applying a mask.
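A minimal sketch of such masked bit-flip injection, flipping each bit independently with probability r; the function name and the 8-bit word width are assumptions for illustration:

```python
import random

def inject_bit_errors(values, r, bits=8, rng=random):
    # Flip each bit of each fixed-point value independently with
    # probability r by building a random XOR mask per value.
    out = []
    for v in values:
        mask = 0
        for b in range(bits):
            if rng.random() < r:
                mask |= 1 << b
        out.append((v ^ mask) & ((1 << bits) - 1))
    return out
```

With r = 0 the data passes through unchanged; with r = 1 every bit flips, which makes the two boundary cases easy to check.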
In some embodiments, to keep the whole training process orderly, the minimum error probability in the correspondence is taken as the initial error probability r.
Step 140, retraining the error-injected neural network model to adjust the weights, so that the retrained model is fault-tolerant to errors of probability r. In detail, the error probability r is proportional to the fault tolerance: a larger r indicates a larger fault tolerance.
Step 150, judging whether the training error is smaller than a given tolerated error; if so, increasing the error probability r and repeating the error injection of step 130 and the retraining of step 140 until the training error is no longer smaller than the given tolerated error, and taking the neural network model obtained in the last training as the target neural network model.
In detail, the error probability r injected in the last training is the largest, and since the error probability r is in direct proportion to the fault tolerance, the fault tolerance of the neural network model obtained in the last training is also the largest.
Step 160, determining the data retention time from the last injected error probability r and the correspondence. Specifically, the refresh period corresponding to the last injected r is looked up in the correspondence; that refresh period is the eDRAM data retention time corresponding to the target neural network model.
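The loop in steps 130-160 can be sketched as follows. `train_with_errors` and the schedule of candidate probabilities are placeholders, and returning the last model whose training error stayed within tolerance is one reasonable reading of step 150:

```python
def find_target_model(candidate_probs, tolerance, train_with_errors):
    # candidate_probs: ascending error probabilities from the eDRAM curve.
    # train_with_errors(r) -> (model, training_error): injects errors with
    # probability r and retrains (steps 130-140). Both are placeholders.
    best_model, best_r = None, None
    for r in candidate_probs:
        model, err = train_with_errors(r)
        if err < tolerance:                 # step 150: still tolerable,
            best_model, best_r = model, r   # so try a larger r next
        else:
            break
    return best_model, best_r
```

The returned r is the largest error probability the model tolerates, and looking it up in the period/probability correspondence yields the data retention time of step 160.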
In an embodiment of the present invention, as shown in fig. 3, the step 200 of scheduling each layer of the target neural network model, and the process of determining the computation mode of each layer and the data lifetime of each layer under the lowest computation energy consumption includes:
step 210, when a layer of the target neural network model is scheduled, extracting the structural information and hardware constraints of the current layer, and operating a design space scheduling framework, wherein the design space scheduling framework is used for traversing all the calculation modes of the neural network, analyzing the data survival time and the calculation energy consumption in each calculation mode according to the structural information and the hardware constraints, taking the calculation mode with the lowest calculation energy consumption as the calculation mode of the current layer, and determining the data survival time of the current layer according to the calculation mode of the current layer.
In detail, the calculation mode of the neural network includes: an input priority mode, an output priority mode, and a weight priority mode. Data lifetime is also typically minimized when computational energy consumption is minimized.
Step 220, determining whether the current layer is the last layer, if yes, executing step 230, and if no, executing step 240.
Step 230, the scheduling process is ended.
Step 240, switching to the next layer of the target neural network model, and returning to step 210 to continue.
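Steps 210-240 amount to a per-layer search over the three computation modes; the cost model below is a stand-in for the design-space scheduling framework, whose interface here is an assumption:

```python
MODES = ("input_priority", "output_priority", "weight_priority")

def schedule(layers, cost_model):
    # cost_model(layer, mode) -> (energy, data_lifetime), a placeholder
    # for the framework's analysis. For each layer, keep the mode with
    # the lowest energy (step 210), then move on (steps 220-240).
    plan = {}
    for layer in layers:
        best_mode = min(MODES, key=lambda m: cost_model(layer, m)[0])
        plan[layer] = (best_mode, cost_model(layer, best_mode)[1])
    return plan
```

As the text notes, the minimum-energy mode usually also yields the shortest data lifetime, which is what later allows refreshes to be skipped.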
Further, to enable the refresh control of step 300 while the neural network chip is running, step 200 further comprises, after the computation mode and the data lifetime of each layer have been obtained:
generating configuration information from the data retention time determined in step 100 and the data lifetime and computation mode of each layer determined in step 200, the configuration information comprising a refresh flag and a computation-mode flag for each layer.
Specifically, the computation-mode flag of each layer indicates that layer's computation mode; for example, 01 indicates the input-priority mode, 10 the output-priority mode, and 11 the weight-priority mode. The refresh flag of each layer indicates whether the storage partition of the corresponding layer is refreshed: 0 is the "invalid" flag, meaning the partition is not refreshed (the partition stores no valid data, or the layer's data lifetime is shorter than the data retention time), and 1 is the "valid" flag, meaning the partition is refreshed (the partition stores valid data and the layer's data lifetime is greater than or equal to the data retention time).
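The flags described above can be packed as in the following sketch; only the flag encodings come from the text, while the 3-bit layout (mode bits above the refresh bit) and the function name are assumptions:

```python
MODE_BITS = {
    "input_priority": 0b01,   # 01: input-priority computation mode
    "output_priority": 0b10,  # 10: output-priority computation mode
    "weight_priority": 0b11,  # 11: weight-priority computation mode
}

def encode_layer_config(mode, has_valid_data, lifetime, retention):
    # Refresh flag is 1 ("valid") only when the partition holds valid data
    # and the layer's data lifetime reaches the retention time; otherwise
    # it is 0 ("invalid") and the partition is never refreshed.
    refresh = 1 if (has_valid_data and lifetime >= retention) else 0
    return (MODE_BITS[mode] << 1) | refresh
```

A layer whose data expires before the retention time thus carries refresh flag 0, and its partition costs no refresh energy at all.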
In an embodiment of the present invention, the refresh control of step 300 is implemented by an eDRAM controller. As shown in fig. 4, the eDRAM controller comprises: a programmable clock divider 410, a plurality of refresh triggers 420 corresponding to the memory partitions, and a memory 430.
The programmable clock divider 410 takes the reference clock of the neural network chip as input and sets the refresh period to the data retention time; it also fetches the refresh flag of the corresponding layer from the memory and, according to that flag, controls whether the corresponding refresh trigger operates.
The refresh trigger 420 is used to refresh the memory partition according to the control of the programmable clock divider 410.
The memory 430 is used to store configuration information.
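A behavioral sketch of this controller: the programmable divider fires once per retention period, and each partition's trigger refreshes only when its stored refresh flag is set. All class and method names here are illustrative:

```python
class EDRAMControllerSketch:
    # refresh_flags plays the role of the memory (430); on_retention_tick
    # stands in for the programmable clock divider (410) firing once per
    # data-retention period; appending to the log stands in for a
    # refresh trigger (420) actually refreshing its partition.
    def __init__(self, retention_us, refresh_flags):
        self.retention_us = retention_us
        self.refresh_flags = list(refresh_flags)
        self.refreshed = []  # log of partitions actually refreshed

    def on_retention_tick(self):
        for partition, flag in enumerate(self.refresh_flags):
            if flag:  # skip partitions flagged "invalid" (0)
                self.refreshed.append(partition)
```

Partitions whose flag is 0 are simply never visited, which is how the design removes unnecessary refresh energy.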
FIG. 5 shows a system for reducing eDRAM refresh energy consumption in a neural network chip according to an embodiment of the present invention, comprising a neural network chip 510 and an eDRAM controller 520.
The neural network chip 510 is configured to execute the target neural network model in the computation mode of each layer under the model's lowest computational energy consumption, wherein the target neural network model has the maximum fault tolerance.
The eDRAM controller 520 is connected to the neural network chip 510 and is configured, while the target neural network model is executed, to skip, for each layer of the target neural network model, the refresh of the layer's storage partition if the partition stores no valid data or the layer's data lifetime under the model's lowest computational energy consumption is shorter than the eDRAM data retention time corresponding to the target neural network model.
Further, the eDRAM controller comprises: programmable clock divider 521, a plurality of refresh triggers 522 corresponding to the memory partitions, and memory 523.
The programmable clock divider 521 is connected to the reference-clock terminal of the neural network chip and sets the refresh period to the data retention time corresponding to the target neural network model; it also fetches the refresh flag of the corresponding layer from the memory and, according to that flag, controls whether the corresponding refresh trigger operates.
The refresh trigger 522 is connected to the control terminal of the neural network chip and is used for refreshing the memory partition according to the control of the programmable clock divider.
The memory 523 is configured to store a refresh flag of each layer of the target neural network model, where the refresh flag of each layer is used to indicate whether to refresh the memory partition of the corresponding layer.
With the method and system provided by the invention for reducing eDRAM refresh energy consumption in a neural network chip, a longer data retention time is obtained by training the neural network model to enlarge its fault tolerance; a shorter data lifetime for each layer is obtained by scheduling each layer of the model and analyzing its computational energy consumption; and since every partition whose layer's data lifetime is shorter than the data retention time is never refreshed, unnecessary refresh operations are removed to the greatest extent, greatly reducing the eDRAM refresh energy consumption in the neural network chip.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description only illustrates the present invention; those skilled in the art can modify and vary the above embodiments without departing from the spirit and scope of the invention, and the protection scope of the invention is therefore defined by the appended claims.

Claims (8)

1. A method for reducing eDRAM refresh energy consumption in a neural network chip, the method comprising:
training an original neural network model, and determining a target neural network model with the maximum fault tolerance together with the eDRAM data retention time corresponding to the target neural network model;
scheduling each layer of the target neural network model, and determining the computation mode and the data lifetime of each layer under the lowest computational energy consumption;
executing the target neural network model on the neural network chip according to the per-layer computation modes, and, for the storage partition of each layer, skipping the refresh of the partition if it stores no valid data or the layer's data lifetime is shorter than the data retention time;
wherein training the original neural network and determining the target neural network model and the data retention time comprises:
determining a correspondence between refresh period and error probability from the characteristic curve of the eDRAM;
performing fixed-point pre-training on the original neural network model to convert it into a fixed-point neural network model;
determining an error probability r from the correspondence, and injecting, into each layer of the fixed-point neural network model, errors that occur with probability r, the error probability r being proportional to the fault tolerance;
retraining the error-injected neural network model;
judging whether the training error is smaller than a given tolerated error; if so, increasing the error probability r and repeating the error injection and retraining until the training error is no longer smaller than the given tolerated error, and taking the neural network model obtained in the last training as the target neural network model;
and determining the data retention time from the last injected error probability r and the correspondence.
2. The method of claim 1, wherein determining the error probability r from the correspondence comprises: taking the minimum error probability in the correspondence as the error probability r.
3. The method of claim 1, wherein scheduling each layer of the target neural network model and determining the computation mode and the data lifetime of each layer under the lowest computational energy consumption comprises:
when each layer of the target neural network model is scheduled, traversing all computation modes of the neural network, analyzing the data lifetime and the computational energy consumption of each mode, taking the mode with the lowest computational energy consumption as the computation mode of that layer, and determining the data lifetime of that layer from its computation mode.
4. The method of claim 3, wherein the computation modes comprise: an input-priority mode, an output-priority mode, and a weight-priority mode.
5. The method of claim 1, wherein the method further comprises:
generating configuration information from the data retention time, the data lifetime of each layer and the computation mode of each layer, the configuration information comprising a refresh flag and a computation-mode flag for each layer;
wherein the computation-mode flag of each layer indicates that layer's computation mode, and the refresh flag of each layer indicates whether the storage partition of the corresponding layer is refreshed.
6. The method of claim 5, wherein the refresh control is implemented by an eDRAM controller comprising: a programmable clock divider, a plurality of refresh triggers corresponding to the memory partitions, and a memory;
the programmable clock divider takes the reference clock of the neural network chip as input and sets the refresh period to the data retention time; it also fetches the refresh flag of the corresponding layer from the memory and, according to that flag, controls whether the corresponding refresh trigger operates;
the refreshing trigger is used for refreshing the memory partitions according to the control of the programmable clock frequency divider;
the memory is used for storing configuration information.
7. A system for reducing eDRAM refresh energy consumption in a neural network chip, comprising: a neural network chip and an eDRAM controller;
the neural network chip is used for executing the target neural network model according to a calculation mode of each layer under the lowest calculation energy consumption of the target neural network model, wherein the target neural network model has the maximum fault-tolerant capability;
the eDRAM controller is connected with the neural network chip and is used for storing valid data in a storage partition for each layer of the target neural network model when the target neural network model is executed; a storage partition is not refreshed if it stores no valid data, or if the data survival time of the layer under the lowest calculation energy consumption of the target neural network model is shorter than the data retention time of the eDRAM corresponding to the target neural network model;
wherein the determining process of the target neural network model and the data retention time comprises:
determining a corresponding relation between the refresh period and the error probability according to the characteristic curve of the eDRAM;
performing fixed-point pre-training on an original neural network model, and converting the original neural network model into a fixed-point neural network model;
determining an error probability r according to the corresponding relation, and injecting errors with probability r into each layer of the fixed-point neural network model, wherein the error probability r is proportional to the fault-tolerant capability;
retraining the neural network model after the error is injected;
judging whether the training error is smaller than a given tolerance error; if so, increasing the error probability r and repeating the error injection and retraining until the training error is greater than or equal to the given tolerance error, and taking the neural network model obtained by the last training as the target neural network model;
and determining the data retention time according to the error probability r of the last injection and the corresponding relation.
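The retention-time search in claim 7 can be sketched as a loop: raise the injected error probability r while retraining still meets the tolerance, then map the last accepted r back to a retention time through the eDRAM characteristic curve. Both `train_with_errors` and `error_prob_to_retention` below are stand-in functions; the real curve and training are not specified here.

```python
# Illustrative sketch of the error-injection search in claim 7.
# The characteristic curve and training-error model are stand-ins.

def error_prob_to_retention(r):
    """Stand-in characteristic curve: longer retention -> higher error probability."""
    return 1.0 + 10.0 * r          # seconds, illustrative only

def train_with_errors(r):
    """Stand-in retraining: training error grows with injected error probability."""
    return 0.01 + r                # resulting training error, illustrative

def find_target_model(tolerance=0.05, r0=0.01, step=0.01):
    """Increase r while the retrained model stays under the tolerance error."""
    r = r0
    last_ok = None
    while train_with_errors(r) < tolerance:
        last_ok = r                # model retrained at this r is still acceptable
        r += step                  # inject a larger error probability and retry
    # the model from the last accepted training is the target model;
    # its r maps to the data retention time via the characteristic curve
    return last_ok, error_prob_to_retention(last_ok)

r_final, retention = find_target_model()
```

The larger the error probability the model tolerates, the longer the eDRAM refresh period can be stretched, which is why maximizing fault tolerance directly reduces refresh energy.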
8. The system of claim 7, wherein the eDRAM controller comprises: a programmable clock frequency divider, a plurality of refresh triggers corresponding to the memory partitions, and a memory;
the programmable clock frequency divider is connected with a reference clock end of the neural network chip and is used for setting the refresh period to the data retention time; it is also used for acquiring the refresh flag of the corresponding layer from the memory and controlling the corresponding refresh trigger according to that flag;
each refresh trigger is connected with a control end of the neural network chip and is used for refreshing its memory partition under the control of the programmable clock frequency divider;
the memory is used for storing the refresh flag of each layer of the target neural network model, the refresh flag of each layer indicating whether to refresh the storage partition of the corresponding layer.
CN201810488395.2A 2018-05-21 2018-05-21 Method and system for reducing eDRAM refreshing energy consumption in neural network chip Active CN108647782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810488395.2A CN108647782B (en) 2018-05-21 2018-05-21 Method and system for reducing eDRAM refreshing energy consumption in neural network chip


Publications (2)

Publication Number Publication Date
CN108647782A CN108647782A (en) 2018-10-12
CN108647782B true CN108647782B (en) 2021-10-19

Family

ID=63757171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810488395.2A Active CN108647782B (en) 2018-05-21 2018-05-21 Method and system for reducing eDRAM refreshing energy consumption in neural network chip

Country Status (1)

Country Link
CN (1) CN108647782B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415003B (en) * 2020-02-20 2023-09-22 清华大学 Three-dimensional stacked storage optimization method and device for neural network acceleration chip
NL2030081B1 (en) * 2021-12-08 2023-06-22 Microsoft Technology Licensing Llc Writeback control for read-destructive computer memory

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm
CN108053848A (en) * 2018-01-02 2018-05-18 清华大学 Circuit structure and neural network chip

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
EP3204896A1 (en) * 2014-10-07 2017-08-16 Google, Inc. Training neural networks on partitioned training data
US10831444B2 (en) * 2016-04-04 2020-11-10 Technion Research & Development Foundation Limited Quantized neural network training and inference



Similar Documents

Publication Publication Date Title
CN110546611B (en) Reducing power consumption in a neural network processor by skipping processing operations
CN107957989B9 (en) Cluster-based word vector processing method, device and equipment
US20200249998A1 (en) Scheduling computation graph heterogeneous computer system
JP7078758B2 (en) Improving machine learning models to improve locality
CN108647782B (en) Method and system for reducing eDRAM refreshing energy consumption in neural network chip
CN112711478A (en) Task processing method, device, server and storage medium based on neural network
CN114936085A (en) ETL scheduling method and device based on deep learning algorithm
CN112016296B (en) Sentence vector generation method, sentence vector generation device, sentence vector generation equipment and sentence vector storage medium
CN110795238A (en) Load calculation method and device, storage medium and electronic equipment
WO2023082575A1 (en) Graph execution pipeline parallelism method and apparatus for neural network model computation
US20230306236A1 (en) Device and method for executing lstm neural network operation
CN111860867A (en) Model training method and system for hybrid heterogeneous system and related device
CN117093509B (en) On-chip memory address allocation method and system based on greedy algorithm
CN117215789A (en) Resource allocation method and device for data processing task and computer equipment
CN113485848B (en) Deep neural network deployment method and device, computer equipment and storage medium
CN117011118A (en) Model parameter updating method, device, computer equipment and storage medium
CN110990151A (en) Service processing method based on heterogeneous computing platform
CN114723024A (en) Linear programming-based neural network mapping method for storage and calculation integrated chip
CN113641872B (en) Hashing method, hashing device, hashing equipment and hashing medium
CN115829000A (en) Data processing method and device, electronic equipment and storage medium
CN102063308B (en) Method for controlling processing flow of seismic prospecting data
CN115113814A (en) Neural network model online method and related device
CN114490856A (en) Database WAL (Web independent language) disk-dropping method and system based on IOURING technology
CN114298329A (en) Model training method, device, equipment and storage medium
CN105335226A (en) Iterative static task list scheduling algorithm for multi-processor system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant