CN113449855A - Arithmetic device and method - Google Patents


Info

Publication number
CN113449855A
Authority
CN
China
Prior art keywords: data, adder, input, stage, arithmetic
Legal status: Pending (an assumption, not a legal conclusion)
Application number: CN202110597369.5A
Other languages: Chinese (zh)
Inventor: Not disclosed
Current Assignee: Shanghai Cambricon Information Technology Co Ltd
Original Assignee: Shanghai Cambricon Information Technology Co Ltd
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority application: CN202110597369.5A
Publication: CN113449855A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means

Abstract

The present disclosure provides an arithmetic device comprising: an operation module including one or more operation units; and a control module including an operation control unit for turning off operation units according to a judgment condition. The present disclosure also provides an operation method. The low-power arithmetic device and method are highly flexible and can be combined with software-level techniques, further increasing operation speed, reducing the amount of computation, and lowering the accelerator's operating power consumption.

Description

Arithmetic device and method
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an arithmetic device and method.
Background
Deep neural networks are the foundation of many current artificial intelligence applications and have achieved breakthrough results in speech recognition, image processing, data analysis, advertisement recommendation systems, autonomous driving, and other fields, penetrating many aspects of daily life. However, the enormous amount of computation required by deep neural networks has always constrained their faster development and wider application. When an accelerator is designed to speed up deep neural network operations, this huge amount of computation inevitably brings a large energy overhead, which likewise limits the accelerator's further widespread application.
In terms of hardware, existing accelerator architectures mainly identify the most time-consuming part of the computation and then apply targeted acceleration. Taking a convolutional neural network as an example, as shown in fig. 1, the conventional acceleration structure for an inner-product operation is usually a "multiply-add" structure: a set of multipliers produces a set of products in one clock cycle, and the products are then accumulated in parallel to obtain the final result. However, this structure is inflexible and cannot further increase operation speed or reduce the amount of computation.
Disclosure of Invention
(I) Technical problem to be solved
To solve or at least partially alleviate the above technical problems, the present disclosure provides a low-power arithmetic device and method. The low-power arithmetic device and method are highly flexible and can be combined with software-level techniques, further increasing operation speed, reducing the amount of computation, and lowering the accelerator's operating power consumption.
(II) Technical scheme
According to an aspect of the present disclosure, there is provided an arithmetic device including:
an operation module comprising one or more operation units; and
a control module comprising an operation control unit for turning off operation units of the operation module according to a judgment condition.
In some embodiments, each of the arithmetic units includes one or more arithmetic elements that are adders, multipliers, selectors, or temporary buffers.
In some embodiments, the arithmetic module includes n multipliers at a first stage and an addition tree of n inputs at a second stage, where n is a positive integer.
In some embodiments, the determination condition includes a threshold determination condition or a function mapping determination condition.
In some embodiments, the determination condition is a threshold determination condition, including: less than a given threshold, greater than a given threshold, within a given range of values or outside a given range of values.
In some embodiments, the judgment condition is a function mapping judgment condition, that is, whether the data satisfies a given condition after a function transformation is applied.
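The two families of judgment conditions can be sketched as simple predicates. This is an illustrative model only; the function and parameter names are assumptions, not an API defined by the disclosure.

```python
# Illustrative sketch (names are assumptions) of the threshold and
# function-mapping judgment conditions described in the disclosure.

def threshold_condition(x, lo=None, hi=None, inside=True):
    """True if x lies within [lo, hi]; with inside=False, true if outside."""
    lo = float("-inf") if lo is None else lo
    hi = float("inf") if hi is None else hi
    in_range = lo <= x <= hi
    return in_range if inside else not in_range

def function_mapping_condition(x, f, predicate):
    """Apply a transform f first, then test whether the mapped value
    satisfies the given condition."""
    return predicate(f(x))

def should_turn_off(x, threshold):
    """The specific case used most often in the disclosure: absolute
    value smaller than a given threshold gates the unit off."""
    return abs(x) < threshold
```

For example, `should_turn_off(0.001, 0.01)` is true, so the corresponding multiplier would be closed for that input.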
In some embodiments, the n multipliers of the first stage are respectively connected to an operation control unit, and the operation control unit controls the multipliers to be turned off according to the judgment condition.
In some embodiments, the operation control unit determines the data to be operated according to the determination condition, and turns off a multiplier when it determines that the absolute value of input data input to the multiplier is smaller than a given threshold.
In some embodiments, the addition tree includes k levels of adders; the 1st level has n/2 adders and the last (k-th) level has one adder, where 2^k = n. The n/2 adders at the 1st level of the addition tree are connected to the n multipliers and receive the data signals and control signals sent by the multipliers; the adders at the 2nd through k-th levels are each connected to an adder at the preceding level and receive the data signal and control signal it sends.
In some embodiments, if a multiplier receives a close signal from the operation control unit, it inputs control signal 0 to its adder at the 1st level of the addition tree; otherwise it sends the product to that adder and inputs control signal 1. If an adder receives two control signals that are both 1, it accumulates the two input values, sends the sum to the next-level adder, and passes control signal 1 downward. If one received control signal is 1 and the other is 0, the input data on the side whose control signal is 1 is passed directly to the next level along with control signal 1. If both received control signals are 0, the adder is turned off and control signal 0 is input to the next-level adder; and so on, until the addition tree has accumulated the final result.
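The control-signal propagation through the addition tree can be modeled behaviorally as a reduction over (data, control) pairs. This is a hedged software sketch of the hardware behavior, not the circuit itself; all names are illustrative.

```python
# Behavioral model of the k-level gated addition tree: each wire carries
# a (data, ctrl) pair, where ctrl == 0 means the multiplier feeding that
# wire was gated off and the data is invalid.

def gated_adder(a, b):
    """One adder node: (data, ctrl) x (data, ctrl) -> (data, ctrl)."""
    (da, ca), (db, cb) = a, b
    if ca and cb:
        return (da + db, 1)  # both valid: accumulate, pass ctrl 1 down
    if ca:
        return (da, 1)       # one valid: forward it with ctrl 1
    if cb:
        return (db, 1)
    return (0, 0)            # both gated: adder off, ctrl 0 downstream

def addition_tree(pairs):
    """Reduce n (data, ctrl) pairs level by level; n a power of 2."""
    while len(pairs) > 1:
        pairs = [gated_adder(pairs[i], pairs[i + 1])
                 for i in range(0, len(pairs), 2)]
    return pairs[0]
```

For instance, `addition_tree([(3, 1), (0, 0), (5, 1), (2, 1)])` yields `(10, 1)`: the gated input contributes nothing and its adder simply forwards the valid operand.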
In some embodiments, the n multipliers of the first stage and the n-input addition tree of the second stage are respectively connected to an operation control unit, and the operation control unit judges the data to be operated according to the judgment condition; when it is judged that the absolute value of input data to a multiplier or adder is smaller than a given threshold, the multiplier or adder is turned off.
In some embodiments, the arithmetic device further comprises a data processing module for expanding or compressing the data; correspondingly, the control module includes a data control unit for controlling the data processing module to expand or compress the data.
In some embodiments, the data processing module expands and compresses the data as follows: if the synapse values are in sparse mode, i.e., the sparse network is expressed by sparse coding, the neuron data are compressed according to the sparsity of the synapse values, screening out the neuron data that need no operation; or, if the neuron data are in sparse mode, the synapses are correspondingly compressed according to the sparsity of the neuron data, screening out the synapse data that need no operation; or, given a compression judgment condition, the synapse and/or neuron data are compressed according to that condition.
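The compression step can be sketched as index-based filtering. Assumption: sparse synapses are stored as (index, value) pairs against a dense neuron array; the disclosure does not fix a storage layout, so the names and representation here are illustrative.

```python
# Hedged sketch of the data processing module's compression: only neuron
# entries aligned with a stored (nonzero) synapse reach the operation units.

def compress_neurons_by_synapse_index(neurons, sparse_synapses):
    """neurons: dense list; sparse_synapses: list of (index, value).
    Returns the matched neuron and synapse values, dropping positions
    that would be multiplied by an absent synapse."""
    kept_neurons, kept_synapses = [], []
    for idx, syn_val in sparse_synapses:
        kept_neurons.append(neurons[idx])
        kept_synapses.append(syn_val)
    return kept_neurons, kept_synapses

def compress_by_condition(values, keep):
    """Generic compression by a judgment condition keep(value)."""
    return [v for v in values if keep(v)]
```

The same `compress_by_condition` helper covers the third variant, where a compression judgment condition (for example, absolute value above a threshold) decides what survives.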
In some embodiments, the compression determination condition includes a threshold determination condition or a function mapping determination condition.
In some embodiments, the threshold determination condition includes: less than a given threshold, greater than a given threshold, within a given range of values or outside a given range of values.
In some embodiments, the n multipliers of the first stage are respectively connected to the data processing module, and receive the neuron data and the synapse data output by the data processing module.
In some embodiments, the operation module comprises m operation units, each operation unit comprises a multiplier, an adder and a temporary buffer;
the control module comprises m operation control units, each operation control unit is respectively connected with the multiplier and the adder of one operation unit, the closing of the multiplier and the adder is controlled according to the judgment condition, and m is a positive integer.
In some embodiments, the multiplier of each operation unit has three inputs and an output, wherein the two inputs are respectively used for receiving neuron data and synapse data, the other input is used for inputting a control signal, and the output is used for outputting a multiplication result;
the adder has three input ends and one output end, wherein the two input ends are respectively used for receiving the multiplication result and the data input by the temporary buffer, the other input end is used for inputting a control signal, the output end is used for outputting the addition result, and the addition result is stored back to the temporary buffer to be used as the input data of the next layer of addition operation.
In some embodiments, synapse data is broadcast to each operation unit. If the neuron data input to an operation unit is smaller than the threshold, the unit's multiplier and adder are turned off by the control signal and the partial sum stored in the temporary buffer is left unchanged; otherwise, the unit multiplies the two input data in the multiplier, accumulates the product with the data in the temporary buffer, and stores the sum back to the temporary buffer.
In some embodiments, the arithmetic device further comprises a data processing module and a storage module, wherein one of the input terminals of the multiplier is connected with the data processing module and is used for receiving the compressed synapse data; one of the input terminals is connected with the storage module and used for receiving the neuron data.
In some embodiments, the operation module comprises p operation units, each operation unit comprising a multiplier, an adder and a selector;
the control module comprises p operation control units, each operation control unit is respectively connected with the multiplier and the adder of one operation unit, the closing of the multiplier and the adder is controlled according to the judgment condition, and p is a positive integer.
In some embodiments, the multiplier of each operation unit has three inputs and an output, wherein the two inputs are respectively used for receiving neuron data and synapse data, the other input is used for inputting a control signal, and the output is used for outputting a multiplication result;
the adder of the 1st operation unit has three input terminals and one output terminal: two input terminals receive the multiplication result and the data input by the selector of the current-stage operation unit, the third input terminal receives a control signal, and the output terminal outputs the addition result, which the selector sends to the next operation unit as input data for the next-stage addition;
the adders of the 2nd through p-th operation units likewise have three input terminals and one output terminal: two input terminals receive the multiplication result and the data input by the selector of the previous-stage operation unit, the third input terminal receives a control signal, and the output terminal outputs the addition result, which the selector sends to the next operation unit as input data for the next-stage addition.
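The chain of p units with selectors can be sketched as a running sum where a gated unit's selector simply forwards the previous stage's value. This is an illustrative model under assumed names; the disclosure describes the dataflow, not this code.

```python
# Sketch of the embodiment-3 selector chain: each unit either adds its
# product to the value arriving from the previous stage, or, when gated
# off by the threshold, lets its selector pass that value through
# unchanged so the chain still yields a valid running sum.

def selector_chain(pairs, threshold):
    """pairs: list of (neuron, synapse) tuples, one per unit, in chain
    order. Returns the final accumulated output of the last selector."""
    carried = 0  # value travelling through the selectors
    for neuron, synapse in pairs:
        if abs(neuron) < threshold:
            continue  # unit closed: selector forwards `carried` as-is
        carried += neuron * synapse
    return carried
```

With `selector_chain([(2, 3), (0.001, 9), (1, 4)], 0.01)`, the middle unit is gated off and the chain produces 6 + 4 = 10.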
In some embodiments, the computing device further comprises:
a storage module connected with the control module, which controls the storage module to store or read the required data; the storage module is also connected with the operation module, supplying the data to be operated to the operation module and receiving and storing the data produced by the operation module.
In some embodiments, the control module includes a storage control unit for controlling the storage module to store or read the required data.
According to another aspect of the present disclosure, there is provided an arithmetic method including:
setting a judgment condition;
and controlling the closing of the operation unit of the operation module according to the judgment condition.
In some embodiments, the determination condition includes a threshold determination condition or a function mapping determination condition.
In some embodiments, the determination condition is a threshold determination condition, including: less than a given threshold, greater than a given threshold, within a given range of values or outside a given range of values.
In some embodiments, the judgment condition is a function mapping judgment condition, that is, whether the data satisfies a given condition after a function transformation is applied.
In some embodiments, according to the determination condition, if the absolute value of the input data to a multiplier is smaller than a given threshold, the multiplier is turned off.
In some embodiments, a multiplier that receives the close signal inputs control signal 0 to the adder at the 1st level of the addition tree; otherwise it sends the product to that adder and inputs control signal 1.
The 1st-level adders receive the control signals sent by the multipliers, and the adders at the 2nd through k-th levels each receive the control signal sent by the preceding-level adder, until the addition tree has accumulated the final result. Specifically,
if an adder receives two control signals that are both 1, it accumulates the input values, sends the sum to the next-level adder, and passes control signal 1 downward; if one received control signal is 1 and the other is 0, the input data on the side whose control signal is 1 is passed directly to the next level along with control signal 1; and if an adder receives two control signals that are both 0, it is turned off and control signal 0 is input to the next-level adder.
In some embodiments, the judgment condition is a threshold judgment condition: the operation control unit sets a threshold, compares the absolute value of the data input to a multiplier or adder with that threshold, and turns the multiplier or adder off if the absolute value is smaller than the threshold.
In some embodiments, if the multiplier of an operation unit is not turned off, it multiplies the input neuron data and synapse data and outputs the multiplication result; otherwise the unit's adder is turned off as well and the partial sum stored in the temporary buffer is left unchanged.
If the adder of the operation unit is not turned off, it receives the multiplication result and the data input from the temporary buffer, performs the addition, and outputs the result, which is stored back to the temporary buffer as input data for the next layer of addition until the operation finishes; otherwise the partial sum stored in the temporary buffer is left unchanged.
In some embodiments, if the multiplier of an operation unit is not turned off, it multiplies the input neuron data and synapse data and outputs the multiplication result; otherwise the unit's adder is turned off as well, and the selector passes the previous stage's data directly to the next stage's selector.
If the adder of the operation unit is not turned off, it receives the multiplication result together with the current stage's input data or the data output by the previous stage's selector, performs the addition, and outputs the result, which is sent to the next operation unit through the selector.
Otherwise, with the adder and multiplier turned off, the selector chooses either the current stage's input data or the data output by the previous stage's selector as the operation unit's result output.
In some embodiments, further comprising: and expanding or compressing the data to be operated.
In some embodiments, the data processing module expands and compresses the data: if the synapse values are in sparse mode, the neuron data are compressed according to the sparsity of the synapse values, screening out the neuron data that need no operation; or, if the neuron data are in sparse mode, the synapses are correspondingly compressed according to the sparsity of the neuron data, screening out the synapse data that need no operation; or, given a compression judgment condition, the synapse and/or neuron data are compressed according to that condition.
(III) Advantageous effects
According to the technical scheme, the low-power-consumption operation device and the method have at least one of the following beneficial effects:
(1) Data are distinguished, and the corresponding components of the arithmetic device are configured (or not) according to the data's characteristics; for example, whether a data processing part is configured depends on whether the data are stored in a sparse representation. Likewise, the number of operation groups, and the number of multipliers and adders in each group, can be chosen as required, giving high flexibility.
(2) When the data to be operated satisfy the given judgment condition, the corresponding multipliers and adders are turned off, so the accelerator's power consumption is reduced without affecting its operation speed.
(3) By adjusting the judgment condition, the proportion of adders and multipliers that are turned off can be controlled, achieving the goal of reducing energy consumption.
Drawings
FIG. 1 is a diagram of functional modules of a computing device according to the prior art.
FIG. 2 is a functional block diagram of a low power computing device according to the present disclosure.
FIG. 3 is another functional block diagram of the low power consumption computing device according to the present disclosure.
Fig. 4 is a schematic structural diagram of a low power consumption computing device according to embodiment 1 of the present disclosure.
Fig. 5 is a schematic structural diagram of a low power consumption computing device according to embodiment 2 of the present disclosure.
Fig. 6 is a schematic structural diagram of a low power consumption computing device according to embodiment 3 of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It should be noted that in the drawings or description, the same drawing reference numerals are used for similar or identical parts. Implementations not depicted or described in the drawings are of a form known to those of ordinary skill in the art. Additionally, while exemplifications of parameters including particular values may be provided herein, it is to be understood that the parameters need not be exactly equal to the respective values, but may be approximated to the respective values within acceptable error margins or design constraints. In addition, directional terms such as "upper", "lower", "front", "rear", "left", "right", and the like, referred to in the following embodiments, are directions only referring to the drawings. Accordingly, the directional terminology used is intended to be in the nature of words of description rather than of limitation.
The present disclosure provides a low power consumption neural network operation device. FIG. 2 is a functional block diagram of a low power computing device according to the present disclosure. As shown in fig. 2, the low power consumption neural network operation device mainly includes: the device comprises a control module, a storage module and an operation module; the control module is connected with the storage module and used for controlling the storage module to store or read required data; the storage module is connected with the operation module and used for inputting data to be operated into the operation module and receiving and storing the data operated by the operation module; the control module is simultaneously connected with the operation module and used for controlling the working mode of the operation module according to the operation type and the operation data to be operated.
Specifically, the control module includes a storage control unit and an operation control unit. The storage control unit controls the storage module to store or read the required data. The operation control unit controls whether and how each component in the operation units works, according to the type of operation to be performed and the operation data; it may be split so that each part controls one operation group, or act as a whole that controls all operation groups jointly or each of them independently.
Further, as shown in fig. 3, the low power consumption neural network operation device may also include a data processing module for expanding and compressing data; correspondingly, the control module includes a data control unit for deciding whether to expand or compress the data. The data processing module and the data control unit are either both present or both absent.
The data processing module expands and compresses data. Specifically, when the synapse values are in sparse mode, i.e., in a sparse representation, the neuron data are compressed according to the sparsity of the synapse values, screening out the neuron data that need no operation; or, when the neuron data are in sparse mode, the synapses are correspondingly compressed according to the sparsity of the neuron data, screening out the synapse data that need no operation; or, given a compression judgment condition, the synapse and/or neuron data are compressed according to that condition, screening out the data that satisfy it.
The compression judgment condition includes a threshold judgment condition or a function mapping judgment condition. Wherein the threshold judgment condition includes: less than a given threshold, greater than a given threshold, within a given range of values or outside a given range of values.
The storage module comprises a data storage unit and a temporary cache unit, and one or more data storage units can be arranged according to requirements, namely, the data to be operated can be stored in the same area or can be stored separately; the intermediate result values may be stored in the same area or may be stored separately.
The operation module may have various structures, and may include one or more operation units, each of which includes one or more multipliers and one or more adders. The arithmetic units transmit data to be operated or intermediate result values in a certain direction.
The operation process of the low power consumption neural network operation device is essentially as follows. The storage control unit sends a read control signal to the storage module to select and read the data to be operated. If the read data mixes compressed and expanded representations, the data control unit directs the data processing module to expand the compressed data or to compress the corresponding data to be operated. The operation control unit then prepares the operation signals, judges the read data values, and sends a close signal to the corresponding operation component if a value's absolute value is smaller than the given threshold; otherwise it sends the operation signal to that component. After the operation finishes, if the data need compression or expansion, the data control unit directs the data processing module to compress or expand them, i.e., to screen out the data that need no operation or to expand sparsely represented data into a non-sparse form. The result is then written to the storage module under the storage control unit's control; if no data processing is needed, the result can be stored directly through the storage control unit.
The following describes the structure of the computing device according to the present disclosure in detail with reference to specific embodiments. It will be appreciated by those skilled in the art that the structure of the computing device of the present disclosure is not limited to the few listed below.
Example 1
Referring to fig. 4, in embodiment 1 the operation module includes n (n a positive integer) multipliers and an n-input addition tree. The operation control unit controls the operation module as a whole and can send a control signal to each multiplier. Each adder and multiplier can receive data signals and control signals, i.e., it can receive the data to be operated as well as receive control signals and send control signals to the next level.
The specific operation process of a low power consumption neural network operation device using the operation module of embodiment 1 is as follows. First, the storage control unit controls the storage module to read out neuron data and synapse data. If the synapse data is stored in sparse coding, the neuron data and the synapse index values are passed together into the data processing module, and the neuron data are compressed according to the index values, i.e., only the neuron data that will actually be operated with the synapse data sent to the operation units are kept. The data processing module then passes the processed neuron data and the synapse data to the operation units. Each multiplier receives one neuron value and the corresponding synapse value. The multiplier results are sent to the next-level adders for accumulation, and each accumulated sum is passed on to the next level in turn until the final result is obtained.
For the first-level multipliers, the data to be operated is input simultaneously to the multipliers and to the operation control unit, which performs the threshold judgment. When the absolute value of the neuron data input to a multiplier is smaller than the given threshold, that multiplier is turned off: it receives the close signal and inputs control signal 0 to its adder. Otherwise it passes the product to the adder and inputs control signal 1. When an adder receives two control signals that are both 1, it accumulates the input values, passes the sum to the next level, and passes control signal 1 downward; when one received control signal is 1 and the other is 0, the input data on the side whose control signal is 1 is passed directly to the next level along with control signal 1; when both received control signals are 0, the adder is turned off and control signal 0 is input to the next-level adder. This repeats until the addition tree has accumulated the final result, which the storage control unit writes into the storage module. In this way, multipliers and adders whose inputs fall below the given threshold are turned off without sacrificing the speed of the parallel operation, further reducing power consumption.
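The effect of the first-level gating on one inner product can be illustrated with a small count of how many multipliers the threshold rule would close. This is an illustrative sketch with assumed names, not the patented circuit.

```python
# Illustrative inner product with threshold gating: alongside the result,
# count how many first-level multipliers receive the close signal, since
# that count is a rough proxy for the power saved.

def gated_inner_product(neurons, synapses, threshold):
    total, multipliers_off = 0, 0
    for n_val, s_val in zip(neurons, synapses):
        if abs(n_val) < threshold:
            multipliers_off += 1  # close signal: no multiply, ctrl 0
        else:
            total += n_val * s_val
    return total, multipliers_off
```

With neurons `[0.0, 2.0, 0.001, 1.0]`, synapses `[5, 3, 7, 4]`, and threshold `0.01`, two of the four multipliers are gated off and the result is still the exact inner product of the surviving terms.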
Besides "the absolute value of the value to be judged is smaller than a given threshold", the judgment condition for turning off a multiplier can be: the value is greater than a given threshold; the value lies within a given interval, i.e., between a smaller and a larger threshold; the value lies outside a given interval, i.e., is greater than the larger threshold or smaller than the smaller one; or the value satisfies some condition after a function mapping, e.g., the mapped value equals a given threshold, is greater or smaller than a given threshold, lies within a given interval, and so on.
Example 2
Referring to fig. 5, in embodiment 2 the operation module includes n (n a positive integer) operation units, each comprising a multiplier, an adder, and a temporary buffer, with one operation control unit assigned to each operation unit. Synapse data is broadcast to all operation units, and the neurons are passed in directly from the storage module.
The specific operation flow of the low-power neural network operation device using the operation module of embodiment 2 is as follows. First, the temporary buffer in each operation unit is initialized to 0. The storage control unit directs the storage module to read out the neuron data; if the synapse data is stored in a sparse-coded form, the synapse index values are passed into the data processing module together with the neuron data, and the neuron data is compressed accordingly by the index values, i.e. the neuron data that actually corresponds to the transmitted synapse values, and is therefore to be operated on in the operation units, is screened out. The neuron data and the synapse data are then both delivered to the respective operation units. Meanwhile, the storage control unit directs the storage module to read out the synapse data and broadcast the first synapse value to all operation units. When an operation unit receives neuron data, its operation control unit judges whether that data is smaller than the given threshold; if so, the multiplier and adder of the unit are closed and the partial sum held in the temporary buffer is left unchanged; otherwise the unit multiplies the two input data with the multiplier, accumulates the product with the data in the temporary buffer, and stores the sum back into the buffer. The second synapse value is then broadcast and these operations repeat until all synapse data has been broadcast, after which the storage control unit stores the n operation results into the storage module. The next round of operation then proceeds in the same way until all operations are finished. When the synapse data is dense, the data processing module may be skipped after the neuron data is read out of the storage module, and the data passed directly to the operation module.
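The broadcast-and-accumulate flow above can be summarized as a short behavioural sketch. This is a software model only, assuming dense data (no index screening); the function name and data layout (one neuron row per unit) are assumptions of this example.

```python
# Behavioural sketch of embodiment 2: one synapse value is broadcast per
# step to all n units; a unit whose neuron input falls below the threshold
# skips the multiply-accumulate, leaving its temporary buffer unchanged.

def broadcast_mac(neurons, synapses, threshold):
    """neurons: one row per operation unit; synapses: broadcast sequence."""
    buffers = [0.0] * len(neurons)          # temporary buffers init to 0
    for j, s in enumerate(synapses):        # broadcast the j-th synapse
        for i, row in enumerate(neurons):
            x = row[j]
            if abs(x) < threshold:          # unit closed: buffer unchanged
                continue
            buffers[i] += x * s             # multiply, accumulate, store back
    return buffers                          # n results, one per unit

# Unit 0 skips its tiny second input; unit 1 accumulates both products.
sums = broadcast_mac([[1.0, 0.001], [2.0, 3.0]], [10.0, 20.0], 0.01)
```

In this example unit 0 performs only one multiply-accumulate (1.0 × 10.0) while unit 1 performs both, illustrating how sub-threshold neuron values save work without disturbing the other units.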
The advantage of this device is that it makes full use of a characteristic of neural networks, the reusability of synapses, thereby reducing repeated reading of synapse data. At the same time, according to the value of the neuron data, the corresponding multiplier and adder are closed and no operation is performed when the absolute value of the neuron data is smaller than a given threshold, so that power consumption can be reduced. Besides the absolute value of the value to be judged being smaller than a given threshold, the condition for closing the multiplier and adder may also be: that the value is larger than a given threshold; that it lies within a given value interval, i.e. between a larger threshold and a smaller one; that it lies outside the given interval, i.e. above the larger threshold or below the smaller one; or that it satisfies some condition after a function mapping, for example that the mapped value equals a given threshold, is larger/smaller than a given threshold, falls within a given interval, and so on.
Example 3
Referring to fig. 6, in embodiment 3 the operation module comprises n operation units, each containing a multiplier, an adder and a selector. The operation control units control the operation units individually, and partial sums are passed from one operation unit to the next.
The specific operation flow of the low-power neural network operation device using the operation module of embodiment 3 is as follows. First, the storage control unit reads the synapse values from the storage module and sends them to the data processing module. The data processing module compresses the synapses against a given compression threshold, i.e. selects only the synapse values whose absolute values are not less than that threshold, distributes them to the operation units, and keeps them there unchanged. The neuron values are then read from the storage module and compressed correspondingly by the data processing module, i.e. only the neuron values whose absolute values are not less than the given threshold are selected and passed to the operation control unit of each operation unit. The operation control unit receives the neuron data, takes its absolute value and judges it against the given threshold. If it is smaller than the given shut-off threshold, the multiplier and adder are closed and the selector passes the data from the previous stage directly to the selector of the next stage; otherwise the multiplier multiplies the neuron data by the synapse data and sends the product to the adder, which accumulates it with the data passed in from the previous stage; the result is input to the selector, which selects the adder output as the output of the current stage and passes it to the adder and selector of the next stage. This repeats until the operation finishes and the final result is obtained, which the storage control unit stores into the storage module.
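The chained selector behaviour above can be sketched as a simple fold over the units. This is a behavioural illustration only; the function name is an assumption of this example, and the selector is modelled implicitly by either forwarding or updating the running partial sum.

```python
# Sketch of embodiment 3: units form a chain carrying a partial sum.
# A closed unit's selector forwards the previous stage's partial sum
# untouched; an open unit adds its own product to it.

def selector_chain(neurons, synapses, threshold):
    partial = 0.0                        # value entering the first stage
    for x, w in zip(neurons, synapses):
        if abs(x) < threshold:           # multiplier and adder closed:
            continue                     # selector passes `partial` through
        partial += x * w                 # adder output chosen by selector
    return partial                       # output of the last stage

# The middle unit is gated off, so only two products are accumulated.
total = selector_chain([1.0, 0.001, 2.0], [3.0, 4.0, 5.0], 0.01)
```

Note that the gated middle unit contributes nothing yet does not break the chain: the final result (3.0 + 10.0) flows through its selector unchanged.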
This scheme maximizes the proportion of effective data in the algorithm by screening synapses and neurons multiple times, which reduces the amount of computation and improves operation speed; it makes full use of the synapse-sharing characteristic of neural networks, avoiding the memory-access power cost of reading synapses repeatedly; and, when the data to be computed is smaller than a given threshold, it closes the idle multipliers and adders, further reducing power consumption.
Besides the absolute value of the value to be judged being smaller than a given threshold, the condition for closing the multiplier and adder may also be: that the value is larger than a given threshold; that it lies within a given value interval, i.e. between a larger threshold and a smaller one; that it lies outside the given interval, i.e. above the larger threshold or below the smaller one; or that it satisfies some condition after a function mapping, for example that the mapped value equals a given threshold, is larger/smaller than a given threshold, falls within a given interval, and so on.
In addition, although compression in the above embodiments is performed against a given threshold, the compression judgment condition of the present disclosure is not limited to threshold conditions and may also be a function-mapping condition. Threshold conditions may include: smaller than a given threshold, larger than a given threshold, within a given value interval, outside a given value interval, and the like. Furthermore, in the present disclosure the condition for turning off an operation unit and the compression judgment condition may be the same condition (for example, both threshold judgments, with thresholds that may be the same or different) or different conditions (for example, one a threshold judgment and the other a mapping judgment) without affecting the implementation of the present disclosure.
The above-mentioned embodiments further illustrate the objects, technical solutions and advantages of the present disclosure in detail. It should be understood that they are merely illustrative of the present disclosure and are not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present disclosure shall be included in its scope of protection.

Claims (25)

1. An arithmetic device, the device being applied to a deep neural network, the device comprising:
an operation module comprising one or more operation units and an operation control unit,
the operation control unit is used for controlling the closing of the operation unit of the operation module according to a judgment condition; the judgment condition comprises a threshold judgment condition or a function mapping judgment condition;
the arithmetic device also comprises a data processing module and a control module, wherein the data processing module is used for expanding or compressing the synapse value and the neuron value read from the storage module;
the control module comprises a data control unit for controlling the data processing module to expand or compress the data;
the data processing module expands or compresses the synapse values and the neuron values read from the storage module as follows: if the synapse values are in a sparse mode, namely the network is a sparse network expressed by sparse coding, the neuron values are compressed according to the sparsity of the synapse values, and the neuron values that need not be operated on are screened out by compression; or, if the neuron values are in a sparse mode, the synapse values are correspondingly compressed according to the sparsity of the neuron values, and the synapse values that need not be operated on are screened out by compression; or, a compression judgment condition is given, and the synapse values and/or the neuron values are compressed according to the compression judgment condition;
the operation module comprises n multipliers positioned at a first stage and an n-input addition tree positioned at a second stage, wherein n is a positive integer;
the n multipliers of the first stage are respectively connected with the data processing module and receive neuron data and synapse data output by the data processing module.
2. The arithmetic device according to claim 1, wherein each of the arithmetic units comprises one or more arithmetic elements which are adders, multipliers, selectors, or temporary buffers.
3. The arithmetic device according to claim 1, wherein the determination condition is a threshold determination condition, and includes: less than a given threshold, greater than a given threshold, within a given range of values or outside a given range of values.
4. The arithmetic device according to claim 1, wherein the determination condition is a function mapping determination condition that determines whether or not a given condition is satisfied after the function transformation.
5. The arithmetic device according to claim 1, wherein the n multipliers of the first stage are respectively connected to an arithmetic control unit that controls turning off of the multipliers according to the determination condition.
6. The arithmetic device according to claim 5, wherein the arithmetic control unit judges data to be operated according to the judgment condition, and turns off a multiplier when it judges that an absolute value of input data to be input to the multiplier is smaller than a given threshold.
7. The arithmetic device of claim 5, wherein the addition tree comprises k stages of adders, the number of adders at the 1st stage being n/2 and the number at the last, i.e. k-th, stage being 1, where 2^k = n; the n/2 adders at the 1st stage of the addition tree are connected to the n multipliers and receive the data signals and control signals sent by the multipliers; the adders at the 2nd to k-th stages are each connected to an adder of the immediately preceding stage and receive the data signal and control signal sent by that adder.
8. The arithmetic device according to claim 5, wherein if a multiplier receives the shut-off signal sent by the arithmetic control unit, a control signal 0 is input to the 1st-stage adder of the addition tree; otherwise the multiplication result is sent to the 1st-stage adder of the addition tree and a control signal 1 is input; if an adder receives two control signals that are both 1, it accumulates the input values, sends the accumulated sum to the lower-stage adder, and sends a control signal 1 to the lower stage; if the adder receives one control signal 1 and one control signal 0, it passes the input data on the side whose control signal is 1 directly to the lower stage and inputs a control signal 1 to the lower stage; if the adder receives two control signals that are both 0, the adder is closed and a control signal 0 is input to the next-stage adder; and so on until the addition tree finishes accumulating to obtain the final result.
9. The arithmetic device according to claim 1, wherein the n multipliers of the first stage and the n-input addition tree of the second stage are respectively connected to an arithmetic control unit that judges data to be operated according to the judgment condition; when it is judged that the absolute value of input data to a multiplier or adder is smaller than a given threshold, the multiplier or adder is turned off.
10. The arithmetic device according to claim 1, wherein the compression determination condition includes a threshold determination condition or a function mapping determination condition.
11. The arithmetic device according to claim 10, wherein the threshold determination condition includes: less than a given threshold, greater than a given threshold, within a given range of values or outside a given range of values.
12. The arithmetic device according to claim 1, wherein the arithmetic module comprises m arithmetic units, each arithmetic unit comprising a multiplier, an adder and a temporary buffer;
the control module comprises m operation control units, each operation control unit is respectively connected with the multiplier and the adder of one operation unit, the closing of the multiplier and the adder is controlled according to the judgment condition, and m is a positive integer.
13. The arithmetic device according to claim 1, wherein the multiplier of each arithmetic unit has two input ends for receiving neuron data and synapse data, respectively, and an output end for outputting the multiplication result;
the adder has three input ends and one output end, two of the input ends being used for receiving the multiplication result and the data input from the temporary buffer, respectively, and the third input end for inputting a control signal; the output end outputs the addition result, which is stored back into the temporary buffer as input data for the next stage of addition operation.
14. The arithmetic device according to claim 13, wherein synapse data is broadcast to each arithmetic unit, and if the neuron data input to an arithmetic unit is smaller than the threshold, the multiplier and adder of that arithmetic unit are turned off by a control signal and the partial sum stored in the temporary buffer is unchanged; otherwise the arithmetic unit multiplies the two input data with the multiplier, accumulates the product with the data in the temporary buffer, and stores the accumulated result back into the temporary buffer.
15. The arithmetic device according to claim 1, wherein the arithmetic module includes p arithmetic units, each of which includes a multiplier, an adder, and a selector;
the control module comprises p operation control units, each operation control unit is respectively connected with the multiplier and the adder of one operation unit, the closing of the multiplier and the adder is controlled according to the judgment condition, and p is a positive integer.
16. The arithmetic device of claim 15, wherein the multiplier of each arithmetic unit has three inputs for receiving neuron data and synapse data, respectively, and an output for outputting the multiplication result;
the adder of the 1 st arithmetic unit has three input ends and an output end, wherein two input ends are respectively used for receiving multiplication results and data input by the selector of the arithmetic unit of the current stage, the other input end is used for inputting control signals, the output end is used for outputting addition results, and the addition results are sent to the next arithmetic unit by the selector to be used as input data of the addition operation of the next stage;
the adders of the 2nd to p-th arithmetic units each have three input terminals and an output terminal, two of the input terminals being used for receiving the multiplication result and the data input from the selector of the previous-stage arithmetic unit, respectively, and the third input terminal for inputting a control signal; the output terminal outputs the addition result, which is sent to the next arithmetic unit via the selector to be used as input data for the next-stage addition operation.
17. The arithmetic device of claim 1, wherein the arithmetic device further comprises:
a storage module, connected with the control module, the control module controlling the storage module to store or read the required data; the storage module is also connected with the operation module, for inputting the data to be operated on into the operation module and for receiving and storing the data output by the operation module.
18. The arithmetic device according to claim 17, wherein the control module comprises a storage control unit configured to control the storage module to store or read the required data.
19. A method of operation, the method being applied to a deep neural network, the method comprising:
setting a judgment condition; the judgment condition comprises a threshold judgment condition or a function mapping judgment condition;
expanding or compressing the synapse values and neuron values read from the storage module;
wherein the data processing module expands and compresses the synapse values and neuron values read from the storage module, including: if the synapse value is in a sparse mode, compressing the neuron value according to the sparse condition of the synapse value, and compressing and screening out the neuron value which does not need to be operated; or, if the neuron value is in the sparse mode, carrying out corresponding compression on the synapse value according to the sparse condition of the neuron value, and carrying out compression screening on the synapse value which does not need to be operated; or, giving a compression judgment condition, and compressing the synapse value and/or the neuron value according to the compression judgment condition;
controlling the closing of an operation unit of an operation module according to the compressed neuron data and the judgment condition;
if the multiplier of an arithmetic unit is not closed, performing multiplication on the input neuron data and synapse data and outputting the multiplication result; otherwise, the adder of the arithmetic unit is closed as well, and the selector passes the data of the previous stage directly to the selector of the next stage;
if the adder of the arithmetic unit is not closed, it receives the multiplication result and either the input data of the current stage or the data output by the selector of the previous stage, performs the addition, outputs the addition result, and sends that result to the next arithmetic unit through the selector;
otherwise, when the adder and the multiplier are both closed, the selector selects the input data of the current stage or the data output by the selector of the previous stage as the output of the arithmetic unit.
20. The operation method according to claim 19, wherein the determination condition is a threshold determination condition, including: less than a given threshold, greater than a given threshold, within a given range of values or outside a given range of values.
21. The operation method according to claim 19, wherein the determination condition is a function mapping determination condition that determines whether or not a given condition is satisfied after the function transformation.
22. The method according to claim 19, wherein if an absolute value of input data to a multiplier is smaller than a predetermined threshold, the multiplier is turned off according to the determination condition.
23. The method of claim 22, wherein if the multiplier receives a shut-off signal, a control signal 0 is input into the adder at the 1st stage of the addition tree; otherwise the multiplication result is sent to the adder at the 1st stage of the addition tree and a control signal 1 is input;
the 1st-stage adders receive the control signals sent by the multipliers, and the 2nd- to k-th-stage adders each receive the control signal sent by the adder of the immediately preceding stage, until the addition tree finishes accumulating to obtain the final result; wherein,
if an adder receives two control signals that are both 1, it accumulates the input values, sends the accumulated sum to the lower-stage adder, and sends a control signal 1 to the lower stage; if the adder receives one control signal 1 and one control signal 0, it passes the input data on the side whose control signal is 1 directly to the lower stage and inputs a control signal 1 to the lower stage; and if the adder receives two control signals that are both 0, the adder is closed and a control signal 0 is input to the next-stage adder.
24. The operation method according to claim 19, wherein the determination condition is a threshold determination condition, the operation control unit sets a threshold and compares an absolute value of data input to the multiplier/adder with the threshold, and if the absolute value is smaller than the threshold, controls the multiplier/adder to be turned off.
25. The operation method according to claim 19, wherein if the multiplier of an arithmetic unit is not closed, multiplication is performed on the input neuron data and synapse data and the multiplication result is output; otherwise, the adder of the arithmetic unit is closed as well, and the partial sum stored in the temporary buffer is unchanged;
if the adder of the arithmetic unit is not closed, it receives the multiplication result and the data input from the temporary buffer, performs the addition, and outputs the addition result, which is stored back into the temporary buffer as input data for the next stage of addition operation, until the operation is finished; otherwise, the partial sum stored in the temporary buffer is unchanged.
CN202110597369.5A 2017-06-13 2017-06-13 Arithmetic device and method Pending CN113449855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110597369.5A CN113449855A (en) 2017-06-13 2017-06-13 Arithmetic device and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110597369.5A CN113449855A (en) 2017-06-13 2017-06-13 Arithmetic device and method
CN201710441977.0A CN109086880B (en) 2017-06-13 2017-06-13 Arithmetic device and method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201710441977.0A Division CN109086880B (en) 2017-06-13 2017-06-13 Arithmetic device and method

Publications (1)

Publication Number Publication Date
CN113449855A true CN113449855A (en) 2021-09-28

Family

ID=64839078

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710441977.0A Active CN109086880B (en) 2017-06-13 2017-06-13 Arithmetic device and method
CN202110597369.5A Pending CN113449855A (en) 2017-06-13 2017-06-13 Arithmetic device and method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201710441977.0A Active CN109086880B (en) 2017-06-13 2017-06-13 Arithmetic device and method

Country Status (1)

Country Link
CN (2) CN109086880B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3657403A1 (en) 2017-06-13 2020-05-27 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
CN109117455A (en) 2017-06-26 2019-01-01 上海寒武纪信息科技有限公司 Computing device and method
CN109102073A (en) 2017-06-21 2018-12-28 上海寒武纪信息科技有限公司 A kind of sparse training method

Citations (2)

Publication number Priority date Publication date Assignee Title
US20130162457A1 (en) * 2010-07-13 2013-06-27 University of Washington through its Center for Communications Methods and Systems for Compressed Sensing Analog to Digital Conversion
US20130318020A1 (en) * 2011-11-03 2013-11-28 Georgia Tech Research Corporation Analog programmable sparse approximation system

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
TWI315489B (en) * 2005-04-13 2009-10-01 Via Tech Inc A processor and a system for reducing power through disabling unused arithmetic logic units
CN101527010B (en) * 2008-03-06 2011-12-07 上海理工大学 Hardware realization method and system for artificial neural network algorithm
CN104539263B (en) * 2014-12-25 2017-04-12 电子科技大学 Reconfigurable low-power dissipation digital FIR filter


Non-Patent Citations (1)

Title
Yuan Bo; Liu Hongxia: "Low-Power Design and Implementation of a Decimal Multiplier", Journal of Data Acquisition and Processing, no. 03 *

Also Published As

Publication number Publication date
CN109086880B (en) 2021-06-29
CN109086880A (en) 2018-12-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination