CN110932713B - Time sequence elastic circuit for convolutional neural network hardware accelerator - Google Patents

Info

Publication number
CN110932713B
Authority
CN
China
Prior art keywords
inverter
time sequence
signal
clock
mos tube
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911093269.8A
Other languages
Chinese (zh)
Other versions
CN110932713A (en
Inventor
刘昊
范雪梅
汪茹晋
陆生礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Application filed by Southeast University
Priority to CN201911093269.8A
Publication of CN110932713A
Application granted
Publication of CN110932713B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03K PULSE TECHNIQUE
    • H03K19/00 Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
    • H03K19/0175 Coupling arrangements; Interface arrangements
    • H03K19/017509 Interface arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03K PULSE TECHNIQUE
    • H03K19/00 Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
    • H03K19/0008 Arrangements for reducing power consumption
    • H03K19/0013 Arrangements for reducing power consumption in field effect transistor circuits
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03K PULSE TECHNIQUE
    • H03K19/00 Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
    • H03K19/0008 Arrangements for reducing power consumption
    • H03K19/0016 Arrangements for reducing power consumption by using a control or a clock signal, e.g. in order to apply power supply
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03K PULSE TECHNIQUE
    • H03K19/00 Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
    • H03K19/003 Modifications for increasing the reliability for protection
    • H03K19/00315 Modifications for increasing the reliability for protection in field-effect transistor circuits
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03K PULSE TECHNIQUE
    • H03K19/00 Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
    • H03K19/01 Modifications for accelerating switching
    • H03K19/017 Modifications for accelerating switching in field-effect transistor circuits
    • H03K19/01728 Modifications for accelerating switching in field-effect transistor circuits in synchronous circuits, i.e. by using clock signals
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03K PULSE TECHNIQUE
    • H03K19/00 Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
    • H03K19/0175 Coupling arrangements; Interface arrangements
    • H03K19/0185 Coupling arrangements; Interface arrangements using field effect transistors only
    • H03K19/018507 Interface arrangements
    • H03K19/01855 Interface arrangements synchronous, i.e. using clock signals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a time sequence elastic circuit for a convolutional neural network hardware accelerator, which relates to the field of digital integrated circuits and is suitable for time sequence error detection and correction in the convolutional neural network hardware accelerator. The time sequence elastic circuit comprises a time sequence error detection unit based on data jump detection, an on-line time sequence error correction unit and a clock control unit. The time sequence error detection unit consists of 13 transistors, and its detection window length can be adjusted according to different process, voltage, temperature and aging conditions; the time sequence error correction unit adopts a traditional latch structure with lower power consumption and is composed of 10 transistors; the clock control unit generates a clock reverse signal and a detection window clock signal, and data delay near the rising edge of the clock is not detected, so that the error tolerance of the circuit is improved. By combining the fault tolerance of the convolutional neural network, the invention can save the excessive time sequence margin reserved by the traditional circuit and reduce the power consumption of the circuit while ensuring data precision.

Description

Time sequence elastic circuit for convolutional neural network hardware accelerator
Technical Field
The invention relates to the field of digital integrated circuits, in particular to a time sequence elastic circuit for a convolutional neural network hardware accelerator.
Background
Machine learning techniques have been widely used in fields such as vision and medicine. As one such technique, a Deep Neural Network (DNN) performs different forms of calculation on the parameters of different layers and has a complex workload, so the industry generally uses hardware accelerators to speed up DNN algorithms. A deep neural network can rapidly process and classify real data acquired by various sensors. As the most advanced application technology in image, video and text processing, it has also driven the development of the Internet of Things (IoT).
Mobile terminal devices for the Internet of Things are becoming steadily smaller, and their requirements on circuit energy consumption are becoming stricter; reducing the power supply voltage is one of the fastest ways to save power. However, deep-submicron variations in process, voltage, temperature and aging (PVTA) increase circuit delay and cause timing problems, thereby reducing the energy efficiency ratio of the circuit. To meet the larger timing requirement at low voltage, a traditional hardware accelerator design widens the timing margin of the whole circuit. Although this meets the functional requirement, it wastes timing margin at nominal voltage and lowers the energy efficiency ratio of the circuit.
The timing elastic circuit (Timing Resilient Circuit) includes a timing error detection (Timing Error Detection, TED) circuit and a correction circuit. When the circuit generates timing delay violations, the timing elastic circuit can timely detect errors and correct output values to be correct results. The time sequence elastic circuit is inserted into the convolutional neural network hardware accelerator, so that the time sequence margin of the circuit and the limit value of the power supply voltage under the current PVTA condition can be continuously reduced, and compared with the traditional design, the method saves a large amount of time sequence margin and power consumption.
The classical TED architecture is a circuit called Razor, which places a shadow latch beside the main flip-flop, gives the two storage elements different sampling clocks, and compares the two sampled values to derive a timing error signal. The Razor structure accurately reflects the timing errors of the circuit, but it has no error recovery function: the clock must be stopped and the correct result obtained through a circuit rollback mechanism. The structure also uses a large number of transistors, so it places a heavy burden on the area and power consumption of the whole circuit. Because DNN hardware accelerator circuits involve multiple layers of computation and are already complex, a timing elastic circuit that corrects errors on line and uses few transistors is required to detect circuit timing errors.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a time sequence elastic circuit for a convolutional neural network hardware accelerator. The circuit reduces the excessive margin reserved in the traditional design, monitors in time the time sequence errors occurring in the convolutional neural network hardware accelerator, corrects them under severe conditions such as low voltage and high temperature, and reduces the system power consumption of the hardware accelerator. By combining the fault tolerance of the neural network, the invention can reduce the power consumption of the circuit without losing the accuracy of output data.
The invention adopts the following technical scheme for solving the technical problems:
the invention provides a time sequence elastic circuit for a convolutional neural network hardware accelerator, which comprises a time sequence error detection unit, a time sequence error correction unit and a clock control unit; wherein,
the clock control unit is used for outputting a clock reverse signal and a detection window control signal when receiving the clock signal, wherein the clock reverse signal is input to the time sequence error correction unit, and the detection window control signal is input to the time sequence error detection unit;
the time sequence error detection unit is used for detecting time sequence errors of input data according to the detection window control signal, and outputting a high pulse error signal when the time sequence errors occur;
and the time sequence error correction unit is used for continuously sampling the input data in the period of the clock high pulse under the control of the clock signal and the clock reverse signal and outputting a correctly sampled data signal when a time sequence error occurs.
As a further optimization scheme of the time sequence elastic circuit for the convolutional neural network hardware accelerator, the time sequence error detection unit comprises a first MOS tube, a second MOS tube, a third MOS tube, a fourth MOS tube, a fifth MOS tube, a first inverter, a second inverter, a third inverter and a first transmission gate, wherein,
the input end of the first inverter is connected with the data signal input end of the time sequence elastic circuit; the output end of the first inverter is respectively connected with the input end of the second inverter, the first gate end of the first transmission gate and the source end of the second MOS tube; the output end of the second inverter is respectively connected with the gate end of the first MOS tube, the gate end of the second MOS tube, the gate end of the third MOS tube and the input end of the first transmission gate; the source end of the first MOS tube is respectively connected with the input signal end of the time sequence error detection unit and the second gate end of the first transmission gate; the drain end of the first MOS tube is respectively connected with the drain end of the second MOS tube, the gate end of the fifth MOS tube and the output end of the first transmission gate; the drain end of the third MOS tube and the source end of the fifth MOS tube are grounded; the drain end of the fifth MOS tube is respectively connected with the drain end of the fourth MOS tube and the input end of the third inverter; and the source end of the fourth MOS tube is connected with the power supply voltage;
the third inverter output terminal is used as the output terminal of the time sequence error detection unit and outputs a time sequence error signal.
As a further optimization scheme of the time sequence elastic circuit for the convolutional neural network hardware accelerator, when the input data signal is delayed so that its signal inversion occurs during the high level of the detection window, the output end of the third inverter is a high level signal.
As a further optimization scheme of the time sequence elastic circuit for the convolutional neural network hardware accelerator, the width-to-length ratio of the third MOS tube is one order of magnitude larger than that of the rest MOS tubes.
As a further optimization scheme of the timing elastic circuit for the convolutional neural network hardware accelerator of the present invention, the timing error correction unit includes a second transmission gate, a third transmission gate, a fourth inverter, a fifth inverter and a sixth inverter, wherein,
the gate ends of the second transmission gate and the third transmission gate are respectively opened by a high level and a low level of a clock signal, the input end of the second transmission gate is connected with the input end of a data signal, and the output end of the second transmission gate is connected with the input end of a fourth inverter;
the input end of the third transmission gate is respectively connected with the output end of the fourth inverter and the input end of the fifth inverter, and the output end of the third transmission gate is connected with the output end of the sixth inverter;
the fifth inverter output terminal and the sixth inverter input terminal are connected and commonly used as the output terminal of the timing error correction unit.
As a further optimization scheme of the time sequence elastic circuit for the convolutional neural network hardware accelerator, the output end of the time sequence error correction unit is a data value sampled when the clock falls.
As a further optimization scheme of the time sequence elastic circuit for the convolutional neural network hardware accelerator, the clock control unit comprises a seventh inverter, an eighth inverter and a plurality of buffer units, wherein:
the input end of the seventh inverter is a clock signal, the output end of the seventh inverter is used as an output signal of the clock control unit, and the output end of the seventh inverter is a reverse clock signal;
the input end of the eighth inverter is connected with the output end of the seventh inverter, and the output end of the eighth inverter is used as an output signal of the clock control unit and outputs a detection window clock signal;
and a plurality of buffer units connected in series are arranged at the output end of the eighth inverter and are used for controlling the size of the detection clock window.
As a further optimization scheme of the time sequence elastic circuit for the convolutional neural network hardware accelerator, the seventh inverter is a low-threshold inverter, and the eighth inverter is a high-threshold inverter; this increases the delay from the input end to the output end of the clock control unit, so that the tolerance of the time sequence error detection unit to data changes near the rising edge of the clock is improved.
As a further optimization scheme of the time sequence elastic circuit for the convolutional neural network hardware accelerator, the gate end of the fourth MOS tube is connected with a detection window clock signal.
As a further optimization scheme of the timing elastic circuit for the convolutional neural network hardware accelerator, the timing error detection unit detects circuit timing errors while the detection window clock signal is at a high level, and holds the error detection signal output at a low level while the detection window clock signal is at a low level.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
(1) The invention reasonably reduces the excessive timing margin reserved by the traditional design circuit, monitors in time the timing errors in the convolutional neural network hardware accelerator, corrects them under severe conditions such as low voltage and high temperature, and reduces the system power consumption of the hardware accelerator;
(2) Aiming at the time sequence error signals which are difficult to capture, the time sequence error signals are reinforced through the third MOS tube to be reserved, so that the time sequence error detection circuit can acquire the time sequence error signals, and quick identification is realized;
(3) Aiming at different PVTA environments, an adjustable time sequence detection window is provided by utilizing the characteristic that the time sequence margin of a critical path and a non-critical path in a deep neural network hardware accelerator is different;
(4) The time sequence error circuit inserted in the hardware accelerator does not bring loss to data precision by combining the fault tolerance of the deep neural network algorithm, saves the power consumption of the whole circuit and improves the energy efficiency ratio of the hardware accelerator.
Drawings
FIG. 1 is a circuit diagram of the present invention replacing registers of a two-stage pipeline in a hardware accelerator.
Fig. 2 is a timing diagram of the timing elastic circuit according to the present invention under different timing error conditions.
FIG. 3a is a schematic diagram of a convolutional neural network hardware accelerator circuit in an embodiment.
FIG. 3b is a schematic circuit diagram of a hardware accelerator computation processing unit and a dynamic OR tree in an embodiment.
Fig. 4 is a schematic diagram of the operation of the dynamic voltage and frequency adjustment system in an embodiment.
Detailed Description
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
a time sequence elastic circuit for a convolutional neural network hardware accelerator comprises a time sequence error detection unit, a time sequence error correction unit and a clock control unit; wherein,
the clock control unit is used for outputting a clock reverse signal and a detection window control signal when receiving the clock signal, wherein the clock reverse signal is input to the time sequence error correction unit, and the detection window control signal is input to the time sequence error detection unit;
the time sequence error detection unit is used for detecting time sequence errors of input data according to the detection window control signal, and outputting a high pulse error signal when the time sequence errors occur;
the time sequence error correction unit is used for continuously sampling input data in a clock high pulse period under the control of a clock signal and a clock reverse signal, and outputting a correctly sampled data signal when a time sequence error occurs;
the time sequence error detection unit comprises a first MOS tube M1, a second MOS tube M2, a third MOS tube M3, a fourth MOS tube M4, a fifth MOS tube M5, a first inverter INV1, a second inverter INV2, a third inverter INV3 and a first transmission gate T1, wherein,
the input end of the first inverter is connected with the data signal input end of the time sequence elastic circuit; the output end of the first inverter is respectively connected with the input end of the second inverter, the first gate end of the first transmission gate and the source end of the second MOS tube; the output end of the second inverter is respectively connected with the gate end of the first MOS tube, the gate end of the second MOS tube, the gate end of the third MOS tube and the input end of the first transmission gate; the source end of the first MOS tube is respectively connected with the input signal end of the time sequence error detection unit and the second gate end of the first transmission gate; the drain end of the first MOS tube is respectively connected with the drain end of the second MOS tube, the gate end of the fifth MOS tube and the output end of the first transmission gate; the drain end of the third MOS tube and the source end of the fifth MOS tube are grounded; the drain end of the fifth MOS tube is respectively connected with the drain end of the fourth MOS tube and the input end of the third inverter; and the source end of the fourth MOS tube is connected with the power supply voltage;
the third inverter output terminal is used as the output terminal of the time sequence error detection unit and outputs a time sequence error signal.
When the input data signal is delayed so that its signal inversion occurs during the high level of the detection window, the output end of the third inverter is a high level signal.
The width-to-length ratio of the third MOS tube is an order of magnitude larger than that of the rest MOS tubes.
The timing error correction unit includes a second transmission gate T2, a third transmission gate T3, a fourth inverter INV4, a fifth inverter INV5, and a sixth inverter INV6, wherein,
the gate ends of the second transmission gate and the third transmission gate are respectively opened by a high level and a low level of a clock signal, the input end of the second transmission gate is connected with the input end of a data signal, and the output end of the second transmission gate is connected with the input end of a fourth inverter;
the input end of the third transmission gate is respectively connected with the output end of the fourth inverter and the input end of the fifth inverter, and the output end of the third transmission gate is connected with the output end of the sixth inverter;
the fifth inverter output terminal and the sixth inverter input terminal are connected and commonly used as the output terminal of the timing error correction unit.
The output end of the time sequence error correction unit is the data value sampled at the time of the clock falling edge.
The clock control unit comprises a seventh inverter INV7, an eighth inverter INV8 and a plurality of buffer units, wherein:
the input end of the seventh inverter is a clock signal, the output end of the seventh inverter is used as an output signal of the clock control unit, and the output end of the seventh inverter is a reverse clock signal;
the input end of the eighth inverter is connected with the output end of the seventh inverter, and the output end of the eighth inverter is used as an output signal of the clock control unit and outputs a detection window clock signal;
and a plurality of buffer units connected in series are arranged at the output end of the eighth inverter and are used for controlling the size of the detection clock window.
The seventh inverter is a low threshold type inverter, and the eighth inverter is a high threshold type inverter; this increases the delay from the input end to the output end of the clock control unit, so that the tolerance of the time sequence error detection unit to data changes near the rising edge of the clock is improved.
And the gate end of the fourth MOS tube is connected with a detection window clock signal.
The timing error detection unit detects circuit timing errors while the detection window clock signal is at a high level, and holds the error detection signal output at a low level while the detection window clock signal is at a low level.
The time sequence error detection unit comprises 13 MOS tubes and performs time sequence error detection based on data jump.
The time sequence error detection unit comprises a first inverter, a second inverter, a third inverter, a first transmission gate, a first MOS tube, a second MOS tube, a third MOS tube, a fourth MOS tube and a fifth MOS tube. The first inverter and the second inverter invert the input data D twice to obtain the delayed signal D_del, and the third inverter inverts the node shared by the drain ends of the fourth MOS tube and the fifth MOS tube to obtain the time sequence error signal. Each inverter is composed of one PMOS transistor and one NMOS transistor.
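The stated total of 13 MOS tubes can be checked against the component list. The tally below is a minimal sketch: the two-transistor inverter comes from the text, while the two-transistor transmission gate is an assumption (a standard CMOS pair) and the dictionary names are illustrative only. The same assumption applied to the correction unit's parts list (three inverters and two transmission gates) reproduces the 10 transistors given in the abstract.

```python
# Transistor budget of the two units, tallied from the component lists in the text.
# Assumption: each transmission gate is a standard 2-transistor CMOS pair.
TRANSISTORS_PER = {"inverter": 2, "transmission_gate": 2, "mos": 1}


def transistor_count(components):
    """Sum transistors over a {component_type: quantity} mapping."""
    return sum(TRANSISTORS_PER[kind] * qty for kind, qty in components.items())


# INV1-INV3, T1, M1-M5
detection_unit = {"inverter": 3, "transmission_gate": 1, "mos": 5}
# INV4-INV6, T2, T3
correction_unit = {"inverter": 3, "transmission_gate": 2}

detection_total = transistor_count(detection_unit)
correction_total = transistor_count(correction_unit)
print(detection_total, correction_total)
```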
Aiming at the time sequence error problem of the convolutional neural network hardware accelerator, two time sequence error conditions exist, wherein one time sequence error condition is input data logic 0 state delay, and the other time sequence error condition is input data logic 1 state delay.
When the input data is in a logic low level state, the transmission gate T1 conducts and passes the delay data D_del to the gate end of the fifth MOS tube. When the circuit has a time sequence error, the delay data is in a logic high level state, the fifth MOS tube is turned on and its drain end is pulled to ground, and the error signal of the elastic circuit is pulled up to the logic high level state, marking that a time sequence error has occurred in the circuit.
The third MOS transistor acts as a capacitor to maintain the logic high level state of the delay data, and captures the rapid change of the input data. The transistor width to length ratio (W/L) acting as a capacitor needs to be an order of magnitude larger than the rest of the MOS transistors to ensure proper operation at low voltage.
When the input data is in a logic high level state, the inverter structure formed by the first MOS tube and the second MOS tube works and passes the inverted delay data D_del to the gate end of the fifth MOS tube. When the circuit has a time sequence error, the delay data is in a logic low level state, so the gate end of the fifth MOS tube is in a logic high level state, the fifth MOS tube is turned on and its drain end is pulled to ground, and the error signal of the time sequence elastic circuit is pulled up to the logic high level state, marking that a time sequence error has occurred in the circuit.
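Taken together, the two cases mean that the fifth MOS tube is turned on whenever D and D_del differ. The following behavioral sketch models this over sampled waveforms. It is an illustration under stated assumptions, not the patented circuit: the function name and the list-based sampling are hypothetical, and it assumes that the error stays set for the remainder of the window once raised and is cleared whenever the detection window clock is low.

```python
def detect_timing_error(d_samples, d_del_samples, clk_dw_samples):
    """Behavioral sketch of the detection unit over sampled 0/1 waveforms.

    d_samples      : input data D per time step
    d_del_samples  : delayed data D_del per time step
    clk_dw_samples : detection window clock per time step
    Returns the error signal per time step.
    """
    err = 0
    out = []
    for d, d_del, clk_dw in zip(d_samples, d_del_samples, clk_dw_samples):
        if clk_dw == 0:
            # Outside the window the error output is held low.
            err = 0
        elif d != d_del:
            # D=0: D_del reaches the M5 gate through T1.
            # D=1: inverted D_del reaches the M5 gate through M1/M2.
            # Either way a mismatch turns M5 on and raises the error.
            err = 1
        out.append(err)
    return out


# Data toggles inside the window (step 3): error is raised.
late = detect_timing_error([0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 0])
# Data toggles before the window opens (step 1): no error.
early = detect_timing_error([0, 1, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 1, 0])
print(late, early)
```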
The error correction unit is composed of a conventional latch consisting of the second transmission gate, the third transmission gate, the fourth inverter, the fifth inverter and the sixth inverter, wherein the latch can sample delayed input data during the high level period of the clock and keep correct input data during the low level period.
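The latch behavior described here (transparent while the clock is high, holding while it is low) can be sketched as follows. This is a simplified logic-level model under the assumption of ideal, zero-delay gates; the function name and the sample lists are hypothetical.

```python
def latch_output(d_samples, clk_samples, initial=0):
    """Level-sensitive latch: follow D while CLK is 1, hold the last value while CLK is 0."""
    q = initial
    out = []
    for d, clk in zip(d_samples, clk_samples):
        if clk == 1:
            q = d      # transparent phase: data arriving late is still captured
        out.append(q)  # hold phase keeps the value present at the falling edge
    return out


# Data arrives late (step 2) but still inside the clock-high phase, so it is captured;
# the change after the falling edge (step 3 onward) is ignored.
q = latch_output([0, 0, 1, 0, 0], [1, 1, 1, 0, 0])
print(q)
```

In this model the held value equals the data present at the falling clock edge, which matches the statement that the unit outputs the value sampled when the clock falls.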
The clock control unit is composed of a seventh inverter with a low threshold value, an eighth inverter with a high threshold value and a plurality of buffers. The clock inversion signal NCLK is obtained by inverting the clock control signal CLK. The timing error detection unit works in a detection window, and the detection window clock control signal clk_dw is the output signal of the eighth inverter.
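A rough way to picture the two outputs of the clock control unit is shown below. The sketch assumes that the inverter pair and the buffer chain act as a pure delay of `window_delay` samples on the clock; the text does not give the exact waveform, so this is an illustrative simplification, and the function and parameter names are hypothetical.

```python
def clock_control(clk_samples, window_delay):
    """Return (nclk, clk_dw) for a sampled 0/1 clock.

    nclk   : inverted clock (output of the seventh inverter)
    clk_dw : detection window clock, modeled here as CLK delayed by
             `window_delay` samples (assumed effect of INV7, INV8 and the buffers)
    """
    n = len(clk_samples)
    k = min(window_delay, n)
    nclk = [1 - c for c in clk_samples]
    clk_dw = [0] * k + list(clk_samples[: n - k])
    return nclk, clk_dw


clk = [0, 1, 1, 1, 1, 0, 0, 0]
nclk, clk_dw = clock_control(clk, 2)
print(nclk)
print(clk_dw)
```

Under this assumption the window opens two samples after the rising clock edge, so data changes in those first samples are not flagged; a larger `window_delay` shifts the window further from the edge.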
In the convolutional neural network hardware accelerator, the timing margin between different registers is different: the timing margin of a non-critical path is loose, and that of a critical path is tight. Accordingly, the present invention applies loose error detection to input signals that change near the rising edge of the clock, thereby saving circuit power consumption.
When time sequence errors of the DNN circuit are detected, the detection window can be adjusted according to different PVTA conditions. The timing error detection unit operates within the detection window, and the error correction unit performs data sampling during the clock high level.
As shown in fig. 1, the time sequence elastic circuit for the convolutional neural network hardware accelerator is divided into three parts: a timing error detection unit, a timing error correction unit and a clock control unit. The time sequence error detection unit works in a detection window and is controlled by a detection window clock at the fourth MOS tube. The timing error correction unit is constituted by a conventional latch, and samples the input data during the clock high pulse.
The working time sequence of the time sequence elastic circuit is shown in fig. 2. When the input data changes before the detection window, the time sequence error signal remains low; when the input data changes within the detection window, the time sequence error of the whole circuit is serious, and the time sequence error signal is set high. The width of the detection window can be adjusted by the clock control unit, since different PVTA environments require different detection window widths.
Fig. 3a and 3b show the working principle of the convolutional neural network hardware accelerator: a plurality of identical PE computing units work under clock control. Fig. 3a shows the systolic array architecture, a most promising architecture for convolutional neural networks.
The hardware accelerator is configured externally, and then data is transferred to computing units (PEs) of different rows and columns in the systolic array step by step through a series of buffers such as a data buffer, a weight (weight) buffer, a map (map) buffer, and the like.
The PE unit mainly comprises a multiply-accumulate (MAC) unit. The multiplier multiplies the current input data by the current weight data and passes the product to the adder; the adder adds the current product to the previous multiply-accumulate result stored in the summation unit and writes the result back to the summation unit, whose register has an initial value of 0. The number of multiply-accumulate operations of the MAC unit is determined by a counter, which controls a 2-to-1 multiplexer: if the current calculation is not complete, the multiply-accumulate operation continues to loop; when the calculation is complete, the result is passed through the 2-to-1 multiplexer to an output register, which transmits it out of the PE unit.
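The multiply-accumulate loop of a PE unit can be sketched in software as below. This is a functional illustration only; the function name is hypothetical, and the hardware counter and multiplexer are represented by the loop bound and the final return.

```python
def pe_mac(inputs, weights):
    """Multiply-accumulate over paired inputs and weights, as one PE unit would."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0               # summation register starts at 0
    count = len(inputs)   # number of MAC iterations, fixed by the counter
    for step in range(count):
        product = inputs[step] * weights[step]  # multiplier
        acc = acc + product                     # adder writes back to the summation register
    # Counter finished: the multiplexer routes the sum to the output register.
    return acc


result = pe_mac([1, 2, 3], [4, 5, 6])
print(result)
```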
And (3) gradually transferring external input data and calculated data of the current PE unit between the PE units, outputting and storing the calculated data to an external output characteristic diagram buffer, and after a Relu or other activation function and Pooling (Pooling) layer is passed, finishing the work of the hardware accelerator.
The time sequence error detection unit is inserted into the key path in the PE unit to replace the register in the original path for time sequence error detection. If the convolutional neural network hardware accelerator generates a time sequence error under the current PVTA condition, a time sequence error signal is generated, and the time sequence error signals of a plurality of PE units are transmitted to a dynamic or tree.
The inputs of the dynamic OR tree are the set of PE timing error signals {Err[0], Err[1], …, Err[n-1]} and the reset signal Res. During the clock high pulse, Res is set to 1, and if a timing error occurs in any PE unit, the error set signal Err_or is a high pulse. During the clock low level, Res is set to 0 and Err_or is held low, which masks glitch error signals regardless of whether the time sequence elastic circuit in a PE unit produces a glitch. If several errors occur at once in the error pool of the PE units, the dynamic OR tree treats them as a single error when signaling. The output of the dynamic OR tree is finally passed to a dynamic voltage and frequency scaling (DVFS) system for system-level voltage and frequency adjustment.
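The logical behavior of the dynamic OR tree described above can be sketched as follows. This is a purely functional Python model, not the dynamic-logic circuit itself; the function name `dynamic_or_tree` and its argument names are hypothetical.

```python
def dynamic_or_tree(err, res):
    """Return Err_or for the PE error signals `err` (iterable of 0/1) and reset `res`."""
    if res == 0:
        # clock low level: the output is held low, so glitches are masked
        return 0
    # clock high pulse: one error, or several at once, raises a single Err_or
    return 1 if any(err) else 0


single_error = dynamic_or_tree([0, 1, 0], res=1)
multiple_errors = dynamic_or_tree([1, 1, 0], res=1)
masked_glitch = dynamic_or_tree([1, 1, 1], res=0)
print(single_error, multiple_errors, masked_glitch)  # 1 1 0
```

One error and several simultaneous errors produce the same Err_or value, and no input can raise Err_or while Res is 0.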
As shown in fig. 4, the DVFS technique adjusts the system clock and voltage according to the error set signal delivered by the dynamic OR tree. When the time sequence elastic circuit detects a timing error in the convolutional neural network hardware accelerator circuit, the set of error signals is passed to the dynamic OR tree, which passes the error set signal Err_or to the frequency and voltage regulation modules. The frequency regulation module changes the period of the system clock through the clock drive signal and the clock enable signal, lowering the operating frequency of the system; the voltage regulation module raises the supply voltage of the convolutional neural network hardware accelerator circuit. Together these resolve the insufficient timing margin on the current critical path.
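A minimal sketch of this adjustment, assuming fixed step sizes, is given below. The names `OperatingPoint` and `dvfs_adjust`, the step values and the starting operating point are all hypothetical; the embodiment does not specify how large each frequency or voltage step is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: int  # system clock frequency
    vdd_mv: int    # supply voltage in millivolts


def dvfs_adjust(point, err_or, freq_step_mhz=10, vdd_step_mv=50):
    """Return the next operating point given the error set signal Err_or."""
    if not err_or:
        return point  # no timing error reported: keep the current settings
    # Err_or high: lower the clock frequency and raise the supply voltage
    return OperatingPoint(point.freq_mhz - freq_step_mhz,
                          point.vdd_mv + vdd_step_mv)


start = OperatingPoint(freq_mhz=200, vdd_mv=800)  # assumed starting point
after_error = dvfs_adjust(start, err_or=1)
print(after_error)  # OperatingPoint(freq_mhz=190, vdd_mv=850)
```

Integer units (MHz and mV) are used so that repeated steps do not accumulate floating-point rounding.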
By exploiting the fault tolerance of convolutional neural networks, this embodiment can reduce circuit power consumption without loss of data precision, and recovers the large timing margin reserved in conventional circuit design.
In summary, the present invention provides an elastic circuit for a convolutional neural network hardware accelerator that consumes less area and power than conventional strategies for handling timing errors. It also provides an adjustable detection window that controls the working range of the timing error detection unit. The elastic circuit is inserted into the critical path of the computing units of the convolutional neural network hardware accelerator, so that timing errors in the circuit can be detected quickly and the timing error on the current path can be corrected. The dynamic OR tree masks glitches during the clock low level, further reducing unnecessary power consumption in the system circuit. Using the DVFS technique, consecutive timing errors can be effectively avoided and the system-level circuit is adjusted dynamically.
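The combined effect of the detection window and the correction summarized above can be sketched as follows. This is a behavioral Python model under stated assumptions, not the transistor-level circuit: times are in picoseconds relative to the rising clock edge, the function name `elastic_sample` and the window bounds `window_start_ps` and `window_end_ps` are hypothetical, and the correction unit is modeled simply as a latch that stays transparent until the falling clock edge.

```python
def elastic_sample(transitions, initial, window_start_ps, window_end_ps,
                   falling_edge_ps):
    """Model one clock cycle of the elastic circuit.

    transitions: list of (time_ps, value) pairs sorted by time.
    Returns (sampled_value, error_flag).
    """
    value = initial
    error = 0
    for time_ps, new_value in transitions:
        if time_ps < falling_edge_ps:
            value = new_value  # still transparent: the late value is captured
        if window_start_ps <= time_ps < window_end_ps:
            error = 1          # data changed inside the detection window
    return value, error


on_time = elastic_sample([(-50, 1)], 0, 0, 100, 500)
late = elastic_sample([(40, 1)], 0, 0, 100, 500)
print(on_time, late)  # (1, 0) (1, 1)
```

In both cases the correct value 1 is sampled; only the late transition, which falls inside the assumed 0 to 100 ps window, raises the error flag. Widening or narrowing the window changes which late arrivals are reported.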
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto; any change or substitution readily conceived by those skilled in the art within the scope of the present invention shall be included in the scope of the present invention.

Claims (9)

1. A time sequence elastic circuit for a convolutional neural network hardware accelerator, characterized by comprising a timing error detection unit, a timing error correction unit and a clock control unit; wherein,
the clock control unit is configured to output an inverted clock signal and a detection window control signal upon receiving the clock signal, wherein the inverted clock signal is input to the timing error correction unit and the detection window control signal is input to the timing error detection unit;
the timing error detection unit is configured to detect timing errors of the input data according to the detection window control signal, and to output a high-pulse error signal when a timing error occurs;
the timing error correction unit is configured to continuously sample the input data during the clock high pulse under the control of the clock signal and the inverted clock signal, and to output a correctly sampled data signal when a timing error occurs;
the timing error detection unit comprises a first MOS transistor, a second MOS transistor, a third MOS transistor, a fourth MOS transistor, a fifth MOS transistor, a first inverter, a second inverter, a third inverter and a first transmission gate, wherein,
the input terminal of the first inverter is connected to the data signal input terminal of the time sequence elastic circuit; the output terminal of the first inverter is connected to the input terminal of the second inverter, the first gate terminal of the first transmission gate and the source terminal of the second MOS transistor, respectively; the output terminal of the second inverter is connected to the gate terminal of the first MOS transistor, the gate terminal of the second MOS transistor, the gate terminal of the third MOS transistor and the input terminal of the first transmission gate, respectively; the source terminal of the first MOS transistor is connected to the input signal terminal of the timing error detection unit and the second gate terminal of the first transmission gate, respectively; the drain terminal of the first MOS transistor is connected to the drain terminal of the second MOS transistor, the gate terminal of the fifth MOS transistor and the output terminal of the first transmission gate, respectively; the drain terminal of the third MOS transistor and the source terminal of the fifth MOS transistor are grounded, and the drain terminal of the fourth MOS transistor and the input terminal of the third inverter are respectively connected; and the source terminal of the fourth MOS transistor is connected to the supply voltage;
the output terminal of the third inverter serves as the output terminal of the timing error detection unit and outputs the timing error signal.
2. The time sequence elastic circuit for a convolutional neural network hardware accelerator of claim 1, wherein the output of the third inverter is a high-level signal when the input data signal is delayed in error and flips during the high level.
3. The time sequence elastic circuit for a convolutional neural network hardware accelerator of claim 1, wherein the third MOS transistor has a width-to-length ratio an order of magnitude greater than that of the remaining MOS transistors.
4. The time sequence elastic circuit for a convolutional neural network hardware accelerator according to claim 1, wherein the timing error correction unit comprises a second transmission gate, a third transmission gate, a fourth inverter, a fifth inverter, and a sixth inverter, wherein,
the second transmission gate and the third transmission gate are turned on, via their gate terminals, by the high level and the low level of the clock signal, respectively; the input terminal of the second transmission gate is connected to the data signal input terminal, and the output terminal of the second transmission gate is connected to the input terminal of the fourth inverter;
the input terminal of the third transmission gate is connected to the output terminal of the fourth inverter and the input terminal of the fifth inverter, respectively, and the output terminal of the third transmission gate is connected to the output terminal of the sixth inverter;
the output terminal of the fifth inverter and the input terminal of the sixth inverter are connected together and jointly serve as the output terminal of the timing error correction unit.
5. The time sequence elastic circuit for a convolutional neural network hardware accelerator according to claim 1, wherein the output of the timing error correction unit is the data value sampled at the falling edge of the clock.
6. The time sequence elastic circuit for a convolutional neural network hardware accelerator of claim 1, wherein the clock control unit comprises a seventh inverter, an eighth inverter, and a plurality of buffer units, wherein:
the input terminal of the seventh inverter receives the clock signal, and the output terminal of the seventh inverter serves as an output of the clock control unit and outputs the inverted clock signal;
the input terminal of the eighth inverter is connected to the output terminal of the seventh inverter, and the output terminal of the eighth inverter serves as an output of the clock control unit and outputs the detection window clock signal;
the plurality of buffer units are connected in series at the output terminal of the eighth inverter and are used to control the size of the detection clock window.
7. The time sequence elastic circuit for a convolutional neural network hardware accelerator of claim 6, wherein the seventh inverter is a low-threshold inverter and the eighth inverter is a high-threshold inverter, which increases the delay between the input terminal and the output terminal of the clock control unit and thereby improves the tolerance of the timing error detection unit when detecting data changes near the rising edge of the clock.
8. The time sequence elastic circuit for a convolutional neural network hardware accelerator of claim 6, wherein the gate terminal of the fourth MOS transistor is connected to the detection window clock signal.
9. The time sequence elastic circuit for a convolutional neural network hardware accelerator according to claim 6, wherein the timing error detection unit detects circuit timing errors while the detection window clock signal is at a high level, and holds the error detection signal output at a low level while the detection window clock signal is at a low level.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911093269.8A CN110932713B (en) 2019-11-11 2019-11-11 Time sequence elastic circuit for convolutional neural network hardware accelerator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911093269.8A CN110932713B (en) 2019-11-11 2019-11-11 Time sequence elastic circuit for convolutional neural network hardware accelerator

Publications (2)

Publication Number Publication Date
CN110932713A CN110932713A (en) 2020-03-27
CN110932713B true CN110932713B (en) 2023-05-16

Family

ID=69853715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911093269.8A Active CN110932713B (en) 2019-11-11 2019-11-11 Time sequence elastic circuit for convolutional neural network hardware accelerator

Country Status (1)

Country Link
CN (1) CN110932713B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI130137B (en) * 2021-04-22 2023-03-09 Univ Of Oulu A method for increase of energy efficiency through leveraging fault tolerant algorithms into undervolted digital systems
CN116088668B (en) * 2023-04-07 2023-06-20 华中科技大学 Ultra-low power consumption time sequence error prediction chip

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201342039A (en) * 2011-11-22 2013-10-16 Silicon Space Technology Corp Memory circuit incorporating radiation-hardened memory scrub engine
CN106505994A (en) * 2015-09-07 2017-03-15 三星电子株式会社 Sequence circuit and its operational approach
CN107423153A (en) * 2017-07-24 2017-12-01 上海交通大学 A kind of correcting circuit for error checking and correction technology

Also Published As

Publication number Publication date
CN110932713A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
US11139805B1 (en) Bi-directional adaptive clocking circuit supporting a wide frequency range
Kim et al. Variation-tolerant, ultra-low-voltage microprocessor with a low-overhead, within-a-cycle in-situ timing-error detection and correction technique
CN110932713B (en) Time sequence elastic circuit for convolutional neural network hardware accelerator
Bowman et al. Energy-efficient and metastability-immune timing-error detection and instruction-replay-based recovery circuits for dynamic-variation tolerance
EP3125430B1 (en) Double sampling state retention flip-flop
US8341436B2 (en) Method and system for power-state transition controllers
EP3012975B1 (en) Error resilient digital signal processing device
CN103218029B (en) Ultra-low power consumption processor pipeline structure
CN110428048B (en) Binaryzation neural network accumulator circuit based on analog delay chain
WO2017197946A1 (en) Pvtm-based, wide-voltage-range clock stretching circuit
Wirnshofer et al. Adaptive voltage scaling by in-situ delay monitoring for an image processing circuit
Alioto Ultra-low power design approaches for IoT.
Bal et al. Revamping timing error resilience to tackle choke points at NTC systems
Koskinen et al. Implementing minimum-energy-point systems with adaptive logic
EP3361637A1 (en) A method, and a synchronous digital circuit, for preventing propagation of set-up timing data errors
EP3552204B1 (en) A pulse-stretcher clock generator circuit and clock generation method for high speed memory subsystems
Veleski et al. Highly configurable framework for adaptive low power and error-resilient system-on-chip
US10282209B2 (en) Speculative lookahead processing device and method
Sato et al. Tolerating aging-induced timing violations via configurable approximations
Das Self-Sensing Processor Systems: Robust sensors and actuator systems
CN112382324B (en) Subthreshold region low-power consumption and calculation integrated CMOS circuit structure
Chitra et al. Razor flip-flop based Detector/Corrector System for Correcting Timing violations in Digital Circuits
Zhou et al. Near-threshold processor design techniques for power-constrained computing devices
Li et al. TICA: A 0.3 V, variation-resilient 64-stage deeply-pipelined Bitcoin mining core with timing slack inference and clock frequency adaption
Agwa et al. A low power self-healing resilient microarchitecture for PVT variability mitigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant