WO2021114701A1 - Processor and Method for Reducing Power Supply Ripple - Google Patents

Info

Publication number
WO2021114701A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
control signal
processed
waiting period
processing unit
Application number
PCT/CN2020/108984
Other languages
English (en)
French (fr)
Inventor
孔庆海
李炜
曹庆新
王和国
Original Assignee
深圳云天励飞技术股份有限公司 (Shenzhen Intellifusion Technologies Co., Ltd.)
Priority date: 2019-12-10 (priority claimed from Chinese application 201911261783.8)
Application filed by 深圳云天励飞技术股份有限公司
Priority to US17/623,603 (US20220206554A1)
Publication of WO2021114701A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/28 Supervision thereof, e.g. detecting power-supply failure by out of limits supervision
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/04 Generating or distributing clock signals or signals derived directly therefrom
    • G06F1/10 Distribution of clock signals, e.g. skew
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/30 Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations
    • G06F1/305 Means for acting in the event of power-supply failure or interruption, e.g. power-supply fluctuations in the event of power-supply fluctuations

Definitions

  • The present invention relates to the field of computer technology, and in particular to a processor and a method for reducing power supply ripple when the processor starts working.
  • This application claims priority to Chinese patent application No. 201911261783.8, entitled "Processor and Method for Reducing Power Ripple" and filed with the Chinese Patent Office on December 10, 2019, the entire content of which is incorporated herein by reference.
  • Processors such as central processing units, graphics processors, and neural network processors play an increasingly important role, and their energy efficiency ratio has improved greatly.
  • However, the computing power demanded of processors keeps rising, and high computing power inevitably brings higher power consumption, so the transient power consumption when a processor starts up is very large.
  • Severe current fluctuations on a nanosecond timescale produce large ripple on the DCDC (direct current to direct current) power supply and make the processor operate unstably.
  • A first aspect of the present application provides a processor. The processor includes a controller and at least one processing unit; the at least one processing unit includes an input buffer, an arithmetic unit, and an output buffer; the processor is connected to a power supply and an external memory; the controller is configured to determine an initial waiting period number N1 and a waiting period decrement number N2 of the processing unit; and the processor further includes a power control unit configured to:
  • when the processor starts working, send a first control signal to the at least one processing unit according to N1 and N2, where the power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time, and the wait before each subsequent transmission of the first control signal is reduced by N2 clock cycles; once the wait has decreased to 0 or less, the first control signal is sent every clock cycle;
  • the at least one processing unit is configured to, after receiving the first control signal, read the data to be processed from the external memory, buffer the read data in the input buffer, transfer the buffered data from the input buffer to the arithmetic unit for calculation, and store the calculation result in the output buffer.
  • Determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit includes:
  • calculating the initial waiting period number N1 from the number of steps and the waiting period decrement number N2.
  • The power control unit includes a first control register, a second control register, and a control signal generating circuit;
  • the first control register stores the initial waiting period number N1;
  • the second control register stores the waiting period decrement number N2;
  • the control signal generating circuit outputs the first control signal according to the data stored in the first control register and the second control register.
  • The power control unit is further configured to:
  • send the second control signal every N1 clock cycles until the calculation of the to-be-processed data in the external memory is completed;
  • the at least one processing unit is further configured to:
  • after receiving the second control signal, read the data to be processed from the external memory, buffer the read data in the input buffer, transfer the buffered data from the input buffer to the arithmetic unit for calculation, and store the calculation result in the output buffer.
  • A second aspect of the present application provides a method for reducing power supply ripple, applied to a processor.
  • The processor includes a controller, a power control unit, and at least one processing unit;
  • the at least one processing unit includes an input buffer, an arithmetic unit, and an output buffer, and the processor is connected to a power supply and an external memory. The method includes:
  • the power control unit sends a first control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2, where it waits N1 clock cycles of the processor before sending the first control signal for the first time, and the wait before each subsequent transmission of the first control signal is reduced by N2 clock cycles; once the wait has decreased to 0 or less, the first control signal is sent every clock cycle;
  • after receiving the first control signal, the at least one processing unit reads the data to be processed from the external memory, buffers the read data in the input buffer, transfers the buffered data from the input buffer to the arithmetic unit for calculation, and stores the calculation result in the output buffer.
  • Determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit includes:
  • calculating the initial waiting period number N1 from the number of steps and the waiting period decrement number N2.
  • The waiting period decrement number N2 is proportional to the switching period of the power supply and inversely proportional to the clock period of the processor.
  • Specifically, N2 = T1*n/T2, where T1 is the switching period of the power supply, T2 is the clock period of the processor, and n is a positive integer greater than 1.
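For readability, the relations above can be written as display formulas. The symbol s for the number of steps is shorthand introduced here; the product form N1 = s·N2 is the one stated in the detailed embodiment. Because T1 and T2 are both durations, N2 is a pure count: the number of processor clock cycles that fit into n switching periods of the power supply.

```latex
N_2 = \frac{n \, T_1}{T_2}, \qquad n \in \mathbb{Z},\ n > 1,
\qquad
N_1 = s \, N_2
```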
  • The power control unit includes a first control register, a second control register, and a control signal generating circuit;
  • the first control register stores the initial waiting period number N1;
  • the second control register stores the waiting period decrement number N2;
  • the control signal generating circuit outputs the first control signal according to the data stored in the first control register and the second control register.
  • The method further includes:
  • the power control unit sends a second control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2, where it waits N2 clock cycles before sending the second control signal for the first time, and the wait before each subsequent transmission of the second control signal is increased by N2 clock cycles; once the wait has increased to N1 or more, the second control signal is sent every N1 clock cycles until the data to be processed in the external memory has been processed;
  • after receiving the second control signal, the at least one processing unit reads the data to be processed from the external memory, buffers the read data in the input buffer, transfers the buffered data from the input buffer to the arithmetic unit for calculation, and stores the calculation result in the output buffer.
  • The present invention determines the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit of the processor. When the processor starts working, the power control unit of the processor sends the first control signal to the at least one processing unit according to N1 and N2.
  • The power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time, and the wait before each subsequent transmission of the first control signal is reduced by N2 clock cycles; once the wait has decreased to 0 or less, the first control signal is sent every clock cycle. After receiving the first control signal, the at least one processing unit reads the data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for calculation, and stores the calculation result in the output buffer.
  • In the present invention, when the processor starts working, the power control unit sends the first control signal to the processing unit according to N1 and N2.
  • The first wait is N1 processor clock cycles, and each subsequent wait is N2 clock cycles shorter.
  • Because the power control unit does not send the first control signal every clock cycle in the initial stage of processor startup, but only after a wait that shrinks step by step, the processing unit likewise does not read and compute data every clock cycle but at those same intervals. Controlling how often the arithmetic unit in the processing unit operates in this way prevents the processor's current from rising sharply, so the processor's power demand at startup rises in steps and the power supply voltage stays stable. This effectively reduces power supply ripple when the processor starts working and improves the stability of the processor.
  • Fig. 1 is a schematic diagram of a processor provided by an embodiment of the present invention.
  • Figure 2 is a schematic diagram of the ripple caused by the transient output current of the power supply changing from 0A to 6A.
  • FIG. 3 is a flowchart of a method for reducing power supply ripple provided by an embodiment of the present invention.
  • Fig. 4 is a detailed flow chart for determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit in Fig. 3.
  • FIG. 5 is a flowchart of a method for reducing power supply ripple according to another embodiment of the present invention.
  • Fig. 6 is a schematic diagram of a computer device provided by an embodiment of the present invention.
  • Fig. 7 is a schematic diagram of a power control unit provided by an embodiment of the present invention.
  • Fig. 1 is a schematic diagram of a processor provided by an embodiment of the present invention.
  • The processor 10 includes a controller 100, a power control unit 101, and at least one processing unit 102.
  • Each processing unit 102 includes an input buffer 1020, an arithmetic unit 1021, and an output buffer 1022.
  • The processor 10 is connected to a power supply 11 and an external memory 12.
  • The processor 10 may be a central processing unit (CPU), a graphics processing unit (GPU), a field-programmable gate array (FPGA), or another type of processor.
  • The processor 10 may also be a neural network processor (neural-network processing unit, NPU).
  • A neural network processor works by simulating human neurons and synapses at the circuit level and uses a deep learning instruction set to process large numbers of neurons and synapses directly, with one instruction completing the processing of a group of neurons.
  • The NPU integrates storage and computation through synaptic weights, which improves operating efficiency.
  • The power supply 11 supplies power to the processor 10.
  • The power supply 11 may be a direct current to direct current (DCDC) power supply.
  • The external memory 12 stores the data to be processed.
  • The external memory 12 may be a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), or another type of memory.
  • The input buffer 1020 buffers the to-be-processed data read from the external memory 12.
  • For example, when the processor 10 is a neural network processor, the data to be processed stored in the external memory 12 includes input data (for example, an image) and weight values.
  • In that case, the input buffer 1020 includes a data buffer for buffering the input data and a weight buffer for buffering the weight values.
  • The processor 10 may be included in a chip (not shown in the figure).
  • The chip may include one or more processors 10.
  • When an existing processor starts, it sends data to be processed from the external memory to the arithmetic unit every clock cycle, so the processor's current demand changes sharply on a nanosecond timescale. This change seriously affects the stability of the power supply voltage, produces large ripple, and seriously affects the stability of the processor; the impact is even more severe when multiple processors on one chip work in parallel.
  • Figure 2 is a schematic diagram of the ripple caused by the transient output current of the power supply changing from 0 A to 6 A. As the figure shows, when the transient output current changes from 0 A to 6 A, the ripple exceeds +50 mV/-50 mV, and such large ripple can easily cause errors in the processor's data transmission.
  • The controller 100 determines the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit 102.
  • N1 and N2 can be set from empirical values. For example, a correspondence table between different processors and their values of N1 and N2 can be established, and the N1 and N2 corresponding to the processor 10 can be looked up in the table.
  • Alternatively, the controller 100 may determine N1 and N2 of the processing unit 102 as follows:
  • A simulation tool, such as PTPX (PrimeTime PX), can be used to estimate the ripple voltage generated by the processor 10 in an extreme working scenario.
  • PTPX is a tool for full-chip static and dynamic power analysis based on the PrimeTime environment.
  • For example, if the ripple voltage generated by the processor 10 in an extreme working scenario is about +50 mV/-50 mV and the ripple voltage allowed by the processor 10 is +20 mV/-20 mV, the current of the processor 10 changes in 3 steps (50 mV / 20 mV = 2.5, rounded up).
  • The waiting period decrement number N2 is proportional to the switching period of the power supply 11 and inversely proportional to the clock period of the processor 10.
  • Specifically, N2 = T1*n/T2, where T1 is the switching period of the power supply 11, T2 is the clock period of the processor 10, and n is a positive integer greater than 1. For example, n may be a positive integer from 10 to 101 inclusive, such as 20.
  • The initial waiting period number N1 is the product of the number of steps and the waiting period decrement number N2.
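The estimation steps above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, periods are passed as integer picoseconds to avoid floating-point error, and rounding N2 up to a whole number of clock cycles is an assumption, since the text does not say how a non-integer T1*n/T2 is handled.

```python
import math


def step_count(worst_ripple_mv: float, allowed_ripple_mv: float) -> int:
    """Number of current steps: worst-case ripple / allowed ripple, rounded up."""
    return math.ceil(worst_ripple_mv / allowed_ripple_mv)


def wait_decrement(t1_ps: int, t2_ps: int, n: int) -> int:
    """N2 = T1 * n / T2, rounded up to whole clock cycles (rounding is an assumption)."""
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    return -(-(t1_ps * n) // t2_ps)  # integer ceiling division


def initial_wait(steps: int, n2: int) -> int:
    """N1 = number of steps * N2."""
    return steps * n2


steps = step_count(50, 20)               # 50 mV worst case, 20 mV allowed
n2 = wait_decrement(500_000, 1_000, 20)  # hypothetical T1 = 500 ns, T2 = 1 ns, n = 20
n1 = initial_wait(steps, n2)
print(steps, n2, n1)                     # 3 10000 30000
```

The 50 mV / 20 mV figures come from the example above; the period values (a 2 MHz switching regulator and a 1 GHz processor clock) are illustrative only and are not taken from the source.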
  • When the processor 10 starts working, the power control unit 101 sends a first control signal to the at least one processing unit 102 according to the initial waiting period number N1 and the waiting period decrement number N2.
  • The power control unit 101 waits N1 clock cycles of the processor 10 before sending the first control signal for the first time, and the wait before each subsequent transmission of the first control signal is reduced by N2 clock cycles; once the wait has decreased to 0 or less, the first control signal is sent every clock cycle.
  • For example, if the initial waiting period number N1 is 1000 and the waiting period decrement number N2 is 200, the power control unit 101 sends the first control signal for the first time after waiting 1000 clock cycles, for the second time after waiting 800 clock cycles, for the third time after 600 clock cycles, for the fourth time after 400 clock cycles, and for the fifth time after 200 clock cycles, and sends it every clock cycle thereafter.
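The start-up schedule can be expressed as a small generator that yields the number of clock cycles waited before each first control signal. This is a sketch under the assumption that "sent every clock cycle" means a wait of exactly one cycle; the name first_signal_waits is hypothetical. Choosing N1 = 1000 and N2 = 200 reproduces the 800/600/400/200 waits listed above.

```python
from itertools import islice
from typing import Iterator


def first_signal_waits(n1: int, n2: int) -> Iterator[int]:
    """Yield the wait, in clock cycles, before each first control signal.

    The first wait is N1; each later wait is N2 cycles shorter. Once the wait
    would drop to 0 or below, the signal is sent every clock cycle (wait = 1).
    """
    if n2 <= 0:
        raise ValueError("N2 must be positive, or the ramp never ends")
    wait = n1
    while True:
        yield max(wait, 1)
        wait -= n2


print(list(islice(first_signal_waits(1000, 200), 8)))
# [1000, 800, 600, 400, 200, 1, 1, 1]
```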
  • The power control unit 101 includes a first control register 70, a second control register 72, and a control signal generating circuit 73.
  • The first control register 70 stores the initial waiting period number N1;
  • the second control register 72 stores the waiting period decrement number N2;
  • the control signal generating circuit 73 outputs the first control signal according to the data stored in the first control register 70 and the second control register 72.
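A cycle-level software model can make the role of the two registers concrete. The sketch below is hypothetical: the class and attribute names are not from the source, and it assumes the generating circuit behaves like a counter that fires when the elapsed cycle count reaches the current wait, then shrinks the wait by the value held in the second register (never below one cycle).

```python
class ControlSignalGenerator:
    """Hypothetical cycle-level model of the control signal generating circuit.

    reg_n1 models the first control register (initial waiting period number N1);
    reg_n2 models the second control register (waiting period decrement number N2).
    """

    def __init__(self, reg_n1: int, reg_n2: int) -> None:
        self.reg_n1 = reg_n1
        self.reg_n2 = reg_n2
        self.wait = max(reg_n1, 1)  # cycles to wait before the next pulse
        self.count = 0              # cycles elapsed since the last pulse

    def tick(self) -> bool:
        """Advance one clock cycle; return True if the first control signal fires."""
        self.count += 1
        if self.count < self.wait:
            return False
        self.count = 0
        self.wait = max(self.wait - self.reg_n2, 1)  # shrink the wait, floor at 1
        return True


gen = ControlSignalGenerator(reg_n1=1000, reg_n2=200)
fired = [cycle for cycle in range(1, 3004) if gen.tick()]
print(fired)  # [1000, 1800, 2400, 2800, 3000, 3001, 3002, 3003]
```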
  • The at least one processing unit 102 is configured to, after receiving the first control signal, read the data to be processed from the external memory 12, buffer it in the input buffer 1020, transfer it from the input buffer 1020 to the arithmetic unit 1021 for calculation, and store the calculation result in the output buffer 1022.
  • For example, after receiving the first control signal sent by the power control unit 101 for the first time, the processing unit 102 reads the first piece of to-be-processed data from the external memory 12, buffers it in the input buffer 1020, transfers it from the input buffer 1020 to the arithmetic unit 1021 for calculation, and stores the calculation result in the output buffer 1022. After receiving the first control signal for the second time, it reads the second piece of data from the external memory 12, buffers it in the input buffer 1020, transfers it from the input buffer 1020 to the arithmetic unit 1021 for calculation, and stores the result in the output buffer 1022. After receiving the first control signal for the third time, it reads the third piece of data from the external memory 12, buffers it in the input buffer 1020, and transfers it from the input buffer 1020 to the arithmetic unit 1021, and so on.
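The per-signal behaviour of the processing unit can be modelled as one read, buffer, compute, store step per control signal. In this hypothetical sketch the external memory is a queue of integers and the arithmetic unit is stood in for by an arbitrary callable; neither detail comes from the source, which does not specify the data format or the operation performed.

```python
from collections import deque
from typing import Callable, Deque, List


class ProcessingUnit:
    """Toy model: each control signal moves one piece of data through the unit."""

    def __init__(self, external_memory: List[int], compute: Callable[[int], int]) -> None:
        self.external_memory: Deque[int] = deque(external_memory)  # data to be processed
        self.input_buffer: Deque[int] = deque()
        self.output_buffer: List[int] = []
        self.compute = compute  # stands in for the arithmetic unit

    def on_control_signal(self) -> bool:
        """Handle one control signal; return False if no data is left to process."""
        if not self.external_memory:
            return False
        self.input_buffer.append(self.external_memory.popleft())  # read and buffer
        piece = self.input_buffer.popleft()                        # to the arithmetic unit
        self.output_buffer.append(self.compute(piece))             # store the result
        return True


unit = ProcessingUnit([1, 2, 3], compute=lambda x: x * 2)
print([unit.on_control_signal() for _ in range(4)])  # [True, True, True, False]
print(unit.output_buffer)                            # [2, 4, 6]
```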
  • By controlling how often the arithmetic unit 1021 performs calculations, this embodiment prevents the current of the power supply 11 from rising sharply, so the power demand of the processor 10 rises in steps when it starts working. This effectively reduces power supply ripple when the processor 10 starts working and improves the stability of the processor 10.
  • The power control unit 101 is further configured to: when the amount of to-be-processed data remaining in the external memory 12 is less than or equal to a preset value, send a second control signal to the at least one processing unit 102 according to the initial waiting period number N1 and the waiting period decrement number N2.
  • The power control unit 101 waits N2 clock cycles before sending the second control signal for the first time, and the wait before each subsequent transmission of the second control signal is increased by N2 clock cycles; once the wait has increased to N1 or more, the second control signal is sent every N1 clock cycles until the data to be processed in the external memory 12 has been fully processed.
  • For example, suppose the initial waiting period number N1 is 1000 and the waiting period decrement number N2 is 200. When 10 or fewer pieces of to-be-processed data remain in the external memory 12, the power control unit 101 sends the second control signal for the first time after waiting 200 clock cycles, for the second time after 400 clock cycles, for the third time after 600 clock cycles, for the fourth time after 800 clock cycles, and for the fifth time after 1000 clock cycles, and thereafter sends it every 1000 clock cycles until the calculation of all to-be-processed data in the external memory 12 is completed.
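The wind-down schedule mirrors the start-up one. The minimal generator below (hypothetical name second_signal_waits) yields the wait before each second control signal and caps every wait at N1, which is how this sketch reads "once the wait reaches N1 or more, send every N1 cycles". The caller is assumed to stop iterating once the remaining data has been processed.

```python
from itertools import islice
from typing import Iterator


def second_signal_waits(n1: int, n2: int) -> Iterator[int]:
    """Yield the wait, in clock cycles, before each second control signal.

    The first wait is N2; each later wait is N2 cycles longer, capped at N1,
    so the signal settles to one transmission every N1 cycles.
    """
    if n2 <= 0:
        raise ValueError("N2 must be positive")
    wait = min(n2, n1)
    while True:
        yield wait
        wait = min(wait + n2, n1)


print(list(islice(second_signal_waits(1000, 200), 7)))
# [200, 400, 600, 800, 1000, 1000, 1000]
```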
  • The at least one processing unit 102 is further configured to, after receiving the second control signal, read the data to be processed from the external memory 12, buffer it in the input buffer 1020, transfer it from the input buffer 1020 to the arithmetic unit 1021 for calculation, and store the calculation result in the output buffer 1022.
  • The control signal generating circuit 72 also outputs the second control signal according to the data stored in the first control register 70 and the second control register 71.
  • By controlling how often the arithmetic unit 1021 performs calculations when the processor 10 finishes working, this embodiment prevents the current of the power supply 11 from dropping sharply, so the power demand of the processor 10 falls in steps when it finishes working. This effectively reduces the ripple of the power supply 11 at that point and further improves the stability of the processor 10.
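Putting the two phases together, the sketch below computes the absolute clock cycle of every control signal for a whole run. It rests on assumptions the source does not state: each control signal consumes exactly one piece of data, the start-up schedule runs until the remaining data falls to the preset value, and the wind-down schedule starts counting from the last start-up signal. The function name send_times is hypothetical.

```python
from typing import List


def send_times(total: int, preset: int, n1: int, n2: int) -> List[int]:
    """Return the clock cycles at which control signals are sent.

    Start-up (remaining > preset): first control signal, waits N1, N1-N2, ... then 1.
    Wind-down (remaining <= preset): second control signal, waits N2, 2*N2, ... capped at N1.
    Assumes every control signal consumes exactly one piece of data.
    """
    times: List[int] = []
    now, remaining = 0, total
    up_wait, down_wait = n1, min(n2, n1)
    while remaining > 0:
        if remaining > preset:
            now += max(up_wait, 1)  # first control signal
            up_wait -= n2
        else:
            now += down_wait        # second control signal
            down_wait = min(down_wait + n2, n1)
        times.append(now)
        remaining -= 1
    return times


print(send_times(total=8, preset=3, n1=1000, n2=200))
# [1000, 1800, 2400, 2800, 3000, 3200, 3600, 4200]
```

In this model the interval between signals shrinks from N1 cycles during start-up and widens again in steps of N2 during wind-down, which is the stepped current profile the embodiment aims for.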
  • FIG. 3 is a flowchart of a method for reducing power supply ripple provided by an embodiment of the present invention.
  • The method for reducing power supply ripple is applied to a processor.
  • The processor includes a controller, a power control unit, and at least one processing unit.
  • The at least one processing unit includes an input buffer, an arithmetic unit, and an output buffer, and the processor is connected to a power supply and an external memory.
  • The method for reducing power supply ripple may be applied to a neural network processor (NPU).
  • A neural network processor works by simulating human neurons and synapses at the circuit level and uses a deep learning instruction set to process large numbers of neurons and synapses directly, with one instruction completing the processing of a group of neurons.
  • The NPU integrates storage and computation through synaptic weights, which improves operating efficiency.
  • By controlling how often the processor's arithmetic operations are started, the method prevents drastic jumps in the power supply current, so the processor's power demand rises in steps when it starts. This effectively reduces power supply ripple and improves the stability of the processor.
  • The method for reducing power supply ripple specifically includes the following steps:
  • The controller determines the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit.
  • N1 and N2 can be set from empirical values. For example, a correspondence table between different processors and their values of N1 and N2 can be established, and the N1 and N2 corresponding to the processor can be looked up in the table.
  • Alternatively, N1 and N2 of the processing unit may be determined by the method shown in FIG. 4.
  • The power control unit sends a first control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2: it waits N1 clock cycles of the processor before sending the first control signal for the first time, and the wait before each subsequent transmission of the first control signal is reduced by N2 clock cycles; once the wait has decreased to 0 or less, the first control signal is sent every clock cycle.
  • For example, if the initial waiting period number N1 is 1000 and the waiting period decrement number N2 is 200, the power control unit sends the first control signal for the first time after waiting 1000 clock cycles, for the second time after 800 clock cycles, for the third time after 600 clock cycles, for the fourth time after 400 clock cycles, and for the fifth time after 200 clock cycles, and sends it every clock cycle thereafter.
  • After receiving the first control signal, the at least one processing unit reads the data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for calculation, and stores the calculation result in the output buffer.
  • Specifically, after receiving the first control signal sent by the power control unit for the first time, the processing unit reads the first piece of to-be-processed data from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for calculation, and stores the calculation result in the output buffer. After receiving the first control signal for the second time, it likewise reads, buffers, computes, and stores the second piece of data, and after receiving it for the third time, the third piece of data.
  • In this embodiment, the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit of the processor are determined; when the processor starts working, the power control unit of the processor sends the first control signal to the at least one processing unit according to N1 and N2.
  • The power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time, and the wait before each subsequent transmission of the first control signal is reduced by N2 clock cycles; once the wait has decreased to 0 or less, the first control signal is sent every clock cycle;
  • after receiving the first control signal, the at least one processing unit reads the data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for calculation, and stores the calculation result in the output buffer.
  • The power control unit includes a first control register, a second control register, and a control signal generating circuit.
  • The first control register stores the initial waiting period number.
  • The second control register stores the waiting period decrement number.
  • The control signal generating circuit outputs the first control signal according to the data stored in the first control register and the second control register.
  • When an existing processor starts working, it sends data to be processed from the external memory to the arithmetic unit every clock cycle, so the processor's current demand changes sharply at the nanosecond level. This change seriously affects the stability of the power supply voltage, produces large ripple, and seriously affects the stability of the processor. The impact is even more serious when multiple processors work in parallel on one chip.
  • The frequency at which the arithmetic unit performs computations is controlled to prevent the power supply current from rising sharply, so that the processor's power demand at start-up becomes step-shaped. This effectively reduces the power supply ripple at start-up and improves the stability of the processor.
  • Fig. 4 is a detailed flow chart for determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit in Fig. 3.
  • determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit includes the following steps:
  • a simulation tool can be used to estimate the ripple voltage generated by the processor in an extreme work scenario.
  • a simulation tool PTPX (PrimeTime PX) can be used to estimate the ripple voltage generated by the processor in an extreme work scenario.
  • PTPX is a tool, based on the PrimeTime environment, for static and dynamic power analysis of the whole chip.
  • When the transient output current changes from 0 A to 6 A in the extreme working scenario, the generated ripple voltage is about +50 mV/-50 mV.
  • the ripple voltage generated by the processor in the extreme work scenario is +50mV/-50mV
  • the ripple voltage allowed by the processor is +20mV/-20mV
  • the number of steps of the processor's current change is 3 (i.e., 50 mV/20 mV = 2.5, rounded up).
  • the decrement number of the waiting period is proportional to the switching period of the power supply and inversely proportional to the clock period of the processor.
  • the decrement number of the waiting period is (T1*n/T2), where T1 is the switching period of the power supply, T2 is the clock period of the processor, and n is a positive integer greater than 1.
  • n can be a positive integer greater than or equal to 10 and less than or equal to 100.
  • n can be 20.
  • the initial waiting period number N1 is the product of the step number and the waiting period decrement number N2.
  • FIG. 5 is a flowchart of a method for reducing power supply ripple according to another embodiment of the present invention.
  • the method for reducing power supply ripple specifically includes the following steps:
  • the controller determines the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit.
  • the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit can be set according to empirical values. For example, a table of correspondences between different processors, the initial waiting period number N1 and the waiting period decrement number N2 can be established, and the initial waiting period number N1 and the waiting period decrement number N2 corresponding to the processor can be determined according to the correspondence table.
  • the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit may be determined according to the method described in FIG. 4.
  • The power control unit sends a first control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2.
  • The power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time. The waiting time before each subsequent sending decreases by N2 clock cycles, and once the waiting time has decreased to less than or equal to 0, the first control signal is sent every clock cycle.
  • For example, with an initial waiting period number N1 of 1000 and a waiting period decrement number N2 of 200, the power control unit sends the first control signal for the first time after waiting 1000 clock cycles, the second time after 800, the third time after 600, the fourth time after 400, and the fifth time after 200, and thereafter sends it every clock cycle.
  • After receiving the first control signal, the at least one processing unit reads the data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.
  • Each time the processing unit receives the first control signal, it performs this read, buffer, compute, and store sequence once.
  • For example, after the processing unit receives the first control signal sent by the power control unit for the first time, it reads the first piece of data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer. After receiving the first control signal for the second time, it reads the second piece of data and processes it in the same way. After receiving the first control signal for the third time, it reads the third piece of data and processes it in the same way.
  • If the amount of data remaining to be processed in the external memory is less than or equal to a preset value, the power control unit sends a second control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2.
  • The power control unit waits N2 clock cycles before sending the second control signal for the first time. The waiting time before each subsequent sending increases by N2 clock cycles, and once the waiting time has increased to greater than or equal to N1, the second control signal is sent every N1 clock cycles until all the data to be processed in the external memory has been computed.
  • For example, with an initial waiting period number N1 of 1000 and a waiting period decrement number N2 of 200, if no more than 10 pieces of data to be processed remain in the external memory, the power control unit sends the second control signal for the first time after waiting 200 clock cycles, the second time after 400, the third time after 600, the fourth time after 800, and the fifth time after 1000, and thereafter every 1000 clock cycles until all the data to be processed in the external memory has been computed.
  • After receiving the second control signal, the at least one processing unit reads the data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.
  • The method for reducing power supply ripple in the second embodiment determines the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit. When the processor starts working, the power control unit sends a first control signal to the at least one processing unit according to N1 and N2: it waits N1 clock cycles of the processor before the first sending, the waiting time before each subsequent sending decreases by N2 clock cycles, and once the waiting time has decreased to less than or equal to 0, the first control signal is sent every clock cycle. After receiving the first control signal, the at least one processing unit reads data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.
  • If the amount of data remaining to be processed in the external memory is less than or equal to a preset value, the power control unit sends a second control signal to the at least one processing unit according to N1 and N2: it waits N2 clock cycles before the first sending, the waiting time before each subsequent sending increases by N2 clock cycles, and once the waiting time has increased to greater than or equal to N1, the second control signal is sent every N1 clock cycles until all the data to be processed in the external memory has been computed. After receiving the second control signal, the at least one processing unit reads data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.
  • The method of the second embodiment controls the frequency at which the arithmetic unit performs computations not only when the processor starts working but also when it finishes working. This prevents the power supply current from rising and dropping sharply and makes the processor's power demand at start and end step-shaped, which effectively reduces the power supply ripple at start and end and improves the stability of the processor.
  • The power control unit includes a first control register, a second control register, and a control signal generating circuit.
  • The first control register stores the initial waiting period number.
  • The second control register stores the waiting period decrement number.
  • The control signal generating circuit outputs the first control signal and the second control signal according to the data stored in the first control register and the second control register.
  • Fig. 6 is a schematic diagram of a computer device provided by an embodiment of the present invention.
  • the computer device 6 includes a processor 60, a memory 61, and at least one communication bus 62.
  • the processor 60 may be the processor 10 in FIG. 1 and implements the steps in the embodiment of the method for reducing power ripple, such as steps 301-303 shown in FIG. 3 or steps 501-505 in FIG. 5.
  • the computer device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • FIG. 6 is only an example of the computer device 6 and does not limit it; the computer device 6 may include more or fewer components than shown, combine certain components, or use different components.
  • the computer device 6 may also include input and output devices, network access devices, buses, and so on.
  • The processor 60 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • A general-purpose processor may be a microprocessor, or the processor 60 may be any conventional processor.
  • The processor 60 is the control center of the computer device 6 and connects the various parts of the entire computer device 6 through various interfaces and lines.
  • the memory 61 may be used to store computer programs and/or modules/units.
  • The processor 60 implements the various functions of the computer device 6 by running or executing the computer programs and/or modules/units stored in the memory 61 and by calling the data stored in the memory 61.
  • the memory 61 may mainly include a program storage area and a data storage area.
  • The program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created according to the use of the computer device 6, and the like.
  • the memory 61 may include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart memory card (Smart Media Card, SMC), Secure Digital (Secure Digital, SD) card, Flash Card, at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
  • When the integrated modules/units of the computer device 6 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments of the present invention may also be completed by instructing relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media.
  • the functional units in the various embodiments of the present invention may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.
  • An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the above embodiments of the method for reducing power supply ripple and achieves the same technical effects; to avoid repetition, the details are not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Microcomputers (AREA)
  • Power Sources (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

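A processor and a method for reducing power supply ripple. The processor comprises a controller (100) and a processing unit (102); the processing unit (102) comprises an input buffer (1020), an arithmetic unit (1021), and an output buffer (1022); the processor (10) is connected to a power supply (11) and an external memory (12) and further comprises a power control unit (101). The controller (100) is configured to determine an initial waiting period number N1 and a waiting period decrement number N2 of the processing unit (102). The power control unit (101) is configured to send a first control signal to the processing unit (102) according to N1 and N2 when the processor (10) starts working. After receiving the first control signal, the processing unit (102) reads data to be processed from the external memory (12), buffers the read data in the input buffer (1020), transfers the buffered data from the input buffer (1020) to the arithmetic unit (1021) for computation, and stores the computation result in the output buffer (1022). The method can effectively reduce the power supply ripple when the processor (10) starts working and improve the stability of the processor (10).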

Description

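Processor and Method for Reducing Power Supply Ripple
Technical Field
The present invention relates to the field of computer technology, and in particular to a processor and a method for reducing the power supply ripple when the processor starts working. This application claims priority to Chinese Patent Application No. 201911261783.8, entitled "Processor and Method for Reducing Power Supply Ripple" and filed with the Chinese Patent Office on December 10, 2019, the entire contents of which are incorporated herein by reference.
Background
With the development of computers, processors (such as central processing units, graphics processors, and neural network processors) play an increasingly important role, and their energy efficiency has improved greatly. However, the demand for processor computing power (for example, that of neural network processors) keeps rising, and high computing power inevitably brings higher power consumption, so the transient power consumption when a processor starts working is very large. Violent current fluctuations at the nanosecond level cause large ripple on the DCDC (direct current-direct current) power supply and make the processor operate unstably.
Technical Solution
In view of the above, it is necessary to provide a processor and a method for reducing the power supply ripple when the processor starts working, which can effectively reduce that ripple and improve the stability of the processor.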
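A first aspect of the present application provides a processor. The processor includes a controller and at least one processing unit, and the at least one processing unit includes an input buffer, an arithmetic unit, and an output buffer. The processor is connected to a power supply and an external memory. The controller is configured to determine an initial waiting period number N1 and a waiting period decrement number N2 of the processing unit. The processor further includes a power control unit configured to:
when the processor starts working, send a first control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2, where the power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time, the waiting time before each subsequent sending of the first control signal decreases by N2 clock cycles, and once the waiting time has decreased to less than or equal to 0, the first control signal is sent every clock cycle;
after receiving the first control signal, the at least one processing unit reads data to be processed from the external memory, buffers the read data in the input buffer, transfers the buffered data from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.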
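In another possible implementation, determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit includes:
obtaining the ripple voltage generated by the processor in an extreme working scenario;
determining the number of steps of the processor's current change according to the ripple voltage generated by the processor in the extreme working scenario and the ripple voltage allowed by the processor;
determining the waiting period decrement number N2 according to the switching period of the power supply and the clock period of the processor;
calculating the initial waiting period number N1 according to the number of steps and the waiting period decrement number N2.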
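In another possible implementation, the power control unit includes a first control register, a second control register, and a control signal generating circuit. The first control register stores the initial waiting period number, the second control register stores the waiting period decrement number, and the control signal generating circuit outputs the first control signal according to the data stored in the first control register and the second control register.
In another possible implementation, the power control unit is further configured to:
if the amount of data remaining to be processed in the external memory is less than or equal to a preset value, send a second control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2, where the power control unit waits N2 clock cycles before sending the second control signal for the first time, the waiting time before each subsequent sending of the second control signal increases by N2 clock cycles, and once the waiting time has increased to greater than or equal to N1, the second control signal is sent every N1 clock cycles, until all the data to be processed in the external memory has been computed;
the at least one processing unit is further configured to:
after receiving the second control signal, read data to be processed from the external memory, buffer the read data in the input buffer, transfer the buffered data from the input buffer to the arithmetic unit for computation, and store the computation result in the output buffer.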
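A second aspect of the present application provides a method for reducing power supply ripple, applied to a processor. The processor includes a controller, a power control unit, and at least one processing unit; the at least one processing unit includes an input buffer, an arithmetic unit, and an output buffer; and the processor is connected to a power supply and an external memory. The method includes:
determining an initial waiting period number N1 and a waiting period decrement number N2 of the processing unit;
when the processor starts working, sending, by the power control unit, a first control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2, where the power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time, the waiting time before each subsequent sending of the first control signal decreases by N2 clock cycles, and once the waiting time has decreased to less than or equal to 0, the first control signal is sent every clock cycle;
after receiving the first control signal, reading, by the at least one processing unit, data to be processed from the external memory, buffering the read data in the input buffer, transferring the buffered data from the input buffer to the arithmetic unit for computation, and storing the computation result in the output buffer.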
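In another possible implementation, determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit includes:
obtaining the ripple voltage generated by the processor in an extreme working scenario;
determining the number of steps of the processor's current change according to the ripple voltage generated by the processor in the extreme working scenario and the ripple voltage allowed by the processor;
determining the waiting period decrement number N2 according to the switching period of the power supply and the clock period of the processor;
calculating the initial waiting period number N1 according to the number of steps and the waiting period decrement number N2.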
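In another possible implementation, the waiting period decrement number is directly proportional to the switching period of the power supply and inversely proportional to the clock period of the processor.
In another possible implementation, the waiting period decrement number is (T1*n/T2), where T1 is the switching period of the power supply, T2 is the clock period of the processor, and n is a positive integer greater than 1.
In another possible implementation, the power control unit includes a first control register, a second control register, and a control signal generating circuit. The first control register stores the initial waiting period number, the second control register stores the waiting period decrement number, and the control signal generating circuit outputs the first control signal according to the data stored in the first control register and the second control register.
In another possible implementation, if the amount of data remaining to be processed in the external memory is less than or equal to a preset value, the method further includes:
sending, by the power control unit, a second control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2, where the power control unit waits N2 clock cycles before sending the second control signal for the first time, the waiting time before each subsequent sending of the second control signal increases by N2 clock cycles, and once the waiting time has increased to greater than or equal to N1, the second control signal is sent every N1 clock cycles, until all the data to be processed in the external memory has been computed;
after receiving the second control signal, reading, by the at least one processing unit, data to be processed from the external memory, buffering the read data in the input buffer, transferring the buffered data from the input buffer to the arithmetic unit for computation, and storing the computation result in the output buffer.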
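The present invention determines the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit of the processor. When the processor starts working, the power control unit of the processor sends a first control signal to the at least one processing unit according to N1 and N2: it waits N1 clock cycles of the processor before sending the first control signal for the first time, the waiting time before each subsequent sending decreases by N2 clock cycles, and once the waiting time has decreased to less than or equal to 0, the first control signal is sent every clock cycle. After receiving the first control signal, the at least one processing unit of the processor reads data to be processed from the external memory, buffers the read data in the input buffer, transfers the buffered data from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.
When an existing processor starts working, it sends data to be processed from the external memory to the arithmetic unit every clock cycle, so the processor's current demand changes sharply at the nanosecond level. This change seriously affects the stability of the power supply voltage, produces large ripple, and seriously affects the operating stability of the processor. In the present invention, by contrast, the power control unit sends the first control signal to the processing unit according to N1 and N2 when the processor starts working: it waits N1 clock cycles of the processor before the first sending, and the waiting time before each subsequent sending decreases by N2 clock cycles. Because early in start-up the power control unit sends the first control signal only after a waiting time rather than every clock cycle, the processing unit likewise reads and computes data only after a waiting time rather than every clock cycle. Controlling the frequency at which the arithmetic unit in the processing unit performs computations in this way prevents the processor's current from rising sharply, makes the processor's power demand at start-up step-shaped, and keeps the power supply voltage stable, thereby effectively reducing the power supply ripple at start-up and improving the stability of the processor.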
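Brief Description of the Drawings
FIG. 1 is a schematic diagram of a processor provided by an embodiment of the present invention.
FIG. 2 is a schematic diagram of the ripple caused by the transient output current of the power supply changing from 0 A to 6 A.
FIG. 3 is a flowchart of a method for reducing power supply ripple provided by an embodiment of the present invention.
FIG. 4 is a detailed flowchart of determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit in FIG. 3.
FIG. 5 is a flowchart of a method for reducing power supply ripple provided by another embodiment of the present invention.
FIG. 6 is a schematic diagram of a computer device provided by an embodiment of the present invention.
FIG. 7 is a schematic diagram of a power control unit provided by an embodiment of the present invention.
Embodiments of the Invention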
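FIG. 1 is a schematic diagram of a processor provided by an embodiment of the present invention.
In this embodiment, the processor 10 includes a controller 100, a power control unit 101, and at least one processing unit 102. Each processing unit 102 includes an input buffer 1020, an arithmetic unit 1021, and an output buffer 1022. The processor 10 is connected to a power supply 11 and an external memory 12.
The processor 10 may be a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a field-programmable gate array (Field-Programmable Gate Array, FPGA), or another type of processor.
In a specific embodiment, the processor 10 may be a neural network processor (Neural Network Processing Unit, NPU). A neural network processor simulates human neurons and synapses at the circuit level and uses a deep learning instruction set to process large numbers of neurons and synapses directly, with one instruction completing the processing of a group of neurons. Compared with CPUs and GPUs, an NPU integrates storage and computation through synaptic weights, which improves operating efficiency.
The power supply 11 supplies power to the processor 10. The power supply 11 may be a direct current-direct current (Direct current-Direct current, DCDC) power supply.
The external memory 12 stores the data to be processed. The external memory 12 may be a synchronous dynamic random access memory (Synchronous Dynamic Random Access Memory, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), or another type of memory.
The input buffer 1020 is used to buffer the data to be processed that is read from the external memory 12.
In a specific embodiment, the processor 10 is a neural network processor, and the data to be processed stored in the external memory 12 includes input data (for example, images) and weight values. The input buffer 1020 includes a data buffer for buffering the input data and a weight buffer for buffering the weight values.
The processor 10 may be included in a chip (not shown in the figure). The chip may include one or more processors 10.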
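When an existing processor starts working, it sends data to be processed from the external memory to the arithmetic unit every clock cycle, so the processor's current demand changes sharply at the nanosecond level. This change seriously affects the stability of the power supply voltage, produces large ripple, and seriously affects the operating stability of the processor. The impact is even more serious when multiple processors on one chip work in parallel.
FIG. 2 is a schematic diagram of the ripple caused by the transient output current of the power supply changing from 0 A to 6 A. As the figure shows, when the transient output current changes from 0 A to 6 A, the ripple exceeds +50 mV/-50 mV, and such large ripple can easily cause data transmission errors in the processor.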
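In this embodiment, the controller 100 is configured to determine the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit 102.
The initial waiting period number N1 and the waiting period decrement number N2 of the processing unit 102 may be set according to empirical values. For example, a correspondence table between different processors and their N1 and N2 values can be established, and the N1 and N2 corresponding to the processor 10 can be looked up in that table.
Alternatively, the controller 100 may determine N1 and N2 of the processing unit 102 as follows:
(1) Obtain the ripple voltage generated by the processor 10 in an extreme working scenario.
The ripple voltage generated by the processor 10 in an extreme working scenario can be estimated with a simulation tool.
For example, the simulation tool PTPX (PrimeTime PX) can be used to estimate the ripple voltage generated by the processor 10 in an extreme working scenario. PTPX is a tool, based on the PrimeTime environment, for static and dynamic power analysis of the whole chip.
In a specific example, referring to FIG. 2, when the transient output current of the processor 10 changes from 0 A to 6 A in the extreme working scenario, the generated ripple voltage is about +50 mV/-50 mV.
(2) Determine the number of steps of the processor 10's current change according to the ripple voltage generated by the processor 10 in the extreme working scenario and the ripple voltage allowed by the processor 10.
For example, if the ripple voltage generated by the processor 10 in the extreme working scenario is +50 mV/-50 mV and the ripple voltage allowed by the processor 10 is +20 mV/-20 mV, the number of steps of the processor 10's current change is 3 (that is, 50 mV/20 mV = 2.5, rounded up).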
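Step (2) reduces to a ceiling division of the generated ripple amplitude by the allowed one. The following is a minimal Python sketch, assuming both amplitudes are supplied as integer millivolts; the function name `current_step_count` is hypothetical and not part of the original disclosure.

```python
def current_step_count(generated_ripple_mv: int, allowed_ripple_mv: int) -> int:
    """Number of current steps: generated ripple / allowed ripple, rounded up.

    Hypothetical helper; both arguments are ripple amplitudes in integer millivolts.
    """
    if generated_ripple_mv <= 0 or allowed_ripple_mv <= 0:
        raise ValueError("ripple amplitudes must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-generated_ripple_mv // allowed_ripple_mv)


print(current_step_count(50, 20))  # 3, the example in the text (2.5 rounded up)
```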
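(3) Determine the waiting period decrement number N2 according to the switching period of the power supply 11 and the clock period of the processor 10.
In this embodiment, the waiting period decrement number is directly proportional to the switching period of the power supply 11 and inversely proportional to the clock period of the processor 10.
In a specific embodiment, the waiting period decrement number is (T1*n/T2), where T1 is the switching period of the power supply 11, T2 is the clock period of the processor 10, and n is a positive integer greater than 1. n may be a positive integer greater than or equal to 10 and less than or equal to 100; for example, n may be 20.
T1*n is the length of each step of the processor 10's current. For example, if the switching period of the power supply 11 is 1000 ns and n is 20, each step of the processor 10's current is 20000 ns long. Assuming the clock period of the processor 10 is 2 ns, the waiting period decrement number is 20000 ns / 2 ns = 10000.
(4) Calculate the initial waiting period number N1 according to the number of steps and the waiting period decrement number N2.
In this embodiment, the initial waiting period number N1 is the product of the number of steps and the waiting period decrement number N2.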
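Steps (3) and (4) can be combined into one small calculation. This Python sketch is illustrative only: the name `wait_parameters` is hypothetical, and rounding N2 up when T1*n is not a whole number of clock periods is an assumption, since the text does not say how to round. The printed values combine the step count of 3 from step (2) with the timing figures from step (3).

```python
from typing import Tuple


def wait_parameters(switch_period_ns: int, clock_period_ns: int,
                    n: int, steps: int) -> Tuple[int, int]:
    """Return (N1, N2) from the formulas in steps (3) and (4).

    N2 = T1 * n / T2, where T1 is the power-supply switching period and T2 the
    processor clock period; N1 = steps * N2.
    """
    if n <= 1:
        raise ValueError("n must be a positive integer greater than 1")
    step_length_ns = switch_period_ns * n          # T1 * n: length of one current step
    n2, remainder = divmod(step_length_ns, clock_period_ns)
    if remainder:
        n2 += 1  # assumption: round up when T1*n is not a multiple of T2
    n1 = steps * n2
    return n1, n2


# Step count 3 from step (2); T1 = 1000 ns, T2 = 2 ns, n = 20 from step (3).
print(wait_parameters(1000, 2, 20, 3))  # (30000, 10000)
```

With these inputs, N2 = 20000 ns / 2 ns = 10000 and N1 = 3 × 10000 = 30000.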
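The power control unit 101 is configured to send, when the processor 10 starts working, a first control signal to the at least one processing unit 102 according to the initial waiting period number N1 and the waiting period decrement number N2, where the power control unit 101 waits N1 clock cycles of the processor 10 before sending the first control signal for the first time, the waiting time before each subsequent sending of the first control signal decreases by N2 clock cycles, and once the waiting time has decreased to less than or equal to 0, the first control signal is sent every clock cycle.
For example, with an initial waiting period number N1 of 1000 and a waiting period decrement number N2 of 200, the power control unit 101 sends the first control signal for the first time after waiting 1000 clock cycles, the second time after waiting 800 clock cycles, the third time after waiting 600 clock cycles, the fourth time after waiting 400 clock cycles, and the fifth time after waiting 200 clock cycles, and thereafter sends the first control signal every clock cycle.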
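The schedule of the first control signal can be expressed as a list of waiting times and the resulting send cycles. A minimal Python sketch, assuming N1 = 1000 and N2 = 200, that cycles are counted from the moment the processor starts working, and that "every clock cycle" means a wait of one cycle; the function names are hypothetical.

```python
from itertools import accumulate
from typing import List


def ramp_up_waits(n1: int, n2: int, count: int) -> List[int]:
    """Waits (in clock cycles) before each of the first `count` first control signals.

    The first wait is N1; each later wait is N2 shorter; once the wait would be
    <= 0, the signal is sent every clock cycle (modelled as a wait of 1).
    """
    waits = []
    wait = n1
    for _ in range(count):
        waits.append(wait if wait > 0 else 1)
        wait -= n2
    return waits


def send_cycles(waits: List[int]) -> List[int]:
    """Clock cycle, counted from start-up, at which each signal is sent."""
    return list(accumulate(waits))


waits = ramp_up_waits(1000, 200, 7)
print(waits)               # [1000, 800, 600, 400, 200, 1, 1]
print(send_cycles(waits))  # [1000, 1800, 2400, 2800, 3000, 3001, 3002]
```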
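In a specific embodiment, as shown in FIG. 7, the power control unit 101 includes a first control register 70, a second control register 71, and a control signal generating circuit 72. The first control register 70 stores the initial waiting period number, the second control register 71 stores the waiting period decrement number, and the control signal generating circuit 72 outputs the first control signal according to the data stored in the first control register 70 and the second control register 71.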
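In hardware, the same schedule could be produced by a down-counter reloaded from the two registers. The cycle-level Python model below is a hypothetical behavioural sketch, not the circuit of FIG. 7: it assumes the control signal generating circuit keeps a counter, asserts the first control signal when the counter reaches zero, and reloads the counter with the previous waiting time minus the value of the second control register, never below one cycle.

```python
class ControlSignalGenerator:
    """Hypothetical cycle-level model of the power control unit.

    reg_initial   stands for the first control register (initial waiting period number N1);
    reg_decrement stands for the second control register (waiting period decrement number N2).
    """

    def __init__(self, reg_initial: int, reg_decrement: int) -> None:
        self.reg_initial = reg_initial
        self.reg_decrement = reg_decrement
        self._reload = reg_initial   # current waiting time
        self._counter = reg_initial  # cycles left before the next pulse

    def tick(self) -> bool:
        """Advance one clock cycle; return True if the first control signal is asserted."""
        self._counter -= 1
        if self._counter > 0:
            return False
        # Assert the signal, then shorten the next wait by N2, never below one cycle.
        self._reload = max(self._reload - self.reg_decrement, 1)
        self._counter = self._reload
        return True


gen = ControlSignalGenerator(reg_initial=1000, reg_decrement=200)
pulses = [cycle for cycle in range(1, 3004) if gen.tick()]
print(pulses)  # [1000, 1800, 2400, 2800, 3000, 3001, 3002, 3003]
```

A counter that only needs a subtractor and a comparator keeps the per-cycle logic small, which is why this form was chosen over precomputing the whole schedule.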
The at least one processing unit 102 is configured to, after receiving the first control signal, read data to be processed from the external memory 12, buffer the read data in the input buffer 1020, transfer the buffered data from the input buffer 1020 to the arithmetic unit 1021 for computation, and store the computation result in the output buffer 1022.
Each time the processing unit 102 receives the first control signal, it reads data to be processed from the external memory 12, buffers the read data in the input buffer 1020, transfers the buffered data from the input buffer 1020 to the arithmetic unit 1021 for computation, and stores the computation result in the output buffer 1022. For example, after receiving the first control signal sent by the power control unit 101 for the first time, the processing unit 102 reads the first piece of data to be processed from the external memory 12, buffers it in the input buffer 1020, transfers it from the input buffer 1020 to the arithmetic unit 1021 for computation, and stores the computation result in the output buffer 1022. After receiving the first control signal for the second, third, fourth, and fifth times, it reads the second, third, fourth, and fifth pieces of data respectively and processes each in the same way. Thereafter, each time it receives the first control signal that the power control unit 101 sends every clock cycle, it reads the next piece of data to be processed from the external memory 12 and processes it in the same way.
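Each control signal thus releases exactly one read, buffer, compute, and store pass. A toy Python model, assuming the NPU layout described for FIG. 1 (the input buffer split into a data buffer and a weight buffer) and using a dot product as a hypothetical stand-in for the arithmetic unit; the class and method names are illustrative only.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical item layout: (input data, weight values), as stored in external memory.
Item = Tuple[List[float], List[float]]


class ProcessingUnit:
    """Toy model of a processing unit (names and the compute step are assumptions)."""

    def __init__(self, external_memory: Deque[Item]) -> None:
        self.external_memory = external_memory
        self.data_buffer: Deque[List[float]] = deque()    # input buffer: data part
        self.weight_buffer: Deque[List[float]] = deque()  # input buffer: weight part
        self.output_buffer: List[float] = []              # output buffer

    def on_control_signal(self) -> bool:
        """Handle one control signal; return False if no data is left to process."""
        if not self.external_memory:
            return False
        data, weights = self.external_memory.popleft()  # read the next piece of data
        self.data_buffer.append(data)                    # buffer it in the input buffer
        self.weight_buffer.append(weights)
        result = self.arithmetic_unit(self.data_buffer.popleft(),
                                      self.weight_buffer.popleft())
        self.output_buffer.append(result)                # store the result
        return True

    @staticmethod
    def arithmetic_unit(data: List[float], weights: List[float]) -> float:
        # Stand-in computation (a dot product); the text does not specify the operation.
        return sum(d * w for d, w in zip(data, weights))


memory = deque([([1, 2], [3, 4]), ([5, 6], [2, 0])])
pu = ProcessingUnit(memory)
while pu.on_control_signal():
    pass
print(pu.output_buffer)  # [11, 10]
```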
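To address the power supply ripple caused when the processor 10 starts working, this embodiment controls the frequency at which the arithmetic unit 1021 performs computations so as to prevent the current of the power supply 11 from rising sharply. This makes the power demand of the processor 10 at start-up step-shaped, thereby effectively reducing the power supply ripple at start-up and improving the stability of the processor 10.
In another embodiment, the power control unit 101 is further configured to, if the amount of data remaining to be processed in the external memory 12 is less than or equal to a preset value, send a second control signal to the at least one processing unit 102 according to the initial waiting period number N1 and the waiting period decrement number N2, where the power control unit 101 waits N2 clock cycles before sending the second control signal for the first time, the waiting time before each subsequent sending of the second control signal increases by N2 clock cycles, and once the waiting time has increased to greater than or equal to N1, the second control signal is sent every N1 clock cycles, until all the data to be processed in the external memory 12 has been computed.
For example, with an initial waiting period number N1 of 1000 and a waiting period decrement number N2 of 200, if no more than 10 pieces of data to be processed remain in the external memory 12, the power control unit 101 sends the second control signal for the first time after waiting 200 clock cycles, the second time after waiting 400 clock cycles, the third time after waiting 600 clock cycles, the fourth time after waiting 800 clock cycles, and the fifth time after waiting 1000 clock cycles, and thereafter sends the second control signal every 1000 clock cycles until all the data to be processed in the external memory 12 has been computed.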
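The second control signal follows the mirror-image schedule: the k-th wait is k·N2, capped at N1. A minimal Python sketch, assuming N1 = 1000 and N2 = 200 and that cycles are counted from the moment the remaining-data condition is met; the function name is hypothetical.

```python
from itertools import accumulate
from typing import List


def ramp_down_waits(n1: int, n2: int, count: int) -> List[int]:
    """Waits (in clock cycles) before each of the first `count` second control signals.

    The k-th wait (k = 1, 2, ...) is k * N2, capped at N1: once the wait reaches
    N1, the signal is sent every N1 cycles.
    """
    return [min(k * n2, n1) for k in range(1, count + 1)]


waits = ramp_down_waits(1000, 200, 7)
print(waits)                    # [200, 400, 600, 800, 1000, 1000, 1000]
print(list(accumulate(waits)))  # [200, 600, 1200, 2000, 3000, 4000, 5000]
```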
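The at least one processing unit 102 is further configured to, after receiving the second control signal, read data to be processed from the external memory 12, buffer the read data in the input buffer 1020, transfer the buffered data from the input buffer 1020 to the arithmetic unit 1021 for computation, and store the computation result in the output buffer 1022.
Each time the processing unit 102 receives the second control signal, it reads data to be processed from the external memory 12, buffers the read data in the input buffer 1020, transfers the buffered data from the input buffer 1020 to the arithmetic unit 1021 for computation, and stores the computation result in the output buffer 1022.
In this embodiment, the control signal generating circuit 72 also outputs the second control signal according to the data stored in the first control register 70 and the second control register 71.
By controlling the frequency at which the arithmetic unit 1021 performs computations when the processor 10 finishes working, this embodiment prevents the current of the power supply 11 from dropping sharply and makes the power demand of the processor 10 at the end of work step-shaped, thereby effectively reducing the ripple of the power supply 11 at the end of work and further improving the stability of the processor 10.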
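FIG. 3 is a flowchart of a method for reducing power supply ripple provided by an embodiment of the present invention.
The method for reducing power supply ripple is applied to a processor. The processor includes a controller, a power control unit, and at least one processing unit; the at least one processing unit includes an input buffer, an arithmetic unit, and an output buffer; and the processor is connected to a power supply and an external memory.
In a specific embodiment, the method for reducing power supply ripple is applied to a neural network processor (NPU). A neural network processor simulates human neurons and synapses at the circuit level and uses a deep learning instruction set to process large numbers of neurons and synapses directly, with one instruction completing the processing of a group of neurons. Compared with CPUs and GPUs, an NPU integrates storage and computation through synaptic weights, which improves operating efficiency.
The method controls the frequency at which the arithmetic unit performs computations while the processor is starting up, so as to prevent sharp jumps in the power supply current and make the processor's power demand at start-up step-shaped, thereby effectively reducing the power supply ripple and improving the stability of the processor.
As shown in FIG. 3, the method for reducing power supply ripple specifically includes the following steps:
301: The controller determines the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit.
The initial waiting period number N1 and the waiting period decrement number N2 of the processing unit may be set according to empirical values. For example, a correspondence table between different processors and their N1 and N2 values can be established, and the N1 and N2 corresponding to the processor can be looked up in that table.
Alternatively, N1 and N2 of the processing unit may be determined according to the method described in FIG. 4.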
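302: When the processor starts working, the power control unit sends a first control signal to the at least one processing unit according to the initial waiting period number N1 and the waiting period decrement number N2, where the power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time, the waiting time before each subsequent sending of the first control signal decreases by N2 clock cycles, and once the waiting time has decreased to less than or equal to 0, the first control signal is sent every clock cycle.
For example, with an initial waiting period number N1 of 1000 and a waiting period decrement number N2 of 200, the power control unit sends the first control signal for the first time after waiting 1000 clock cycles, the second time after waiting 800 clock cycles, the third time after waiting 600 clock cycles, the fourth time after waiting 400 clock cycles, and the fifth time after waiting 200 clock cycles, and thereafter sends the first control signal every clock cycle.
303: After receiving the first control signal, the at least one processing unit reads data to be processed from the external memory, buffers the read data in the input buffer, transfers the buffered data from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.
Each time the processing unit receives the first control signal, it reads data to be processed from the external memory, buffers the read data in the input buffer, transfers the buffered data from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer. For example, after receiving the first control signal sent by the power control unit for the first time, the processing unit reads the first piece of data to be processed from the external memory, buffers it in the input buffer, transfers it from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer. After receiving the first control signal for the second, third, fourth, and fifth times, it reads the second, third, fourth, and fifth pieces of data respectively and processes each in the same way. Thereafter, each time it receives the first control signal that the power control unit sends every clock cycle, it reads the next piece of data to be processed from the external memory and processes it in the same way.
This embodiment determines the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit of the processor. When the processor starts working, the power control unit of the processor sends a first control signal to the at least one processing unit according to N1 and N2: it waits N1 clock cycles of the processor before the first sending, the waiting time before each subsequent sending decreases by N2 clock cycles, and once the waiting time has decreased to less than or equal to 0, the first control signal is sent every clock cycle. After receiving the first control signal, the at least one processing unit of the processor reads data to be processed from the external memory, buffers the read data in the input buffer, transfers the buffered data from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.
In a specific embodiment, the power control unit includes a first control register, a second control register, and a control signal generating circuit. The first control register stores the initial waiting period number, the second control register stores the waiting period decrement number, and the control signal generating circuit outputs the first control signal according to the data stored in the first control register and the second control register.
When an existing processor starts working, it sends data to be processed from the external memory to the arithmetic unit every clock cycle, so the processor's current demand changes sharply at the nanosecond level. This change seriously affects the stability of the power supply voltage, produces large ripple, and seriously affects the operating stability of the processor. The impact is even more serious when multiple processors on one chip work in parallel.
In contrast, this embodiment controls the frequency at which the arithmetic unit performs computations when the processor starts working, so as to prevent the power supply current from rising sharply and make the processor's power demand at start-up step-shaped, thereby effectively reducing the power supply ripple at start-up and improving the stability of the processor.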
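FIG. 4 is a detailed flowchart of determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit in FIG. 3.
Referring to FIG. 4, determining the initial waiting period number N1 and the waiting period decrement number N2 of the processing unit includes the following steps:
401: Obtain the ripple voltage generated by the processor in an extreme working scenario.
The ripple voltage generated by the processor in an extreme working scenario can be estimated with a simulation tool.
For example, the simulation tool PTPX (PrimeTime PX) can be used to estimate the ripple voltage generated by the processor in an extreme working scenario. PTPX is a tool, based on the PrimeTime environment, for static and dynamic power analysis of the whole chip.
In a specific example, referring to FIG. 2, when the transient output current of the processor changes from 0 A to 6 A in the extreme working scenario, the generated ripple voltage is about +50 mV/-50 mV.
402: Determine the number of steps of the processor's current change according to the ripple voltage generated by the processor in the extreme working scenario and the ripple voltage allowed by the processor.
For example, if the ripple voltage generated by the processor in the extreme working scenario is +50 mV/-50 mV and the ripple voltage allowed by the processor is +20 mV/-20 mV, the number of steps of the processor's current change is 3 (that is, 50 mV/20 mV = 2.5, rounded up).
403: Determine the waiting period decrement number N2 according to the switching period of the power supply and the clock period of the processor.
In this embodiment, the waiting period decrement number is directly proportional to the switching period of the power supply and inversely proportional to the clock period of the processor.
In a specific embodiment, the waiting period decrement number is (T1*n/T2), where T1 is the switching period of the power supply, T2 is the clock period of the processor, and n is a positive integer greater than 1. n may be a positive integer greater than or equal to 10 and less than or equal to 100; for example, n may be 20.
T1*n is the length of each step of the processor's current. For example, if the switching period of the power supply is 1000 ns and n is 20, each step of the processor's current is 20000 ns long. Assuming the clock period of the processor is 2 ns, the waiting period decrement number is 20000 ns / 2 ns = 10000.
404: Calculate the initial waiting period number N1 according to the number of steps and the waiting period decrement number N2.
In this embodiment, the initial waiting period number N1 is the product of the number of steps and the waiting period decrement number N2.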
图5是本发明另一实施例提供的降低电源纹波的方法的流程图。
如图5所示,所述降低电源纹波的方法具体包括以下步骤:
501,控制器确定所述处理单元的初始等待周期数N1和等待周期递减数N2。
所述处理单元的初始等待周期数N1和等待周期递减数N2可以根据经验值进行设置。例如,可以建立不同的处理器与初始等待周期数N1和等待周期递减数N2的对应关系表,根据所述对应关系表确定所述处理器对应的初始等待周期数N1和等待周期递减数N2。
或者,可以根据图4描述的方法确定所述处理单元的初始等待周期数N1和等待周期递减数N2。
502,在所述处理器启动工作时,所述电源控制单元按照所述初始等待周期数N1和所述等待周期递减数N2发送第一控制信号给所述至少一个处理单元,所述电源控制单元第一次发送所述第一控制信号的等待时间为N1个所述处理器的时钟周期,后续每次发送所述第一控制信号的等待时间递减N2个所述时钟周期,若所述等待时间递减至小于或者等于0,则每个所述时钟周期发送所述第一控制信号。
例如,所述初始等待周期数N1为1010,所述等待周期递减数N2为200,所述电源控制单元等待1010个时钟周期后第一次发送所述第一控制信号,等待800个时钟周期后第二次发送所述第一控制信号,等待600个时钟周期后第三次发送所述第一控制信号,等待400个时钟周期后第四次发送所述第一控制信号,等待200个时钟周期后第五次发送所述第一控制信号,之后每个所述时钟周期发送所述第一控制信号。
503,所述至少一个处理单元收到所述第一控制信号后,从所述外部存储器读取待处理数据,将读取的待处理数据缓存到所述输入缓存器,将缓存的待处理数据从所述输入缓存器传送给所述运算器进行运算,将运算结果存入所述输出缓存器。
所述处理单元每次收到所述第一控制信号后,从所述外部存储器读取待处理数据,将读取的待处理数据缓存到所述输入缓存器,将缓存的待处理数据从所述输入缓存器传送给所述运算器进行运算,将运算结果存入所述输出缓存器。例如,所述处理单元收到所述电源控制单元第一次发送的第一控制信号后,从所述外部存储器读取第一条待处理数据,将读取的待处理数据缓存到所述输入缓存器,将缓存的待处理数据从所述输入缓存器传送给所述运算器进行运算,将运算结果存入所述输出缓存器;所述处理单元收到所述电源控制单元第二次发送的第一控制信号后,从所述外部存储器读取第二条待处理数据,将读取的待处理数据缓存到所述输入缓存器,将缓存的待处理数据从所述输入缓存器传送给所述运算器进行运算,将运算结果存入所述输出缓存器;所述处理单元收到所述电源控制单元第三次发送的第一控制信号后,从所述外部存储器读取第三条待处理数据,将读取的待处理数据缓存到所述输入缓存器,将缓存的待处理数据从所述输入缓存器传送给所述运算器进行运算,将运算结果存入所述输出缓存器;所述处理单元收到所述电源控制单元第四次发送的第一控制信号后,从所述外部存储器读取第四条待处理数据,将读取的待处理数据缓存到所述输入缓存器,将缓存的待处理数据从所述输入缓存器传送给所述运算器进行运算,将运算结果存入所述输出缓存器;所述处理单元收到所述电源控制单元第五次发送的第一控制信号后,从所述外部存储器读取第五条待处理数据,将读取的待处理数据缓存到所述输入缓存器,将缓存的待处理数据从所述输入缓存器传送给所述运算器进行运算,将运算结果存入所述输出缓存器;之后每个所述时钟周期接收到所述电源控制单元发送的第一控制信号后,从所述外部存储器读取第五条待处理数据,将读取的待处理数据缓存到所述输入缓存器,将缓存的待处理数据从所述输入缓存器传送给所述运算器进行运算,将运算结果存入所述输出缓存器。
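504: If the amount of data to be processed remaining in the external memory is less than or equal to a preset value, the power control unit sends a second control signal to the at least one processing unit according to N1 and N2. The power control unit waits N2 clock cycles before sending the second control signal for the first time, and each subsequent wait is N2 clock cycles longer than the previous one. Once the wait increases to N1 or more, the second control signal is sent every N1 clock cycles until all the data to be processed in the external memory has been computed.
For example, if N1 is 1000, N2 is 200, and 10 or fewer items of data to be processed remain in the external memory, the power control unit sends the second control signal for the first time after waiting 200 clock cycles, the second time after 400 clock cycles, the third time after 600, the fourth time after 800, and the fifth time after 1000; thereafter it sends the second control signal every 1000 clock cycles until all the data in the external memory has been processed.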
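Likewise, the wait lengths for the second control signal in step 504 can be generated as follows. This is again a hypothetical Python sketch (`ramp_down_waits` is an illustrative name); the generator is unbounded, so the caller is assumed to stop drawing waits once the remaining data has been processed.

```python
from itertools import islice
from typing import Iterator


def ramp_down_waits(n1: int, n2: int) -> Iterator[int]:
    """Yield the clock cycles to wait before each second control signal.

    The first wait is N2 cycles and each later wait is N2 cycles longer.
    Once the wait reaches or exceeds N1, it is held at N1.
    """
    if n1 < 1 or n2 < 1:
        raise ValueError("n1 and n2 must be positive")
    wait = n2
    while wait < n1:
        yield wait
        wait += n2
    while True:
        yield n1  # caller stops once the external memory is empty


print(list(islice(ramp_down_waits(1000, 200), 7)))
# [200, 400, 600, 800, 1000, 1000, 1000]
```

When N1 is not a multiple of N2 (for example N1 = 1000, N2 = 300), the wait that would overshoot is clamped to N1, giving 300, 600, 900, 1000, 1000, and so on.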
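505: After receiving the second control signal, the at least one processing unit reads data to be processed from the external memory, caches the read data in the input buffer, transfers the cached data from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.
Each time the processing unit receives the second control signal, it reads one item of data to be processed from the external memory, caches it in the input buffer, transfers it from the input buffer to the arithmetic unit for computation, and stores the result in the output buffer.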
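The method for reducing power supply ripple of Embodiment 2 works as follows. It determines the initial number of wait cycles N1 and the wait-cycle decrement N2 of the processing unit. When the processor starts working, the power control unit sends the first control signal to the at least one processing unit according to N1 and N2: the first wait is N1 processor clock cycles, each subsequent wait is N2 clock cycles shorter, and once the wait decreases to 0 or less the first control signal is sent every clock cycle. After receiving the first control signal, the at least one processing unit reads data to be processed from the external memory, caches it in the input buffer, transfers it from the input buffer to the arithmetic unit for computation, and stores the result in the output buffer. If the amount of data to be processed remaining in the external memory is less than or equal to a preset value, the power control unit sends the second control signal according to N1 and N2: the first wait is N2 clock cycles, each subsequent wait is N2 clock cycles longer, and once the wait increases to N1 or more the second control signal is sent every N1 clock cycles until all the data in the external memory has been computed. After receiving the second control signal, the at least one processing unit reads, caches, computes, and stores the data in the same way.
The method of Embodiment 2 controls how often the arithmetic unit computes not only when the processor starts working but also when it finishes, which prevents the supply current from rising or falling sharply. The processor's power demand at start-up and at the end of work thus becomes stepped, which effectively reduces the power supply ripple at those moments and improves the stability of the processor.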
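In a specific embodiment, the power control unit includes a first control register, a second control register, and a control signal generation circuit. The first control register stores the initial number of wait cycles, the second control register stores the wait-cycle decrement, and the control signal generation circuit outputs the first control signal and the second control signal according to the data stored in the first and second control registers.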
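A cycle-level software model may help show how two registers and a counter are enough to produce both schedules. The sketch below is an assumption about one possible design, not the patented control signal generation circuit: `ControlSignalGenerator`, `reg_n1`, `reg_n2`, `start_ramp_up`, `start_ramp_down`, and `tick` are hypothetical names, and one call to `tick()` stands for one processor clock cycle.

```python
class ControlSignalGenerator:
    """Hypothetical behavioural model of the control signal generation circuit.

    reg_n1 and reg_n2 play the roles of the first and second control registers
    (N1 and N2). tick() is called once per processor clock cycle and returns
    True in the cycles in which a control signal is issued.
    """

    def __init__(self, reg_n1: int, reg_n2: int) -> None:
        if reg_n1 < 1 or reg_n2 < 1:
            raise ValueError("register values must be positive")
        self.reg_n1 = reg_n1
        self.reg_n2 = reg_n2
        self.ramping_up = True
        self.wait = reg_n1
        self.count = 0

    def start_ramp_up(self) -> None:
        """Processor starts working: first signal after N1 cycles, then shorter waits."""
        self.ramping_up = True
        self.wait = self.reg_n1
        self.count = 0

    def start_ramp_down(self) -> None:
        """Remaining data at or below the preset value: first signal after N2 cycles, then longer waits."""
        self.ramping_up = False
        self.wait = self.reg_n2
        self.count = 0

    def tick(self) -> bool:
        self.count += 1
        if self.count < self.wait:
            return False
        self.count = 0
        if self.ramping_up:
            # First control signal: shorten the wait by N2; at 0 or below, fire every cycle.
            self.wait = max(self.wait - self.reg_n2, 1)
        else:
            # Second control signal: lengthen the wait by N2, holding it at N1 once reached.
            self.wait = min(self.wait + self.reg_n2, self.reg_n1)
        return True


gen = ControlSignalGenerator(reg_n1=10, reg_n2=4)
print([cycle for cycle in range(1, 23) if gen.tick()])  # [10, 16, 18, 19, 20, 21, 22]
gen.start_ramp_down()
print([cycle for cycle in range(1, 40) if gen.tick()])  # [4, 12, 22, 32]
```

With N1 = 10 and N2 = 4, the ramp-up waits are 10, 6, and 2 cycles before the signal fires every cycle; after switching to ramp-down, the waits are 4 and 8 cycles and are then held at N1 = 10.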
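FIG. 6 is a schematic diagram of a computer apparatus according to an embodiment of the present invention.
In this embodiment, the computer apparatus 6 includes a processor 60, a memory 61, and at least one communication bus 62. The processor 60 may be the processor 10 in FIG. 1 and implements the steps of the above method embodiments for reducing power supply ripple, for example steps 301-303 shown in FIG. 3 or steps 501-505 in FIG. 5.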
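The computer apparatus 6 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. Those skilled in the art will understand that FIG. 6 is merely an example of the computer apparatus 6 and does not limit it; the apparatus may include more or fewer components than shown, combine certain components, or use different components. For example, the computer apparatus 6 may further include input/output devices, network access devices, buses, and the like.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor 60 may be any conventional processor. The processor 60 is the control center of the computer apparatus 6 and connects all parts of the computer apparatus 6 through various interfaces and lines.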
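The memory 61 may be used to store computer programs and/or modules/units. The processor 60 implements the various functions of the computer apparatus 6 by running or executing the computer programs and/or modules/units stored in the memory 61 and by invoking data stored in the memory 61. The memory 61 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, application programs required for at least one function, and so on; the data storage area may store data created through use of the computer apparatus 6. In addition, the memory 61 may include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, or a flash memory device, as well as internal memory or other volatile solid-state storage devices.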
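If the integrated modules/units of the computer apparatus 6 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments of the present invention may also be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, and so on. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and so on. It should be noted that what a computer-readable medium may contain can be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.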
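In the several embodiments provided by the present invention, it should be understood that the disclosed computer apparatus and method may be implemented in other ways. For example, the computer apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, may each exist physically on their own, or two or more of them may be integrated in one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional modules.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the embodiments of the method for reducing power supply ripple provided by the embodiments of the present invention and achieves the same technical effects; to avoid repetition, the details are not repeated here.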

Claims (10)

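  1. A processor, comprising a controller and at least one processing unit, the at least one processing unit comprising an input buffer, an arithmetic unit, and an output buffer, and the processor being connected to a power supply and an external memory, characterized in that the controller is configured to determine an initial number of wait cycles N1 and a wait-cycle decrement N2 of the processing unit, and the processor further comprises a power control unit configured to:
    when the processor starts working, send a first control signal to the at least one processing unit according to the initial number of wait cycles N1 and the wait-cycle decrement N2, wherein the power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time, each subsequent wait before sending the first control signal is reduced by N2 clock cycles, and if the wait decreases to 0 or less, the first control signal is sent every clock cycle;
    wherein the at least one processing unit, after receiving the first control signal, reads data to be processed from the external memory, caches the read data in the input buffer, transfers the cached data from the input buffer to the arithmetic unit for computation, and stores the computation result in the output buffer.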
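  2. The processor according to claim 1, characterized in that determining the initial number of wait cycles N1 and the wait-cycle decrement N2 of the processing unit comprises:
    obtaining the ripple voltage produced by the processor in an extreme operating scenario;
    determining the number of steps of the processor's current change according to the ripple voltage produced by the processor in the extreme operating scenario and the ripple voltage allowed by the processor;
    determining the wait-cycle decrement N2 according to the switching period of the power supply and the clock period of the processor;
    calculating the initial number of wait cycles N1 according to the number of steps and the wait-cycle decrement N2.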
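  3. The processor according to claim 1, characterized in that the power control unit comprises a first control register, a second control register, and a control signal generation circuit, the first control register storing the initial number of wait cycles, the second control register storing the wait-cycle decrement, and the control signal generation circuit outputting the first control signal according to the data stored in the first control register and the second control register.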
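  4. The processor according to any one of claims 1 to 3, characterized in that the power control unit is further configured to:
    if the amount of data to be processed remaining in the external memory is less than or equal to a preset value, send a second control signal to the at least one processing unit according to the initial number of wait cycles N1 and the wait-cycle decrement N2, wherein the power control unit waits N2 clock cycles before sending the second control signal for the first time, each subsequent wait before sending the second control signal is increased by N2 clock cycles, and if the wait increases to N1 or more, the second control signal is sent every N1 clock cycles until all the data to be processed in the external memory has been computed;
    and the at least one processing unit is further configured to:
    after receiving the second control signal, read data to be processed from the external memory, cache the read data in the input buffer, transfer the cached data from the input buffer to the arithmetic unit for computation, and store the computation result in the output buffer.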
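  5. A method for reducing power supply ripple, applied to a processor, the processor comprising a controller, a power control unit, and at least one processing unit, the at least one processing unit comprising an input buffer, an arithmetic unit, and an output buffer, and the processor being connected to a power supply and an external memory, characterized in that the method comprises:
    determining an initial number of wait cycles N1 and a wait-cycle decrement N2 of the processing unit;
    when the processor starts working, sending, by the power control unit, a first control signal to the at least one processing unit according to the initial number of wait cycles N1 and the wait-cycle decrement N2, wherein the power control unit waits N1 clock cycles of the processor before sending the first control signal for the first time, each subsequent wait before sending the first control signal is reduced by N2 clock cycles, and if the wait decreases to 0 or less, the first control signal is sent every clock cycle;
    after receiving the first control signal, reading, by the at least one processing unit, data to be processed from the external memory, caching the read data in the input buffer, transferring the cached data from the input buffer to the arithmetic unit for computation, and storing the computation result in the output buffer.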
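  6. The method according to claim 5, characterized in that determining the initial number of wait cycles N1 and the wait-cycle decrement N2 of the processing unit comprises:
    obtaining the ripple voltage produced by the processor in an extreme operating scenario;
    determining the number of steps of the processor's current change according to the ripple voltage produced by the processor in the extreme operating scenario and the ripple voltage allowed by the processor;
    determining the wait-cycle decrement N2 according to the switching period of the power supply and the clock period of the processor;
    calculating the initial number of wait cycles N1 according to the number of steps and the wait-cycle decrement N2.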
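  7. The method according to claim 5, characterized in that the wait-cycle decrement is directly proportional to the switching period of the power supply and inversely proportional to the clock period of the processor.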
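  8. The method according to claim 7, characterized in that the wait-cycle decrement is (T1*n/T2), wherein T1 is the switching period of the power supply, T2 is the clock period of the processor, and n is a positive integer greater than 1.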
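  9. The method according to claim 5, characterized in that the power control unit comprises a first control register, a second control register, and a control signal generation circuit, the first control register storing the initial number of wait cycles, the second control register storing the wait-cycle decrement, and the control signal generation circuit outputting the first control signal according to the data stored in the first control register and the second control register.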
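  10. The method according to any one of claims 5 to 9, characterized in that, if the amount of data to be processed remaining in the external memory is less than or equal to a preset value, the method further comprises:
    sending, by the power control unit, a second control signal to the at least one processing unit according to the initial number of wait cycles N1 and the wait-cycle decrement N2, wherein the power control unit waits N2 clock cycles before sending the second control signal for the first time, each subsequent wait before sending the second control signal is increased by N2 clock cycles, and if the wait increases to N1 or more, the second control signal is sent every N1 clock cycles until all the data to be processed in the external memory has been computed;
    after receiving the second control signal, reading, by the at least one processing unit, data to be processed from the external memory, caching the read data in the input buffer, transferring the cached data from the input buffer to the arithmetic unit for computation, and storing the computation result in the output buffer.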
PCT/CN2020/108984 2019-12-10 2020-08-13 Processor and method for reducing power supply ripple WO2021114701A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/623,603 US20220206554A1 (en) 2019-12-10 2020-08-13 Processor and power supply ripple reduction method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911261783.8 2019-12-10
CN201911261783.8A CN111026258B (zh) 2019-12-10 2019-12-10 Processor and method for reducing power supply ripple

Publications (1)

Publication Number Publication Date
WO2021114701A1 true WO2021114701A1 (zh) 2021-06-17

Family

ID=70208670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/108984 WO2021114701A1 (zh) 2019-12-10 2020-08-13 Processor and method for reducing power supply ripple

Country Status (3)

Country Link
US (1) US20220206554A1 (zh)
CN (1) CN111026258B (zh)
WO (1) WO2021114701A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026258B (zh) * 2019-12-10 2020-12-15 Shenzhen Intellifusion Technologies Co., Ltd. Processor and method for reducing power supply ripple

Citations (5)

Publication number Priority date Publication date Assignee Title
US20140198414A1 (en) * 2013-01-11 2014-07-17 Qualcomm Incorporated Electrostatic discharge clamp with disable
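CN106936307A (zh) * 2015-12-29 2017-07-07 Texas Instruments Incorporated Method and apparatus for a low-standby-current DC-DC power supply controller with improved transient response
CN108092503A (zh) * 2018-01-18 2018-05-29 Shanghai Belling Co., Ltd. Charge pump circuit
CN109300493A (zh) * 2017-07-25 2019-02-01 Samsung Electronics Co., Ltd. Ripple compensator, data driving circuit, and semiconductor device
CN111026258A (zh) * 2019-12-10 2020-04-17 Shenzhen Intellifusion Technologies Co., Ltd. Processor and method for reducing power supply ripple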

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
JP2001034530A (ja) * 1999-07-16 2001-02-09 Mitsubishi Electric Corp Microcomputer and memory access control method
DE102005013237B4 (de) * 2005-03-22 2014-11-27 Qimonda Ag Device in a memory circuit for setting wait times
CN100459392C (zh) * 2007-04-28 2009-02-04 University of Electronic Science and Technology of China Constant-current switching power supply with a voltage ripple detection circuit
US20100284284A1 (en) * 2009-05-08 2010-11-11 Qualcomm Incorporated VOICE OVER INTERNET PROTOCOL (VoIP) ACCESS TERMINAL
US8918666B2 (en) * 2011-05-23 2014-12-23 Intel Mobile Communications GmbH Apparatus for synchronizing a data handover between a first and second clock domain through FIFO buffering
US9224442B2 (en) * 2013-03-15 2015-12-29 Qualcomm Incorporated System and method to dynamically determine a timing parameter of a memory device
US20160093345A1 (en) * 2014-09-26 2016-03-31 Qualcomm Incorporated Dynamic random access memory timing adjustments
US9703313B2 (en) * 2014-10-20 2017-07-11 Ambiq Micro, Inc. Peripheral clock management

Also Published As

Publication number Publication date
US20220206554A1 (en) 2022-06-30
CN111026258B (zh) 2020-12-15
CN111026258A (zh) 2020-04-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20900448

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20900448

Country of ref document: EP

Kind code of ref document: A1