CN113885691A - Chip power consumption adjustment method, device and chip system, and neural network training method and device - Google Patents

Publication number
CN113885691A
Authority
CN
China
Prior art keywords
chip
power consumption
chip system
redundant
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111157365.1A
Other languages
Chinese (zh)
Inventor
王勇
丁雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shangtangqian Technology Co ltd
Original Assignee
Shanghai Shangtangqian Technology Co ltd
Application filed by Shanghai Shangtangqian Technology Co ltd
Priority to CN202111157365.1A
Publication of CN113885691A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00: Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26: Power supply means, e.g. regulation thereof
    • G06F1/32: Means for saving power
    • G06F1/3203: Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206: Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3234: Power saving characterised by the action undertaken
    • G06F1/3237: Power saving characterised by the action undertaken by disabling clock generation or distribution
    • G06F1/324: Power saving characterised by the action undertaken by lowering clock frequency
    • G06F1/3243: Power saving in microcontroller unit
    • G06F1/3296: Power saving characterised by the action undertaken by lowering the supply or operating voltage
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5094: Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/5021: Priority
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The disclosure provides a chip power consumption adjustment method, a neural network training method, a device and a chip system, wherein the method comprises the following steps: acquiring chip test information of the chip system; the chip test information is used for indicating the running state of each processing core contained in each chip; determining power consumption to be allocated to the redundant element by the chip system under the condition that the chip system is determined to have the redundant element based on the chip test information; the redundant element includes: redundant chips and/or redundant processing cores; adjusting the working parameters of a target element in the chip system based on the power consumption to be distributed so as to adjust the power consumption of the target element; the target elements are part or all of the elements except the redundant elements in the chip system.

Description

Chip power consumption adjustment method, device and chip system, and neural network training method and device
Technical Field
The disclosure relates to the technical field of chips, in particular to a method, a device and a chip system for chip power consumption adjustment and neural network training.
Background
In the field of chip design, a qualification rule may be set in order to improve chip yield. The rule specifies an upper limit on the number of processing cores that cannot operate normally in a chip that is still considered qualified. However, while this approach increases chip yield, it reduces the performance of the chip system. For example, if the upper limit on processing cores that cannot operate normally is one, the prior art improves the yield, but the chip can only deliver the performance of its remaining processing cores. This matters most for chips with demanding operating-frequency requirements, such as AI (Artificial Intelligence) training chips. Because of the high computing power required of an AI training chip, the design area of the chip system increases and the line width decreases, which raises the design cost of the chip system and also affects its yield. When a qualification rule is used to improve chip yield, power consumption is wasted, which degrades the performance of the AI training chip and, in turn, its working efficiency.
Disclosure of Invention
The embodiment of the disclosure at least provides a chip power consumption adjusting method, a neural network training method, a device and a chip system.
In a first aspect, an embodiment of the present disclosure provides a method for adjusting chip power consumption, where the method is applied to a chip system including at least one chip, where each chip includes at least one processing core, and the method includes: acquiring chip test information of the chip system; the chip test information is used for indicating the running state of each processing core contained in each chip; determining power consumption to be allocated to the redundant element by the chip system under the condition that the chip system is determined to have the redundant element based on the chip test information; the redundant element includes: redundant chips and/or redundant processing cores; adjusting the working parameters of a target element in the chip system based on the power consumption to be distributed so as to adjust the power consumption of the target element; the target elements are part or all of the elements except the redundant elements in the chip system.
As can be seen from the above description, in the embodiment of the present disclosure, by obtaining chip test information of a chip system, a redundant element included in the chip system may be determined, and power consumption to be allocated to the redundant element in the chip system may also be determined, so as to adjust working parameters of some or all elements (that is, target elements) except the redundant element in the chip system by using the power consumption to be allocated, thereby adjusting the power consumption of the target element in the chip system. By adjusting the working parameters of the target element according to the power consumption to be allocated of the redundant element, the utilization rate of the power consumption in the chip system can be improved, so that the loss of the power consumption in the chip system is reduced, and meanwhile, the performance of the chip system can be improved.
In an optional embodiment, the obtaining chip test information of the chip system includes at least one of: reading the chip test information from a memory of the chip system each time the chip system is detected to perform a power-on operation; reading the chip test information from the memory of the chip system when the current moment is detected to have reached a preset detection moment of the chip system; reading the chip test information from the memory of the chip system when a modification operation on that memory is detected.
In the embodiment of the present disclosure, trigger conditions are set for obtaining the chip test information. By reading the chip test information whenever a trigger condition is met, changes in the working state of the elements in the chip system can be detected promptly, so that failed elements are found in time. The power consumption of the processing cores of each chip and the allocation of tasks can then be adjusted in time, so that the chip system keeps operating efficiently.
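The three trigger conditions above can be sketched as a simple event filter. This is an illustration only: the `Event` names, the `should_read` helper and the `read_test_info` callback are hypothetical and do not appear in the disclosure.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical events a controller might observe."""
    POWER_ON = auto()         # the chip system performs a power-on operation
    SCHEDULED_TIME = auto()   # the preset detection moment has been reached
    MEMORY_MODIFIED = auto()  # the memory holding the test information changed
    OTHER = auto()            # anything else; does not trigger a read


# Events that trigger reading the chip test information from memory.
READ_TRIGGERS = {Event.POWER_ON, Event.SCHEDULED_TIME, Event.MEMORY_MODIFIED}


def should_read(event):
    """Return True when the event is one of the three trigger conditions."""
    return event in READ_TRIGGERS


def handle(event, read_test_info):
    """Call the reader only for trigger events; return its result or None."""
    return read_test_info() if should_read(event) else None
```

Keeping the trigger set in one place makes it straightforward to enable only some of the conditions, which matches the "at least one of" wording.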
In an optional embodiment, the adjusting an operating parameter of a target element in the chip system based on the power consumption to be allocated includes: acquiring the element priority of a first remaining element in the chip system, where the first remaining element is a remaining chip and/or a remaining processing core in the chip system other than the redundant element; determining the target element among the first remaining elements based on the element priority; and adjusting an operating parameter of the target element based on the power consumption to be allocated, where the operating parameter includes an operating voltage and/or an operating frequency.
In the embodiment of the present disclosure, by obtaining the element priority of the first remaining element in the chip system and determining the target element in the first remaining element based on the element priority, effective utilization of power consumption to be allocated in the first remaining element can be achieved, and more power consumption is preferentially allocated to an element with a higher element priority, so that the task completion progress of the chip system is accelerated, and the performance of the chip system is improved.
In an optional embodiment, the adjusting an operating parameter of a target element in the chip system based on the power consumption to be allocated includes: acquiring power consumption constraint information of the target element; and adjusting the working parameters of the target element based on the power consumption constraint information and the power consumption to be allocated.
In the above embodiment, power consumption constraint information of the target element is used to allocate power consumption for the target element in power consumption to be allocated, so that performance loss of the chip system can be reduced, performance of the chip system can be improved, and efficient operation of the chip system can be ensured continuously.
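One possible way to combine element priority with a per-element power constraint is a greedy pass from the highest priority downwards. The sketch below is an assumption for illustration: the disclosure does not prescribe a specific allocation algorithm, and the keys `priority` and `headroom_w` (the extra watts an element may accept) are hypothetical.

```python
def allocate_by_priority(elements, power_to_allocate):
    """Greedily hand out spare power, highest priority first.

    elements: list of dicts with hypothetical keys
        'name', 'priority' (larger = more important) and
        'headroom_w' (extra watts the element may accept, standing in
        for its power consumption constraint).
    Returns {name: extra_watts} for the elements that received power.
    """
    remaining = power_to_allocate
    grants = {}
    for elem in sorted(elements, key=lambda e: e["priority"], reverse=True):
        if remaining <= 0:
            break
        # Never grant more than the element's constraint or what is left.
        grant = min(elem["headroom_w"], remaining)
        if grant > 0:
            grants[elem["name"]] = grant
            remaining -= grant
    return grants
```

The elements that end up in the returned mapping play the role of the target elements; lower-priority elements receive nothing once the reclaimed power is exhausted.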
In an optional embodiment, the adjusting the operating parameter of the target element based on the power consumption constraint information and the power consumption to be allocated includes: acquiring an adjustment range of preset working parameters of the target element in the chip system; the adjustment range is used for indicating the adjustment range of the working parameters of each chip and each processing core in the chip system in a normal working state; and in the adjusting range, adjusting the working parameters of the target element based on the power consumption constraint information so as to allocate corresponding power consumption to the target element in the power consumption to be allocated.
In an optional embodiment, the adjustment range is determined based on a first range and a second range, the first range is a parameter range of an operating parameter of the target element in a normal operating state determined based on a standard operating parameter of the target element, the second range is a parameter range of an operating parameter of the target element in a normal operating state determined based on an adjusted operating parameter of the target element, and the adjusted operating parameter is a parameter after the operating parameter is adjusted based on a preset adjustment value.
In the embodiment of the disclosure, adjustment ranges for the working voltage and the working frequency are set for the chip system in advance, and timing closure is then performed on the chip system based on these adjustment ranges. The chip system can therefore work anywhere within the adjustment ranges, which expands the adjustment range of the working parameters of each element in the chip system. On this basis, when the working parameters of the target element are adjusted based on the adjustment range and the power consumption constraint information, the working performance of the target element can be improved while its normal operation is ensured. Power consumption resources in the chip system are thus fully utilized, the performance of the chip system is maximized, and the chip system can better complete a training task.
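Keeping an adjusted parameter inside the adjustment range amounts to a clamp. In the sketch below, the way the first and second ranges are combined (taking the envelope of the two) is an assumption made only for illustration; the text above states just that the adjustment range is determined from both. The helper names and the example values are hypothetical.

```python
def combined_range(first, second):
    """Envelope of two (low, high) ranges.

    Assumption: the two ranges overlap or touch, so the envelope
    contains no gap in which the element could not work normally.
    """
    return (min(first[0], second[0]), max(first[1], second[1]))


def clamp(value, allowed):
    """Limit a requested operating parameter to the allowed (low, high) range."""
    low, high = allowed
    return max(low, min(high, value))
```

For example, with a first range of 900 to 1100 MHz and a second range of 1000 to 1300 MHz, a requested frequency of 1400 MHz would be limited to 1300 MHz.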
In an optional embodiment, the method further comprises: and generating a clock closing instruction to close a clock circuit corresponding to the redundant element in the chip system through the clock closing instruction under the condition that the chip system is determined to have the redundant element based on the chip test information.
In the embodiment of the present disclosure, by generating the clock closing instruction, the clock circuit corresponding to the redundant element may be closed, so as to save all power consumption of the redundant element.
In an optional embodiment, the method further comprises: in the working process of the chip system, checking abnormal working elements in the chip system according to a preset checking period, wherein the abnormal working elements comprise chips and/or processing cores in an abnormal working state; and in the case that the abnormal working element is detected, updating the chip test information and adjusting the working parameters of a second remaining element in the chip system except the redundant element and the abnormal working element.
Through the processing mode, the abnormal working elements in the chip system are detected periodically, so that the elements in abnormal operation can be found in time, the power consumption of the elements in normal operation in the chip system can be adjusted in time under the condition of ensuring the normal execution of the training task, the performance loss of the chip system is reduced, the performance of the normal working elements is improved, and the corresponding training task can be completed better by the chip system.
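A single pass of the periodic check can be sketched as follows. The `probe` callback, which reports whether an element currently works, and the string states are hypothetical; how the check is scheduled (timer, interrupt, and so on) is left out.

```python
def check_once(test_info, probe):
    """Run one check pass over the chip test information.

    test_info: dict mapping element name -> 'normal' or 'abnormal'.
    probe: hypothetical callable(name) -> bool, True if the element works.
    Updates test_info in place and returns the newly abnormal elements.
    """
    newly_abnormal = []
    for name, state in test_info.items():
        if state == "normal" and not probe(name):
            test_info[name] = "abnormal"  # update the chip test information
            newly_abnormal.append(name)
    return newly_abnormal


def second_remaining(test_info, redundant):
    """Elements that are neither redundant nor in an abnormal state."""
    return [name for name, state in test_info.items()
            if state == "normal" and name not in redundant]
```

After a pass that finds a newly abnormal element, the working parameters of the elements returned by `second_remaining` would be the ones to re-adjust.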
In a second aspect, an embodiment of the present disclosure further provides a neural network training method, including: acquiring a target training task; determining a training element for the target training task in the remaining elements except the redundant element in the chip system, so as to execute the target training task through the training element.
By the processing mode, in the using process of the chip system, the target training task is distributed to the rest elements except the redundant elements in the chip system to be executed, so that the normal running of the target training task can be ensured, and the processing efficiency and the processing quality of the chip system are improved.
In a third aspect, an embodiment of the present disclosure further provides a device for adjusting chip power consumption, including: the first acquisition unit is used for acquiring chip test information of the chip system; the chip test information is used for indicating the running state of each processing core contained in each chip; a first determination unit, configured to determine power consumption to be allocated to the redundant element by the chip system, if it is determined that the chip system has the redundant element based on the chip test information; the redundant element includes: redundant chips and/or redundant processing cores; the adjusting unit is used for adjusting the working parameters of a target element in the chip system based on the power consumption to be distributed so as to adjust the power consumption of the target element; the target elements are part or all of the elements except the redundant elements in the chip system.
In a fourth aspect, an embodiment of the present disclosure further provides a neural network training device, including: the second acquisition unit is used for acquiring a target training task aiming at the neural network to be trained; a second determining unit, configured to determine, in remaining elements in a chip system except for redundant elements, a training element for the target training task, so as to execute the target training task through the training element; wherein the remaining elements are elements after power consumption adjustment by the chip power consumption adjustment method according to any one of the first aspect.
In a fifth aspect, an embodiment of the present disclosure further provides a chip system, including: at least one chip and a controller, each chip comprising at least one processing core; the controller is configured to adjust power consumption of a target element in the chip, except for a redundant element, by the chip power consumption adjustment method according to any one of the first aspect, where the redundant element includes: redundant chips and/or redundant processing cores.
In a sixth aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any one of the possible implementations of the first aspect, and performing the steps of the second aspect described above.
In a seventh aspect, this disclosed embodiment also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, performs the steps in the first aspect, or any one of the possible implementation manners of the first aspect, and performs the steps in the second aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 illustrates a flowchart of a method for adjusting chip power consumption according to an embodiment of the present disclosure;
Fig. 2 is a schematic structural diagram of a chip system including 8 chips according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram illustrating effects of a first range and a second range in a chip system provided by an embodiment of the disclosure;
Fig. 4 illustrates a flowchart of a neural network training method provided by an embodiment of the present disclosure;
Fig. 5 is a schematic diagram illustrating an apparatus for adjusting chip power consumption according to an embodiment of the disclosure;
Fig. 6 is a schematic diagram of a neural network training device provided by an embodiment of the present disclosure;
Fig. 7 illustrates a schematic diagram of a chip system provided by an embodiment of the disclosure;
Fig. 8 shows a schematic diagram of an electronic device provided by an embodiment of the disclosure;
Fig. 9 shows a schematic diagram of another electronic device provided by an embodiment of the disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C.
Research shows that, in the field of chip design, a qualification rule may be set in order to improve chip yield; the rule specifies an upper limit on the number of processing cores that cannot operate normally in a chip that is still considered qualified. However, while this approach improves chip yield, it also wastes power consumption.
Based on the research, the disclosure provides a chip power consumption adjustment method, a neural network training method, a device and a chip system. As can be seen from the above description, in the embodiment of the present disclosure, by obtaining chip test information of a chip system, a redundant element included in the chip system may be determined, and power consumption to be allocated to the redundant element in the chip system may also be determined, so as to adjust working parameters of some or all elements (that is, target elements) except the redundant element in the chip system by using the power consumption to be allocated, thereby adjusting the power consumption of the target element in the chip system. By adjusting the working parameters of the target element according to the power consumption to be allocated of the redundant element, the utilization rate of the power consumption in the chip system can be improved, so that the loss of the power consumption in the chip system is reduced, and meanwhile, the performance of the chip system can be improved.
In order to facilitate understanding of the present embodiment, a detailed description is first given of a chip power consumption adjustment method disclosed in the embodiments of the present disclosure, and an execution subject of the chip power consumption adjustment method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capability.
Referring to fig. 1, a flowchart of a method for adjusting chip power consumption provided in the embodiment of the present disclosure is shown, where the method includes steps S101 to S105, where:
S101: acquiring chip test information of the chip system; the chip test information is used to indicate an operation state of each processing core included in each chip.
In the embodiment of the present disclosure, the chip system may be any type of chip system, including one with specific requirements on the working frequency and/or working voltage of its chips, such as an AI training chip. An AI training chip is mainly used for AI training tasks, where an AI training task may be understood as a training task of a neural network model. The present disclosure does not specifically limit the type of the chip system, as long as the method can be implemented.
The chip system comprises at least one chip, and each chip comprises at least one processing core. The chip test information of the chip system may be information indicating an operation state of each processing core included in the chip. The chip test information is recorded in a register of the chip system.
Suppose that 2 chips, respectively chip 1 and chip 2, are included in a chip system, and each chip includes 3 processing cores, respectively core1, core2, and core3, at this time, the chip test information of the chip system may be "chip 1, normal operation", "chip 1, core1, normal operation", "chip 1, core2, normal operation", "chip 1, core3, abnormal operation", "chip 2, normal operation", "chip 2, core1, normal operation", "chip 2, core2, normal operation", "chip 2, core3, normal operation".
In the embodiment of the present disclosure, the operation state included in the chip test information is not fixed, and the information in the chip test information may be set to change with the change of the operation state of the processing core of each chip in the chip system.
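The two-chip, three-core example above could be held in a nested mapping such as the one below. The layout, the key names (`chip_ok`, `cores`) and the `abnormal_cores` helper are assumptions for illustration; the disclosure does not specify a storage format.

```python
# Chip test information for the two-chip, three-core example.
# True means "normal operation", False means "abnormal operation".
chip_test_info = {
    "chip 1": {"chip_ok": True,
               "cores": {"core1": True, "core2": True, "core3": False}},
    "chip 2": {"chip_ok": True,
               "cores": {"core1": True, "core2": True, "core3": True}},
}


def abnormal_cores(info):
    """List (chip, core) pairs whose recorded state is abnormal."""
    return [(chip, core)
            for chip, entry in info.items()
            for core, ok in entry["cores"].items()
            if not ok]
```

Because the recorded states are not fixed, an update simply overwrites the corresponding boolean when the state of a core changes.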
S103: determining power consumption to be allocated to the redundant element by the chip system under the condition that the chip system is determined to have the redundant element based on the chip test information; the redundant element includes: redundant chips and/or redundant processing cores.
Here, a redundant element is a chip and/or a processing core that, under a redundancy rule preset in the design stage of the chip system, is determined to be unable to work normally in the chip system. The number of redundant elements is tied to the redundancy rule; that is, the redundancy rule presets an upper limit on the number of redundant elements each chip system is allowed to contain.
Specifically, the redundancy rule can be understood as: the number of failed chips in each chip system does not exceed N, and/or the number of failed processing cores in each chip of the chip system does not exceed M, where N may be set to 1 and M may also be set to 1.
For example, if a chip system includes 8 chips and each chip includes 8 processing cores, the chip system includes 64 processing cores. In this case, the redundancy rule may be determined as: the number of failed chips in the chip system cannot exceed one, and/or the number of failed processing cores in each chip cannot exceed one.
In the embodiment of the present disclosure, if the number of failed processing cores in a chip is greater than the number specified in the redundancy rule, the chip is determined to be a failed chip, that is, a redundant chip. Such a chip cannot normally complete the tasks assigned by the chip system. If a chip contains failed processing cores but their number does not exceed the number specified in the redundancy rule (e.g., 1), the chip is determined not to be a redundant chip: it can still normally complete the tasks allocated by the chip system, and the failed processing core it contains is a redundant processing core. If a chip contains no failed processing core, the chip is not a redundant chip.
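The per-chip decision can be sketched as below, assuming, in line with the "does not exceed M" wording of the redundancy rule, that a chip with at most M failed cores stays usable. The function name and the return shape are hypothetical.

```python
def classify_chip(core_ok, max_failed_cores=1):
    """Classify one chip from its per-core states.

    core_ok: dict core name -> True (normal) / False (failed).
    max_failed_cores: M in the redundancy rule (1 in the example).
    Returns (chip_is_redundant, redundant_cores).
    """
    failed = [core for core, ok in core_ok.items() if not ok]
    if len(failed) > max_failed_cores:
        # Too many failed cores: the whole chip is treated as redundant.
        return True, []
    # Within the rule: the chip stays usable and its failed cores
    # (possibly none) are the redundant processing cores.
    return False, failed
```

With M = 1, a chip with one failed core yields that core as a redundant processing core, while a chip with two failed cores is itself a redundant chip.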
In the embodiment of the present disclosure, the power consumption to be allocated may be understood as power consumption allocated to the redundant element in advance in a design stage of the chip system. If the number of the redundant elements is multiple, the power consumption to be allocated is the sum of the power consumptions of the multiple redundant elements.
Illustratively, assume that a chip system includes 2 chips and each chip includes 3 processing cores. In this case, the elements included in the chip system are chip 1, chip 2, chip 1-core1, chip 1-core2, chip 1-core3, chip 2-core1, chip 2-core2, and chip 2-core3. When the redundant element included in the chip system is chip 2-core1, the power consumption to be allocated in the chip system is the power consumption allocated to chip 2-core1 in advance.
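Since the power consumption to be allocated is the sum of the budgets given to the redundant elements at design time, it can be computed as below. The budget values (10 W per core) are invented purely for illustration and are not taken from the disclosure.

```python
# Hypothetical design-stage power budgets in watts, one entry per core.
design_budget_w = {
    "chip 1-core1": 10, "chip 1-core2": 10, "chip 1-core3": 10,
    "chip 2-core1": 10, "chip 2-core2": 10, "chip 2-core3": 10,
}


def power_to_allocate(budget_w, redundant_elements):
    """Sum the design-stage power budgets of all redundant elements."""
    return sum(budget_w[name] for name in redundant_elements)
```

With chip 2-core1 as the only redundant element, the result is simply that core's own budget; with several redundant elements the budgets add up.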
S105: adjusting the working parameters of a target element in the chip system based on the power consumption to be distributed so as to adjust the power consumption of the target element; the target elements are part or all of the elements except the redundant elements in the chip system.
Here, the target element in the chip system is used to characterize elements included in the chip system except for the redundant element, and in this case, the number of the target elements may be 1 or more.
Taking the chip system including chip 1 and chip 2 as an example, and assuming that the redundant element is chip 2-core1, the target element in the chip system may be some or all of chip 1, chip 2-core2, and chip 2-core3.
As can be seen from the above description, in the embodiment of the present disclosure, by obtaining chip test information of a chip system, a redundant element included in the chip system may be determined, and power consumption to be allocated to the redundant element in the chip system may also be determined, so as to adjust working parameters of some or all elements (that is, target elements) except the redundant element in the chip system by using the power consumption to be allocated, thereby adjusting the power consumption of the target element in the chip system. By adjusting the working parameters of the target element according to the power consumption to be allocated of the redundant element, the utilization rate of the power consumption in the chip system can be improved, so that the loss of the power consumption in the chip system is reduced, and meanwhile, the performance of the chip system can be improved.
In an optional embodiment, for S101, the obtaining chip test information of the chip system specifically may be implemented in several ways, including:
The first mode:
reading the chip test information in a memory of the chip system each time the chip system is detected to be executing a power-on operation.
In the embodiment of the present disclosure, the power-on operation of the chip system may be detected by a detection unit built into the chip system. After the detection unit detects the power-on operation of the chip system, an MCU (Microcontroller Unit) built into the chip system may read the chip test information from a memory of the chip system during each power-on process.
Here, the chip test information stored in the memory may be chip test information burned into an eFuse of a corresponding chip in the chip system at a test stage of the chip system. As is apparent from the above description, the chip test information is used to indicate the operating state of each processing core included in each chip.
Specifically, the operating state of the processing cores in each chip of the chip system may be indicated by recording information such as "Good Die" or "Partial Good Die" in the eFuse. The information "Good Die" indicates that the chip system contains no processing core in an abnormal operation state (for example, a fault state); the information "Partial Good Die" indicates that the chip system contains a processing core in an abnormal operation state (for example, a fault state). The information "Partial Good Die" may also carry the relevant information of the processing core in the abnormal operation state.
The second mode:
reading the chip test information in a memory of the chip system when it is detected that the current time reaches the preset detection time of the chip system.
For the second mode, whether the current time reaches the preset detection time of the chip system is detected, which specifically includes:
judging whether the current time satisfies the detection period of the chip system; and if so, determining that the current time reaches the preset detection time.
That is, it is detected whether one detection period has elapsed since the chip test information was last read from the memory. When one detection period has elapsed, the preset detection time is reached, and the chip test information in the memory of the chip system needs to be read again.
For example, the detection period of the chip system may be 5 hours, and it is assumed that the chip system last read the chip test information at 8:12:12 on December 12, 2020. When the continuous working time of the chip system reaches the detection period of 5 hours, that is, at 13:12:12 on December 12, 2020, the current time reaches the preset detection time, and the chip test information in the memory needs to be read again.
The third mode: a combination of the first mode and the second mode.
Reading the chip test information in a memory of the chip system in the process of detecting that the chip system executes power-on operation each time; and reading the chip test information in a memory of the chip system under the condition that the current moment is detected to reach the preset detection moment of the chip system.
In the embodiment of the present disclosure, the power-on operation of the chip system may be detected by a detection unit built into the chip system. After the detection unit detects the power-on operation of the chip system, the MCU built into the chip system can read the chip test information from the memory of the chip system during each power-on process. After the chip system is powered on, whether the current time reaches the preset detection time can be detected in the manner described in the second mode, and the operation of acquiring the chip test information is then executed.
For example, the detection period of the chip system may be 5 hours, and it is assumed that the chip system is powered on for the first time at 8:12:12 on December 12, 2020, reads the chip test information at that time, and then starts to operate. When the continuous working time of the chip system reaches the detection period of 5 hours, that is, at 13:12:12 on December 12, 2020, the current time reaches the preset detection time, and the chip test information in the memory needs to be read again.
Here, the above-mentioned detection period starts to be counted again after the system-on-chip performs the power-on operation each time.
In addition, the method can be set to continue to count the detection period on the basis of the last counted detection period under the condition that the chip system meets the preset power-on requirement. The condition that the preset power-on requirement is met can be understood that a time interval between the power-off time of the chip system and the power-on time of the current moment meets the preset requirement.
For example, after the chip system is continuously operated for 1 hour after the chip test information in the memory is read again, the power is cut off. If the chip system is powered on again after one minute, and the preset power-on requirement is met, the detection period is determined to be reached after the chip system continues to work for 4 hours, namely the preset detection time is reached. At this time, the chip system may read the chip test information from the memory.
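The period counting with carry-over across a short power interruption might look like the sketch below. The class `DetectionTimer`, its methods, and the 5-minute threshold `MAX_POWER_OFF_GAP` are assumptions for illustration; the disclosure only says that the interval between power-off and power-on must meet a preset requirement.

```python
from datetime import datetime, timedelta

DETECTION_PERIOD = timedelta(hours=5)
# Hypothetical preset power-on requirement: gaps up to this long keep the count.
MAX_POWER_OFF_GAP = timedelta(minutes=5)


class DetectionTimer:
    def __init__(self):
        self.elapsed = timedelta(0)  # working time counted since the last read
        self.session_start = None    # start of the current powered-on session

    def power_on(self, now, last_power_off=None):
        keep = (last_power_off is not None
                and now - last_power_off <= MAX_POWER_OFF_GAP)
        if not keep:
            self.elapsed = timedelta(0)  # start counting the period again
        self.session_start = now

    def power_off(self, now):
        self.elapsed += now - self.session_start
        self.session_start = None

    def mark_read(self, now):
        """Call after the chip test information has been read."""
        self.elapsed = timedelta(0)
        self.session_start = now

    def due(self, now):
        return self.elapsed + (now - self.session_start) >= DETECTION_PERIOD


# Replays the example above: 1 hour of work, a one-minute outage, 4 more hours.
t0 = datetime(2020, 12, 12, 8, 12, 12)
timer = DetectionTimer()
timer.power_on(t0)
timer.mark_read(t0)
off = t0 + timedelta(hours=1)
timer.power_off(off)
on = off + timedelta(minutes=1)
timer.power_on(on, last_power_off=off)
due_early = timer.due(on + timedelta(hours=3, minutes=59))
due_on_time = timer.due(on + timedelta(hours=4))
```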
The fourth mode:
reading the chip test information in a memory of the chip system when a modification operation on the memory of the chip system is detected.
For the fourth mode, detecting a modification operation for the memory of the chip system specifically includes:
creating a monitoring task, wherein the monitoring task is a task for monitoring the modification operation of a memory of the chip system; and determining that the condition for reading the chip test information is met under the condition that the modification operation aiming at the memory is monitored based on the monitoring task, and reading the chip test information in the memory of the chip system.
Here, the modifying operation may include at least one of: data write operations, data update (or replacement) operations, data delete operations, and the like.
Taking the above-described chip system including chip 1 and chip 2 as an example, and assuming that processing core1 in chip 1 fails, the failure may trigger a modification of the chip test information, and the information of processing core1 in chip 1 is written into a memory (e.g., an eFuse). When the monitoring task monitors the data write operation on the memory, it determines that the condition for reading the chip test information is met, and the chip test information in the memory of the chip system is read.
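One way to picture the monitoring task is a memory wrapper that reports every modification to a callback. The sketch below is only a plain in-memory stand-in, not an eFuse driver; `MonitoredMemory` and its method names are hypothetical.

```python
class MonitoredMemory:
    """In-memory stand-in that notifies a monitoring callback on every change."""

    def __init__(self, on_modify):
        self._data = {}
        self._on_modify = on_modify  # plays the role of the monitoring task

    def write(self, key, value):
        operation = "update" if key in self._data else "write"
        self._data[key] = value
        self._on_modify(operation, key)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._on_modify("delete", key)

    def read(self, key):
        return self._data.get(key)


events = []
memory = MonitoredMemory(lambda operation, key: events.append((operation, key)))
memory.write("chip 1", "Good Die")
memory.write("chip 1", "Partial Good Die: core1")
latest = memory.read("chip 1")
memory.delete("chip 1")
```

Each recorded event corresponds to one of the modification operations listed above (write, update, delete) and would be the point at which the chip test information is read again.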
In the embodiment of the present disclosure, by setting the trigger condition for obtaining the chip test information, the change of the working state of the element in the chip system can be detected in real time by periodically reading the chip test information, so as to find out the failed element in time, and further, the power consumption of the processing core of each chip in the chip system and the allocation of the task can be adjusted in time, so that the chip system continuously keeps high-efficiency operation.
For the above step S103, after the chip test information is acquired, it may be determined whether a redundant element is included in the chip system based on information recorded in the chip test information.
In specific implementation, whether the chip test information is 'Good Die' or not can be judged, and if yes, the chip system is determined not to contain redundant elements; if the chip test information is the Partial Good Die, the chip system is determined to contain the redundant element, and at the moment, the specific element information of the redundant element can be determined through the Partial Good Die.
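The "Good Die" / "Partial Good Die" decision can be sketched as below. The record layout (a dictionary with `status` and `failed_cores` keys) is an assumption made for illustration; the disclosure does not specify how the eFuse content is encoded.

```python
def redundant_cores_from_record(record):
    """Return the failed cores named by one chip test record (hypothetical layout)."""
    status = record["status"]
    if status == "Good Die":
        return []  # no redundant element
    if status == "Partial Good Die":
        # The record also carries the information of the failed cores.
        return list(record.get("failed_cores", []))
    raise ValueError(f"unknown chip test status: {status!r}")


good = redundant_cores_from_record({"status": "Good Die"})
partial = redundant_cores_from_record(
    {"status": "Partial Good Die", "failed_cores": ["chip 2-core1"]}
)
```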
In the case that it is determined that the chip system includes the redundant element, based on step S103, the power consumption to be allocated in the chip system for the redundant element may be determined, and the operating parameter of the target element in the chip system may be adjusted based on the power consumption to be allocated, so as to adjust the power consumption of the target element.
In an optional implementation manner, for S105, adjusting an operating parameter of a target element in the chip system based on the power consumption to be allocated specifically includes the following processes:
step S11: acquiring the element priority of a first remaining element in the chip system; the first remaining element is a remaining chip and/or a remaining processing core except the redundant element in the chip system;
step S12: determining the target element in the first remaining element based on the element priority, and adjusting an operating parameter of the target element based on the power consumption to be allocated, wherein the operating parameter includes: an operating voltage and/or an operating frequency.
Here, the element priority may be a priority order of each element set in advance by the system-on-chip. The priority order may be determined according to the amount of tasks that each component needs to complete, and/or according to the importance of the tasks that each component needs to complete.
For example, assume that the chip system shown in fig. 2 includes 8 chips, chip 1 to chip 8, and that each chip includes 8 processing cores, core1 to core8. The elements included in the chip system are then chip 1, chip 2, …, chip 8; chip 1-core1, chip 1-core2, chip 1-core3, …, chip 1-core8; chip 2-core1, chip 2-core2, chip 2-core3, …, chip 2-core8; …; chip 8-core1, chip 8-core2, chip 8-core3, …, chip 8-core8. Assuming that the redundant elements included in the chip system are chip 1 and chip 2-core1, the first remaining elements in the chip system may be determined according to the redundant elements. For example, the first remaining elements may be the elements other than chip 1 and chip 2-core1, such as chip 2-core2, chip 2-core3, …, chip 2-core4; …; chip 8-core1, chip 8-core2, chip 8-core3, …, chip 8-core8.
In determining the priority order, the priority order of the first remaining element may be determined according to the amount of tasks that the first remaining element needs to complete and/or the priority order may be determined according to the importance of the tasks that the first remaining element needs to complete. For example, suppose that, in a system on a chip, chip 2 needs to process parallel computing tasks, where chip 2-core1 is used to receive data, chip 2-core2 and chip 2-core3 are used to perform parallel computing tasks on the received data, and chip 2-core4 is used to store results obtained by parallel computing and transmit the results to the next module; the chip 3 processes the result analysis task, and at this time, it may be determined that the task amount of the chip 2 is greater than that of the chip 3, and the task amounts of the chip 2-core2, the chip 2-core3 are greater than that of the chip 2-core1, and the chip 2-core4, and at this time, it may be determined that the priority order in the first remaining elements is: chip 2-core2, chip 2-core3, chip 2-core1, chip 2-core4, chip 3-core1, chip 3-core2, chip 3-core3 and chip 3-core 4.
When determining the target component in the first remaining component based on the component priority, the chip priorities of the remaining chips in the chip system may be first obtained, and then the target chip may be selected from the remaining chips according to the chip priorities; and adjusting the working parameters of the target chip based on the power consumption to be distributed, thereby realizing the adjustment of the power consumption of the target chip. After the power consumption of the target chip is adjusted, if the power consumption to be allocated also includes the remaining power consumption, the processing core priorities of the remaining processing cores in the chip system can be obtained, the target processing core is selected from the remaining processing cores according to the processing core priorities, and the working parameter of the target processing core is adjusted based on the remaining power consumption to be allocated, so that the power consumption of the target processing core is adjusted, wherein the target chip and the target processing core are the target element.
In the embodiment of the present disclosure, by obtaining the element priority of the first remaining element in the chip system and determining the target element in the first remaining element based on the element priority, effective utilization of power consumption to be allocated in the first remaining element can be achieved, and more power consumption is preferentially allocated to an element with a higher element priority, so that the task completion progress of the chip system is accelerated, and the performance of the chip system is improved.
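The chips-first, cores-second allocation described above can be sketched as a greedy pass in priority order. The tuple layout, the "headroom" figure (how much extra power an element can absorb), and the function names are assumptions for illustration only.

```python
def allocate(budget_w, elements):
    """Grant power in priority order.

    elements: list of (name, priority, headroom_w); a lower priority number
    means a more important element (hypothetical convention).
    """
    grants = {}
    remaining = budget_w
    for name, _priority, headroom_w in sorted(elements, key=lambda e: e[1]):
        if remaining <= 0:
            break
        grant = min(headroom_w, remaining)
        grants[name] = grant
        remaining -= grant
    return grants, remaining


def allocate_chips_then_cores(budget_w, chips, cores):
    """Serve the remaining chips first, then hand any leftover to remaining cores."""
    chip_grants, leftover = allocate(budget_w, chips)
    core_grants, leftover = allocate(leftover, cores)
    return {**chip_grants, **core_grants}, leftover


grants, leftover = allocate_chips_then_cores(
    10.0,
    chips=[("chip 3", 2, 4.0), ("chip 2", 1, 5.0)],
    cores=[("chip 4-core1", 1, 3.0)],
)
```

Here chip 2 (highest priority) is served first, chip 3 second, and the processing core receives only what is left of the power consumption to be allocated.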
In an optional implementation manner, for S105, adjusting an operating parameter of a target element in the chip system based on the power consumption to be allocated specifically further includes:
step S21: acquiring power consumption constraint information of the target element;
step S22: and adjusting the working parameters of the target element based on the power consumption constraint information and the power consumption to be distributed.
Here, the power consumption constraint information represents the constraint on the operating voltage and the constraint on the operating frequency determined for each element when that element of the chip system satisfies the timing closure requirement. The power consumption constraint information indicates an upper limit value and a lower limit value of the operating voltage of the element, and an upper limit value and a lower limit value of the operating frequency of the element. When an operating voltage and an operating frequency are assigned to an element, the power consumption constraint information must be satisfied, that is, the assigned operating voltage and operating frequency must lie within the limits given by the power consumption constraint information. When the operating voltage and the operating frequency allocated to the element do not satisfy the power consumption constraint information, the element will not operate normally.
In the embodiment of the present disclosure, the operating parameter (e.g., the operating voltage and/or the operating frequency) of the target element may be adjusted based on the power consumption constraint information of the target element, for example, the operating parameter of the target element may be increased, so as to allocate the corresponding power consumption to the target element among the power consumptions to be allocated.
In the above embodiment, power consumption constraint information of the target element is used to allocate power consumption for the target element in power consumption to be allocated, so that performance loss of the chip system can be reduced, performance of the chip system can be improved, and efficient operation of the chip system can be ensured continuously.
In an optional implementation manner, in step S22, the adjusting the operating parameter of the target element based on the power consumption constraint information and the power consumption to be allocated specifically includes the following steps:
step S221: acquiring an adjustment range of preset working parameters of the target element in the chip system; the adjustment range is used for indicating the adjustment range of the working parameters of each chip and each processing core in the chip system in a normal working state;
step S222: and in the adjusting range, adjusting the working parameters of the target element based on the power consumption constraint information so as to allocate corresponding power consumption to the target element in the power consumption to be allocated.
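Steps S221 and S222 can be read as clamping a requested operating point to the overlap of the preset adjustment range and the power consumption constraint limits. The sketch below assumes that reading; the constraint limits (0.70 to 0.90 V, 0.8 to 1.5 GHz) and the requested values are hypothetical, while the adjustment range uses the 0.75 to 0.85 V and 1 to 1.4 GHz figures given in this disclosure.

```python
def effective_limits(constraint, adjustment_range):
    """Intersect two (low, high) intervals."""
    low = max(constraint[0], adjustment_range[0])
    high = min(constraint[1], adjustment_range[1])
    if low > high:
        raise ValueError("constraint and adjustment range do not overlap")
    return low, high


def clamp(value, limits):
    low, high = limits
    return max(low, min(high, value))


# Hypothetical constraint limits intersected with the preset adjustment range.
voltage_limits = effective_limits((0.70, 0.90), (0.75, 0.85))
frequency_limits = effective_limits((0.8, 1.5), (1.0, 1.4))

requested = {"voltage_v": 0.95, "frequency_ghz": 1.3}
applied = {
    "voltage_v": clamp(requested["voltage_v"], voltage_limits),
    "frequency_ghz": clamp(requested["frequency_ghz"], frequency_limits),
}
```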
A typical integrated chip system supports a fixed frequency at a fixed voltage and thereby meets a fixed performance requirement. If the chip system is an AI training chip, its main role is to train AI models, so its performance requirement is not a fixed figure: the faster the better. This means that the frequency requirement of the AI training chip is not fixed either: the higher the better. Increasing the operating frequency (i.e., the clock frequency) of the chip requires increasing the operating voltage. To allow the operating voltage of the chip to be increased, the adjustment range of the operating parameters of each element of the chip system can be expanded through the timing convergence process at the design stage of the chip system.
That is, in the timing convergence stage, the convergence is not performed by using a fixed operating voltage and operating frequency, but a corresponding adjustment range is preset, so that the timing convergence operation is performed at the upper and lower limits of the adjustment range, and the designed system-on-chip can operate in the voltage frequency range. In the actual use process of the chip, under the constraint of the power consumption constraint information, the corresponding power consumption is distributed to the target element in the power consumption to be distributed by adjusting the working voltage and the working frequency of the chip.
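To illustrate how granted power could translate into a voltage and frequency setting, the sketch below picks the fastest operating point whose extra power fits the grant. It assumes the common first-order model in which dynamic power scales with V² · f; this model, the 10 W baseline, and the intermediate point (0.80 V, 1.2 GHz) are assumptions, and only the end points 0.75 V / 1 GHz and 0.85 V / 1.4 GHz come from this disclosure.

```python
def scaled_power(base_power_w, base_v, base_f, v, f):
    """First-order dynamic power estimate: proportional to V^2 * f (assumed model)."""
    return base_power_w * (v / base_v) ** 2 * (f / base_f)


def pick_operating_point(base_power_w, grant_w, points, base=(0.75, 1.0)):
    """Return the fastest (voltage_v, frequency_ghz) whose extra power fits the grant."""
    base_v, base_f = base
    best = base
    for v, f in sorted(points, key=lambda p: p[1]):
        extra = scaled_power(base_power_w, base_v, base_f, v, f) - base_power_w
        if extra <= grant_w:
            best = (v, f)
    return best


# Candidate points inside the adjustment range; the middle one is hypothetical.
points = [(0.75, 1.0), (0.80, 1.2), (0.85, 1.4)]
small = pick_operating_point(10.0, 1.0, points)
medium = pick_operating_point(10.0, 5.0, points)
large = pick_operating_point(10.0, 8.0, points)
```

Under this model the top point costs roughly 1.8 times the baseline power, so a target element only reaches 0.85 V / 1.4 GHz when the power consumption to be allocated covers that increase.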
In the embodiment of the present disclosure, the adjustment range is determined based on a first range and a second range, the first range is a parameter range of an operating parameter of the target element in a normal operating state, which is determined based on a standard operating parameter of the target element, the second range is a parameter range of an operating parameter of the target element in a normal operating state, which is determined based on an adjustment operating parameter of the target element, and the adjustment operating parameter is a parameter after the operating parameter is adjusted based on a preset adjustment value.
Here, the first range is a parameter range of the operating parameter, determined based on the standard operating parameter of the target element, in which the target element is in a normal operating state. For example, the standard operating voltage of the TSMC 7 nm process is 0.75 V; because of errors and noise in actual operation, the first range can be understood as a range around 0.75 V.
The second range is a parameter range of the operating parameter of the target element in the normal operating state, determined based on the adjusted operating parameter of the target element. In addition to the application scenario of the 0.75 V standard operating voltage, the embodiment of the present disclosure provides an operating scenario with a 100 mV boost, as shown in fig. 3, that is, a range around 0.85 V; for example, the second range may be 0.8075 V to 0.8925 V.
In the embodiment of the disclosure, as shown in fig. 3, for the range corresponding to 0.75 V (the first range) and the range corresponding to 0.85 V (the second range), a setup corner (a first corner) and a hold corner (a second corner) may be set for each range, so that a processing core in the chip system can operate in the region between 0.75 V and 0.85 V, with an operating frequency from 1 GHz to 1.4 GHz.
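The example second range of 0.8075 V to 0.8925 V matches a band of plus or minus 5% around 0.85 V, which the short check below reproduces. The function `tolerance_band` is hypothetical, and applying the same 5% to the 0.75 V first range is an assumption, since the disclosure only describes that range as "around 0.75 V".

```python
def tolerance_band(nominal_v, tolerance=0.05):
    """Return (low, high) for a symmetric relative tolerance around nominal_v."""
    low = round(nominal_v * (1 - tolerance), 4)
    high = round(nominal_v * (1 + tolerance), 4)
    return low, high


# Matches the example second range given in the text.
second_range = tolerance_band(0.85)
# Assumption only: the same tolerance applied to the standard voltage.
first_range = tolerance_band(0.75)
```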
Here, the setup time is used to represent the time when data is stable before the rising edge of the clock signal of the flip-flop arrives, and the hold time is used to represent the time when data is stable after the rising edge of the clock signal of the flip-flop arrives, so as to ensure that the data can be driven into the flip-flop at the rising edge of the clock.
In the embodiment of the present disclosure, after the adjustment range is set, the chip system can operate in the operating voltage range and the operating frequency range corresponding to the adjustment range.
In the embodiment of the disclosure, the adjustment ranges of the working voltage and the working frequency are set for the chip system in advance, and then the chip system is subjected to time sequence convergence based on the adjustment ranges, so that the chip system can work in the adjustment ranges, and the adjustment ranges of the working parameters of each element in the chip system are expanded. On the basis, when the working parameters of the target element are adjusted based on the adjustment range and the power consumption constraint information, the working performance of the target element can be improved while the target element is ensured to work normally, so that power consumption resources in the chip system are fully utilized, the performance of the chip system is improved to the maximum extent, and the chip system can better complete a training task.
In an alternative embodiment, the disclosed method further comprises the steps of:
in the case that it is determined that a redundant element exists in the chip system based on the chip test information, a clock shutdown instruction may be further generated to shut down a clock circuit corresponding to the redundant element in the chip system through the clock shutdown instruction.
In the embodiment of the present disclosure, by generating the clock closing instruction, the clock circuit corresponding to the redundant element may be closed, so as to save all power consumption of the redundant element.
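A clock shutdown instruction could, for instance, take the form of a clock-enable bitmask with the bits of the redundant cores cleared. The register layout (one enable bit per processing core, bit 0 for core1) and the function `clock_enable_mask` are purely hypothetical; the disclosure does not describe the instruction format.

```python
def clock_enable_mask(num_cores, redundant_core_indices):
    """Build a per-core clock-enable mask with redundant cores gated off.

    Bit i set means the clock of core i+1 stays on (hypothetical layout).
    """
    mask = (1 << num_cores) - 1  # start with every core clock enabled
    for index in redundant_core_indices:
        mask &= ~(1 << index)    # clear the bit of each redundant core
    return mask


# An 8-core chip whose core1 (index 0) is redundant.
mask = clock_enable_mask(8, [0])
```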
In an alternative embodiment, the disclosed method further comprises the steps of:
step S41: in the working process of the chip system, checking abnormal working elements in the chip system according to a preset checking period, wherein the abnormal working elements comprise chips and/or processing cores in an abnormal working state;
step S42: and in the case that the abnormal working element is detected, updating the chip test information and adjusting the working parameters of a second remaining element in the chip system except the redundant element and the abnormal working element.
Here, the second remaining element is a remaining chip and/or a remaining processing core among the first remaining elements of the chip system, excluding the abnormal working element.
In the embodiment of the present disclosure, a preset check period may be preset. In the process of power-on work of the chip system, the working state of each element in the chip system can be checked according to a preset check period, and when the element is detected to be in an abnormal working state, the chip test information in the chip system is updated. For example, the operation state of the element in the abnormal operation state may be written in the chip test information.
As can be seen from the above description, when the preset monitoring task monitors that the chip test information changes, the chip test information stored in the memory of the chip system can be acquired, and an element in an abnormal working state, that is, an abnormal working element, is determined based on the chip test information; and further determining power consumption information of the chip system aiming at the abnormal working element, and adjusting working parameters of the second residual element based on the power consumption information.
In this embodiment of the present disclosure, when adjusting the operating parameters of the second remaining components, a part or all of the components may be selected from the second remaining components as components to be subjected to power consumption allocation based on the component priority of each component, and a specific selection process is described in the foregoing embodiment and is not described in detail here.
In the disclosed embodiment, after an abnormal working element is detected, if the abnormal working element is in a working state, the training task processed by the abnormal working element may be assigned to a second remaining element to be executed. When the second remaining element executes the training task, the training task can be continuously executed based on the task execution progress recorded by the chip system; in addition to this, the second remaining element may also re-perform the training task.
Through the processing mode, the abnormal working elements in the chip system are detected periodically, so that the elements in abnormal operation can be found in time, the power consumption of the elements in normal operation in the chip system can be adjusted in time under the condition of ensuring the normal execution of the training task, the performance loss of the chip system is reduced, the performance of the normal working elements is improved, and the corresponding training task can be completed better by the chip system.
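One pass of the periodic check in steps S41 and S42, together with handing over the tasks of an abnormal element, might be sketched as follows. The data layout (dictionaries keyed by element name), the "least loaded element" hand-over policy, and the function name are assumptions for illustration.

```python
def check_and_reassign(health, redundant, test_info, tasks):
    """Run one check: record abnormal elements and move their tasks.

    health: element name -> True if the element works normally.
    tasks: element name -> list of pending training tasks.
    Returns the second remaining elements.
    """
    abnormal = sorted(
        name for name, ok in health.items() if not ok and name not in redundant
    )
    for name in abnormal:
        test_info[name] = "abnormal"  # update the chip test information
    second_remaining = [
        name for name in health if name not in redundant and name not in abnormal
    ]
    for name in abnormal:
        for task in tasks.pop(name, []):
            if not second_remaining:
                raise RuntimeError("no working element left to take over tasks")
            # Hypothetical policy: hand the task to the least loaded element.
            target = min(second_remaining, key=lambda e: len(tasks.setdefault(e, [])))
            tasks[target].append(task)
    return second_remaining


health = {
    "chip 1-core1": True,
    "chip 1-core2": False,
    "chip 2-core1": False,
    "chip 2-core2": True,
}
redundant = {"chip 2-core1"}
test_info = {}
tasks = {"chip 1-core1": ["t1"], "chip 1-core2": ["t2"], "chip 2-core2": []}
second_remaining = check_and_reassign(health, redundant, test_info, tasks)
```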
Referring to fig. 4, a flowchart of a neural network training method provided in the embodiment of the present disclosure is shown, where the method includes steps S401 to S403, where:
step S401: acquiring a target training task aiming at a neural network to be trained;
step S403: determining training elements for the target training task in the remaining elements except redundant elements in the chip system, so as to execute the target training task through the training elements; the remaining elements are elements whose power consumption is adjusted by the chip power consumption adjusting method described in the above embodiment.
In the embodiment of the disclosure, the chip system may acquire a target training task, and when a redundant element exists in the chip system, the chip system may allocate the target training task to an element other than the redundant element in the chip system to execute the target training task.
In the embodiment of the present disclosure, if the target training task is obtained when the chip test information of the chip system is obtained, it may be determined whether a redundant element exists in the chip system based on the chip test information. And if the power consumption to be distributed of the chip system aiming at the redundant element is determined to exist, and after the working parameters of the target element in the chip system are adjusted based on the power consumption to be distributed, the step of determining a training element for the target training task in the rest elements except the redundant element in the chip system is executed.
By the processing mode, in the using process of the chip system, the target training task is distributed to the rest elements except the redundant elements in the chip system to be executed, so that the normal running of the target training task can be ensured, and the processing efficiency and the processing quality of the chip system are improved.
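Step S403 can be pictured as spreading the target training task over the elements that are not redundant. The sketch below assumes the task is already split into shards and distributes them round-robin; the sharding, the round-robin policy, and the function `assign_training_task` are assumptions and are not stated in the disclosure.

```python
def assign_training_task(shards, elements, redundant):
    """Distribute task shards over the non-redundant elements, round-robin."""
    remaining = [name for name in elements if name not in redundant]
    if not remaining:
        raise RuntimeError("no non-redundant element available for training")
    plan = {name: [] for name in remaining}
    for index, shard in enumerate(shards):
        plan[remaining[index % len(remaining)]].append(shard)
    return plan


plan = assign_training_task(
    shards=["batch 0", "batch 1", "batch 2"],
    elements=["chip 2-core1", "chip 2-core2", "chip 2-core3"],
    redundant={"chip 2-core1"},
)
```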
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a chip power consumption adjusting apparatus corresponding to the chip power consumption adjusting method, and as the principle of the apparatus in the embodiment of the present disclosure for solving the problem is similar to the chip power consumption adjusting method in the embodiment of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 5, a schematic diagram of an apparatus for adjusting chip power consumption according to an embodiment of the present disclosure is shown, where the apparatus includes: a first obtaining unit 51, a first determination unit 52, and an adjusting unit 53; wherein:
a first obtaining unit 51, configured to obtain chip test information of a chip system; the chip test information is used for indicating the running state of each processing core contained in each chip;
a first determination unit 52, configured to determine, if it is determined that a redundant element exists in the chip system based on the chip test information, power consumption to be allocated to the redundant element by the chip system; the redundant element includes: redundant chips and/or redundant processing cores;
an adjusting unit 53, configured to adjust an operating parameter of a target element in the chip system based on the power consumption to be allocated, so as to adjust the power consumption of the target element; the target elements are part or all of the elements except the redundant elements in the chip system.
As can be seen from the above description, in the embodiment of the present disclosure, by obtaining chip test information of a chip system, a redundant element included in the chip system may be determined, and power consumption to be allocated to the redundant element in the chip system may also be determined, so as to adjust working parameters of some or all elements (that is, target elements) except the redundant element in the chip system by using the power consumption to be allocated, thereby adjusting the power consumption of the target element in the chip system. By adjusting the working parameters of the target element according to the power consumption to be allocated of the redundant element, the utilization rate of the power consumption in the chip system can be improved, so that the loss of the power consumption in the chip system is reduced, and meanwhile, the performance of the chip system can be improved.
In a possible embodiment, the first acquisition unit is further configured to obtain the chip test information of the chip system in at least one of the following ways: reading the chip test information from a memory of the chip system each time the chip system is detected to be performing a power-on operation; reading the chip test information from the memory of the chip system when the current time is detected to have reached a preset detection time of the chip system; reading the chip test information from the memory of the chip system when a modify operation on the memory of the chip system is detected.
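The three read conditions can be pictured as a small trigger check. The sketch below is hypothetical: the `Trigger` names, the function signature, and the use of plain numbers for time are assumptions chosen for illustration only.

```python
from enum import Enum, auto


class Trigger(Enum):
    POWER_ON = auto()          # the chip system is performing a power-on operation
    SCHEDULED_CHECK = auto()   # the preset detection time has been reached
    MEMORY_MODIFIED = auto()   # a modify operation on the memory was detected


def fired_triggers(powering_on, now, preset_detection_time, memory_modified):
    """Return the set of conditions that currently call for reading the test info."""
    fired = set()
    if powering_on:
        fired.add(Trigger.POWER_ON)
    if now >= preset_detection_time:
        fired.add(Trigger.SCHEDULED_CHECK)
    if memory_modified:
        fired.add(Trigger.MEMORY_MODIFIED)
    return fired
```

A caller would read the chip test information from memory whenever the returned set is non-empty.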
In a possible embodiment, the adjusting unit is further configured to: acquire the element priority of a first remaining element in the chip system, where the first remaining element is a remaining chip and/or a remaining processing core in the chip system other than the redundant element; determine the target element among the first remaining elements based on the element priority; and adjust an operating parameter of the target element based on the power consumption to be allocated, where the operating parameter includes an operating voltage and/or an operating frequency.
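One simple way to picture priority-based selection is to rank the first remaining elements and take the top few. In this sketch the priority table, the convention that a larger number means higher priority, and the `max_targets` cut-off are all assumptions; the text does not say how priorities are encoded or how many targets are chosen.

```python
def pick_targets(remaining, priorities, max_targets):
    """Return up to max_targets remaining elements, highest priority first.

    remaining:   list of element identifiers (hypothetical names)
    priorities:  dict mapping identifier -> priority, larger = more important
    """
    ranked = sorted(remaining, key=lambda e: priorities[e], reverse=True)
    return ranked[:max_targets]
```

Because Python's sort is stable, elements with equal priority keep their original relative order.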
In a possible embodiment, the adjusting unit is further configured to: acquire power consumption constraint information of the target element; and adjust the working parameters of the target element based on the power consumption constraint information and the power consumption to be allocated.
In a possible embodiment, the adjusting unit is further configured to: acquire a preset adjustment range of the working parameters of the target element in the chip system, where the adjustment range indicates the range within which the working parameters of each chip and each processing core in the chip system may be adjusted in a normal working state; and, within the adjustment range, adjust the working parameters of the target element based on the power consumption constraint information, so as to allocate to the target element a corresponding portion of the power consumption to be allocated.
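As a rough illustration of adjusting within a range under a power constraint, the sketch below grants a target element only as much extra power as its cap leaves room for, then raises its frequency proportionally and clamps the result to the adjustment range. Everything here is assumed for the example: the per-element cap as the form of the constraint information, the fixed-voltage model in which power scales linearly with frequency, and all names and units.

```python
def grant_power(current_w, cap_w, offered_w):
    """Extra power actually granted: limited by the element's remaining headroom."""
    headroom = max(0.0, cap_w - current_w)
    return min(offered_w, headroom)


def scaled_frequency(freq_mhz, current_w, granted_w, freq_range):
    """New frequency, assuming power is proportional to frequency at fixed voltage.

    The result is clamped to freq_range = (lowest, highest) allowed frequency.
    """
    if current_w <= 0:
        raise ValueError("current power must be positive")
    target = freq_mhz * (current_w + granted_w) / current_w
    lowest, highest = freq_range
    return max(lowest, min(target, highest))
```

With a core drawing 10 W under a 12 W cap, an offer of 5 W is cut to 2 W, and a 1000 MHz clock would move to 1200 MHz if that lies inside the allowed range.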
In a possible embodiment, the adjusting unit is further configured to determine the adjustment range based on a first range and a second range, where the first range is the parameter range of the working parameter of the target element in the normal operation state, determined based on the standard working parameter of the target element; the second range is the parameter range of the working parameter of the target element in the normal operation state, determined based on the adjusted working parameter of the target element; and the adjusted working parameter is the working parameter after adjustment by a preset adjustment value.
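The text says the adjustment range is determined from the first and second ranges but does not say how they are combined. One plausible reading, shown here purely as an assumption, is to take their overlap so that any chosen value is valid under both.

```python
def combine_ranges(first, second):
    """Return the overlap of two (low, high) ranges.

    Using the intersection is an assumption made for this sketch; the
    source only states that the range is "determined based on" both.
    """
    low = max(first[0], second[0])
    high = min(first[1], second[1])
    if low > high:
        raise ValueError("the two ranges do not overlap")
    return (low, high)
```

For instance, a first range of 0.8 to 1.0 V and a second range of 0.9 to 1.1 V would give an adjustment range of 0.9 to 1.0 V under this reading.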
In a possible embodiment, the adjusting unit is further configured to: generate a clock closing instruction in a case where it is determined based on the chip test information that the chip system has the redundant element, so that the clock circuit corresponding to the redundant element in the chip system is closed by the clock closing instruction.
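The format of the clock closing instruction is not given in the text. As a hypothetical encoding, the sketch below builds a bitmask in which bit i set means "close the clock of core i"; the encoding and the function name are assumptions for illustration.

```python
def clock_close_mask(redundant_core_ids):
    """Build a bitmask with one bit set per redundant core index."""
    mask = 0
    for core_id in redundant_core_ids:
        if core_id < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core_id
    return mask
```

Redundant cores 1 and 3 would yield the mask `0b1010`, and an empty list yields 0, meaning no clock is closed.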
In a possible embodiment, the adjusting unit is further configured to: during operation of the chip system, check for abnormal working elements in the chip system according to a preset checking period, where the abnormal working elements include chips and/or processing cores in an abnormal working state; and, in a case where an abnormal working element is detected, update the chip test information and adjust the working parameters of a second remaining element in the chip system, the second remaining element being an element other than the redundant element and the abnormal working element.
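A single pass of the periodic check could look like the sketch below. The status strings, the dictionary layout of the test information, and the `probe` callback are hypothetical; scheduling the pass at the preset checking period is left to the caller.

```python
def run_check(test_info, probe):
    """Run one check pass and update test_info in place.

    test_info: dict mapping element id -> "ok", "redundant", or "abnormal"
    probe:     callable returning True when an element is working normally
    Returns (newly_abnormal, second_remaining).
    """
    newly_abnormal = [
        element for element, status in test_info.items()
        if status == "ok" and not probe(element)
    ]
    for element in newly_abnormal:
        test_info[element] = "abnormal"   # update the chip test information
    second_remaining = [e for e, s in test_info.items() if s == "ok"]
    return newly_abnormal, second_remaining
```

The working parameters of the elements in `second_remaining` would then be re-adjusted, since the power previously assigned to the newly abnormal elements becomes available.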
Referring to fig. 6, a schematic diagram of a neural network training device provided in an embodiment of the present disclosure is shown. The device includes a second acquisition unit 61 and a second determination unit 62, wherein:
the second acquisition unit 61 is configured to obtain a target training task for a neural network to be trained;
the second determination unit 62 is configured to determine, among the remaining elements in the chip system other than the redundant elements, a training element for the target training task, so as to execute the target training task by the training element; wherein the remaining elements are elements after power consumption adjustment by the chip power consumption adjustment method according to any one of claims 1 to 8.
With this approach, while the chip system is in use, the target training task is assigned to the remaining elements other than the redundant elements for execution, which ensures that the target training task runs normally and improves the processing efficiency and processing quality of the chip system.
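Choosing training elements from the non-redundant elements can be sketched as a simple filter. The function name, the `needed` count, and the first-come selection order are assumptions; the text does not describe how training elements are picked among the remaining elements.

```python
def choose_training_elements(all_elements, redundant, needed):
    """Pick `needed` elements for a training task, skipping redundant ones."""
    redundant_set = set(redundant)
    remaining = [e for e in all_elements if e not in redundant_set]
    if len(remaining) < needed:
        raise RuntimeError("not enough non-redundant elements for the task")
    return remaining[:needed]
```

With elements `c0` to `c3` and `c1` redundant, a task needing two elements would be placed on `c0` and `c2`.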
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Referring to fig. 7, which is a schematic diagram of a chip system provided in an embodiment of the present disclosure, the chip system 700 includes: at least one chip 71 and a controller 72, each chip comprising at least one processing core 701; the controller 72 is configured to adjust power consumption of target elements in the chip except for redundant elements by the chip power consumption adjusting method, where the redundant elements include: redundant chips and/or redundant processing cores.
Corresponding to the chip power consumption adjustment method in fig. 1, an embodiment of the present disclosure further provides an electronic device 800. As shown in fig. 8, which is a schematic structural diagram of the electronic device 800 provided in the embodiment of the present disclosure, the electronic device includes:
a processor 81, a memory 82, and a bus 83. The memory 82 is used for storing execution instructions and includes an internal memory 821 and an external memory 822. The internal memory 821 temporarily stores operation data of the processor 81 and data exchanged with the external memory 822, such as a hard disk, and the processor 81 exchanges data with the external memory 822 through the internal memory 821. When the electronic device 800 operates, the processor 81 communicates with the memory 82 through the bus 83, so that the processor 81 executes the following instructions:
acquiring chip test information of the chip system; the chip test information is used for indicating the running state of each processing core contained in each chip;
determining power consumption to be allocated to the redundant element by the chip system under the condition that the chip system is determined to have the redundant element based on the chip test information; the redundant element includes: redundant chips and/or redundant processing cores;
adjusting the working parameters of a target element in the chip system based on the power consumption to be distributed so as to adjust the power consumption of the target element; the target elements are part or all of the elements except the redundant elements in the chip system.
Corresponding to the neural network training method in fig. 4, an embodiment of the present disclosure further provides an electronic device 900. As shown in fig. 9, which is a schematic structural diagram of the electronic device 900 provided in the embodiment of the present disclosure, the electronic device includes:
a processor 91, a memory 92, and a bus 93. The memory 92 is used for storing execution instructions and includes an internal memory 921 and an external memory 922. The internal memory 921 temporarily stores operation data of the processor 91 and data exchanged with the external memory 922, such as a hard disk, and the processor 91 exchanges data with the external memory 922 through the internal memory 921. When the electronic device 900 operates, the processor 91 communicates with the memory 92 through the bus 93, so that the processor 91 executes the following instructions:
acquiring a target training task aiming at a neural network to be trained;
determining training elements for the target training task in the remaining elements except redundant elements in the chip system, so as to execute the target training task through the training elements; wherein, the rest elements are elements after power consumption adjustment is carried out by the chip power consumption adjustment method.
The embodiment of the present disclosure further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the chip power consumption adjustment method in the above method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the chip power consumption adjustment method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific embodiments of the present disclosure, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A method for adjusting chip power consumption, applied to a chip system comprising at least one chip, wherein each chip comprises at least one processing core, the method comprising:
acquiring chip test information of the chip system; the chip test information is used for indicating the running state of each processing core contained in each chip;
determining power consumption to be allocated to the redundant element by the chip system under the condition that the chip system is determined to have the redundant element based on the chip test information; the redundant element includes: redundant chips and/or redundant processing cores;
adjusting the working parameters of a target element in the chip system based on the power consumption to be distributed so as to adjust the power consumption of the target element; the target elements are part or all of the elements except the redundant elements in the chip system.
2. The method of claim 1, wherein the obtaining chip test information of the chip system comprises at least one of:
reading the chip test information in a memory of the chip system in the process of detecting that the chip system executes power-on operation each time;
under the condition that the current moment is detected to reach the preset detection moment of the chip system, reading the chip test information in a memory of the chip system;
reading the chip test information in a memory of the chip system if a modify operation for the memory of the chip system is detected.
3. The method of claim 1, wherein adjusting the operating parameter of the target element in the chip system based on the power consumption to be allocated comprises:
acquiring the element priority of a first remaining element in the chip system; the first remaining element is a remaining chip and/or a remaining processing core in the chip system except the redundant element;
determining the target element in the first remaining element based on the element priority, and adjusting an operating parameter of the target element based on the power consumption to be allocated, wherein the operating parameter includes: an operating voltage and/or an operating frequency.
4. The method of claim 1, wherein adjusting the operating parameter of the target element in the chip system based on the power consumption to be allocated comprises:
acquiring power consumption constraint information of the target element;
and adjusting the working parameters of the target element based on the power consumption constraint information and the power consumption to be distributed.
5. The method of claim 4, wherein adjusting the operating parameter of the target element based on the power consumption constraint information and the power consumption to be allocated comprises:
acquiring an adjustment range of preset working parameters of the target element in the chip system; the adjustment range is used for indicating the adjustment range of the working parameters of each chip and each processing core in the chip system in a normal working state;
and in the adjusting range, adjusting the working parameters of the target element based on the power consumption constraint information so as to allocate corresponding power consumption to the target element in the power consumption to be allocated.
6. The method according to claim 5, wherein the adjustment range is determined based on a first range and a second range, the first range is a parameter range of an operating parameter of the target element in a normal operating state determined based on a standard operating parameter of the target element, the second range is a parameter range of an operating parameter of the target element in a normal operating state determined based on an adjusted operating parameter of the target element, and the adjusted operating parameter is a parameter after the operating parameter is adjusted based on a preset adjustment value.
7. The method according to any one of claims 1 to 6, further comprising:
and generating a clock closing instruction to close a clock circuit corresponding to the redundant element in the chip system through the clock closing instruction under the condition that the chip system is determined to have the redundant element based on the chip test information.
8. The method according to any one of claims 1 to 7, further comprising:
in the working process of the chip system, checking abnormal working elements in the chip system according to a preset checking period, wherein the abnormal working elements comprise chips and/or processing cores in an abnormal working state;
and in the case that the abnormal working element is detected, updating the chip test information and adjusting the working parameters of a second remaining element in the chip system except the redundant element and the abnormal working element.
9. A neural network training method, comprising:
acquiring a target training task aiming at a neural network to be trained;
determining training elements for the target training task in the remaining elements except redundant elements in the chip system, so as to execute the target training task through the training elements; wherein the remaining elements are elements after power consumption adjustment by the chip power consumption adjustment method according to any one of claims 1 to 8.
10. An apparatus for adjusting power consumption of a chip, comprising:
the first acquisition unit is used for acquiring chip test information of the chip system; the chip test information is used for indicating the running state of each processing core contained in each chip;
a first determination unit, configured to determine power consumption to be allocated to the redundant element by the chip system, if it is determined that the chip system has the redundant element based on the chip test information; the redundant element includes: redundant chips and/or redundant processing cores;
the adjusting unit is used for adjusting the working parameters of a target element in the chip system based on the power consumption to be distributed so as to adjust the power consumption of the target element; the target elements are part or all of the elements except the redundant elements in the chip system.
11. A neural network training device, comprising:
the second acquisition unit is used for acquiring a target training task aiming at the neural network to be trained;
a second determining unit, configured to determine, in remaining elements in a chip system except for redundant elements, a training element for the target training task, so as to execute the target training task through the training element; wherein the remaining elements are elements after power consumption adjustment by the chip power consumption adjustment method according to any one of claims 1 to 8.
12. A chip system, comprising: at least one chip and a controller, each chip comprising at least one processing core; the controller configured to adjust power consumption of a target element other than a redundant element in the chip by the chip power consumption adjustment method according to any one of claims 1 to 8, the redundant element including: redundant chips and/or redundant processing cores.
13. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine readable instructions when executed by the processor performing the steps of the chip power consumption adjusting method according to any one of claims 1 to 8 or performing the steps of the neural network training method according to claim 9.
14. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, performs the steps of the chip power consumption adjustment method according to any one of claims 1 to 8, or performs the steps of the neural network training method according to claim 9.
CN202111157365.1A 2021-09-30 2021-09-30 Chip power consumption adjustment method, device and chip system, and neural network training method and device Pending CN113885691A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111157365.1A CN113885691A (en) 2021-09-30 2021-09-30 Chip power consumption adjustment method, device and chip system, and neural network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111157365.1A CN113885691A (en) 2021-09-30 2021-09-30 Chip power consumption adjustment method, device and chip system, and neural network training method and device

Publications (1)

Publication Number Publication Date
CN113885691A true CN113885691A (en) 2022-01-04

Family

ID=79004738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111157365.1A Pending CN113885691A (en) 2021-09-30 2021-09-30 Chip power consumption adjustment method, device and chip system, and neural network training method and device

Country Status (1)

Country Link
CN (1) CN113885691A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110252260A1 (en) * 2010-04-08 2011-10-13 International Business Machines Corporation Reducing Power Requirements of a Multiple Core Processor
CN109917900A (en) * 2017-12-07 2019-06-21 技嘉科技股份有限公司 Power supply managing method and computer system
CN110703898A (en) * 2019-09-06 2020-01-17 无锡江南计算技术研究所 Dynamic management system and method for processor power consumption based on periodic query and interrupt
CN112269466A (en) * 2020-10-16 2021-01-26 苏州浪潮智能科技有限公司 Power supply method of power chip and server mainboard
CN112363609A (en) * 2020-10-21 2021-02-12 海光信息技术股份有限公司 Method and device for reducing power consumption of network on chip, CPU chip and server
CN112433091A (en) * 2020-12-04 2021-03-02 武汉轻工大学 Real-time detection system for power consumption of chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈力颖: "《基于55 nm 工艺的MCU 低功耗物理设计》", 《天津工业大学学报》, vol. 40, no. 3, pages 77 - 82 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023221360A1 (en) * 2022-05-19 2023-11-23 北京百度网讯科技有限公司 Training method, apparatus and system for deep learning model, and device and medium
CN116400201A (en) * 2023-06-06 2023-07-07 中诚华隆计算机技术有限公司 Core particle working state monitoring method and device, electronic equipment and storage medium
CN116400201B (en) * 2023-06-06 2023-08-11 中诚华隆计算机技术有限公司 Core particle working state monitoring method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US8627143B2 (en) Dynamically modeling and selecting a checkpoint scheme based upon an application workload
CN103415840B (en) Mistake management across hardware layer and software layer
CN113885691A (en) Chip power consumption adjustment method, device and chip system, and neural network training method and device
US20060075286A1 (en) System and method for logging hardware usage data, and uses for such logged hardware usage data
CN103026316B (en) Computer component power-consumption database
CN104335175A (en) Methods and systems to identify and migrate threads among system nodes based on system performance metrics
CN103477324A (en) Dynamic mapping of logical cores
CN102483646A (en) Altering Performance Of Computational Units Heterogeneously According To Performance Sensitivity
TWI564810B (en) Characterization of within-die variations of many-core processors
US20140082346A1 (en) Method and System for Managing Basic Input/Output System (BIOS) Configuration Data of BIOS
CN103827834A (en) Migration method of in-memory data, computer and device
CN111966449B (en) Virtual machine backup management method, system, terminal and storage medium
CN101634960A (en) Method for revising BIOS parameter and regenerating checksum
CN103838539A (en) Performance measurement unit, processor core comprising thereof and process profiling method
CN109933504B (en) Hard disk delay test method, device, terminal and storage medium
CN110488673A (en) A kind of data processing module and data processing method of low-power consumption mode
CN102736013B (en) A kind of idle condition method of testing of SoC chip, system and proving installation
CN102736957A (en) Resetting method and device
CN109542351B (en) Power consumption control method of solid state disk and solid state disk
CN109298992B (en) Electronic device and boot time calculation method
US10776240B2 (en) Non-intrusive performance monitor and service engine
US8560873B1 (en) Determination of transitional characteristic attributes of components during scheduled wake-up power transition of computing device
CN112269725A (en) Recording device and method for storing power-on and power-off time of controller
CN110703988B (en) Storage pool creating method, system, terminal and storage medium for distributed storage
CN113110729A (en) Power supply method, system and storage medium for improving data security of server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Floor 1-3, No. 24, Lane 315, Fenggu Road, Xuhui District, Shanghai, 201103

Applicant after: Shanghai Qianshi Technology Co.,Ltd.

Address before: 201103 unit 6-78, building 6, No. 1900, Hongmei Road, Xuhui District, Shanghai

Applicant before: Shanghai shangtangqian Technology Co.,Ltd.

Country or region before: China
