CN109902807B - Many-core chip distributed thermal modeling method based on recurrent neural network - Google Patents
- Publication number
- CN109902807B
- Authority
- CN
- China
- Prior art keywords
- core
- chip
- many
- cores
- neural network
- Prior art date
- Legal status
- Expired - Fee Related
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention belongs to the field of electronic design automation and discloses a many-core chip distributed thermal modeling method based on a recurrent neural network. Dynamic thermal management can control the temperature of many-core chips very efficiently, and a good many-core chip thermal model supports dynamic thermal management well. However, in conventional lumped thermal modeling of many-core chips, the computational cost increases exponentially as the number of cores increases. To solve the problem of the excessive computational cost of the lumped thermal model, the invention provides a many-core chip distributed thermal modeling method based on a recurrent neural network. The invention can simulate the temperature characteristics of many-core chips with high speed and high accuracy.
Description
Technical Field
The invention belongs to the field of electronic design automation, relates to the technical field of deep learning, and particularly relates to a many-core chip distributed thermal modeling method based on a recurrent neural network.
Background
As the feature size of chips continues to shrink with advances in semiconductor processing, commercial 7nm chips had entered volume production by 2018. Once chips reach the nanometer scale, leakage current makes it difficult to raise the clock frequency further. High-performance chip development has therefore shifted from raising the clock frequency to increasing the number of cores, with remarkable results.
The larger number of cores greatly improves the performance of many-core chips. However, it also brings serious thermal reliability problems, mainly because high power density drives the chip temperature too high.
To address the thermal reliability of many-core chips, dynamic thermal management has been proposed as an efficient, low-cost solution. The scheme is based on automatic control theory and obtains the desired temperature distribution through accurate estimation and real-time adjustment of power consumption. Dynamic thermal management can keep temperatures under control at a low performance overhead, especially when the number of cores is small. However, when the number of cores on the chip is very large, the lumped thermal model becomes too large. Its computational cost then increases exponentially with the number of cores, so the processor performance overhead caused by thermal management becomes excessive.
Therefore, distributed thermal modeling of many-core chips is one of the problems that urgently need to be solved.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a many-core chip distributed thermal modeling method based on a recurrent neural network. The method decomposes the many-core chip thermal model into a number of small models. In the extreme case, a separate thermal model is built for each core of the many-core chip, with limited information exchanged between cores. The method first builds a recurrent neural network model and then trains it on offline temperature and power data, after which the trained network can predict the temperature of each core on the chip.

Specifically, a thermal model is built for each core of the many-core chip. Because the cores sit at different positions on the chip, their thermal models also differ. For a core at a given position, its temperature can be computed from its own power and the temperatures of the surrounding cores. The thermal model of each core is built with a recurrent neural network, which approximates a nonlinear function and processes vector-sequence data.

The input-layer weights of the recurrent neural network are split into two parts: one is applied to the power input and the other to the temperature inputs. This handles the fact that the input contains both power and temperature.

For each core, the order in which the surrounding temperatures are taken is fixed to reduce errors. Instead of arranging the neighboring cores by ascending core number, the order starts from the core directly above and proceeds clockwise until the temperatures of all adjacent cores have been taken. This makes errors less likely when processing the outermost cores, which border the external environment and therefore have neighbors at different positions.
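The fixed clockwise neighbor order can be illustrated on a rectangular grid. This is a minimal sketch under several assumptions. Fig. 1 is assumed to number the 16 cores 1 to 16 row by row from the top-left corner, which is consistent with core 6 being adjacent to cores 2, 5, 7 and 10. Each core is assumed to have at most four edge-sharing neighbors. The function name `ordered_neighbors` and the use of `None` for a side that faces the chip boundary are hypothetical choices, not taken from the patent.

```python
from typing import List, Optional


def ordered_neighbors(core: int, rows: int, cols: int) -> List[Optional[int]]:
    """Neighbors of `core` in a fixed order: start directly above, then go
    clockwise (above, right, below, left).

    Assumes cores are numbered 1..rows*cols row by row from the top-left.
    A slot is None when that side faces the chip boundary (the external
    environment) instead of another core, so every slot keeps its direction.
    """
    if not 1 <= core <= rows * cols:
        raise ValueError("core number out of range")
    r, c = divmod(core - 1, cols)
    # (row offset, column offset) for above, right, below, left
    offsets = [(-1, 0), (0, 1), (1, 0), (0, -1)]
    result: List[Optional[int]] = []
    for dr, dc in offsets:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            result.append(nr * cols + nc + 1)
        else:
            result.append(None)
    return result


if __name__ == "__main__":
    print(ordered_neighbors(6, 4, 4))  # [2, 7, 10, 5]
    print(ordered_neighbors(1, 4, 4))  # [None, 2, 5, None]
```

For core 6, ascending order would give 2, 5, 7, 10, while the clockwise rule gives 2, 7, 10, 5. For a corner core such as core 1, the missing neighbors stay in their own slots. The second input slot therefore always means "right neighbor", which is the kind of consistency the text says reduces errors at the chip edge.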
The invention adopts the following technical scheme to solve the problems:
Step one: extract the many-core chip thermal model parameters from HotSpot and build a many-core chip thermal model. The parameters are mainly the thermal capacitance and thermal resistance parameters of the whole chip.
Step two: use the thermal model to acquire multiple groups of data, where each group covers multiple time nodes and each time node holds the power and temperature of every core. Then divide the groups into a training set and a verification set. The training set is used to train the recurrent neural network. The verification set is used only to verify the trained network, and its data are never used for training.
Step three: feed the training set into an untrained recurrent neural network model, whose weight matrices are randomly initialized, to obtain the network's output. Because the network has not yet been trained, its output differs substantially from the true output. To obtain an accurate recurrent neural network model, the weight matrices are adjusted so that the network's temperature output comes as close as possible to the temperatures in the training set. The goal therefore becomes minimizing a loss function by adjusting the weight matrices of the recurrent neural network: the smaller the loss function, the closer the network's output is to the true output.
Step four: optimize the loss function with a gradient descent algorithm. Set a learning rate, compute the partial derivatives of the loss function with respect to each weight matrix of the recurrent neural network, and update the weight matrices iteratively. Training ends when the loss function no longer decreases or the set maximum number of iterations is reached, and the loss value at that point is recorded as the training error. The verification set is then fed into the trained network. No updates are made on it; only its loss value is recorded as the verification error. Next, change the number of hidden layers and the number of neurons per hidden layer, train a new model, and record its training and verification errors. Finally, select the model with the smallest verification error as the thermal model of the chip core. A thermal model is trained in this way for every core on the many-core chip, and the per-core models are combined to form the many-core chip distributed thermal model.
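As a rough stand-in for the thermal model of step one, the sketch below advances a single-layer grid of lumped RC nodes with forward Euler. Several things are hypothetical placeholders, not parameters extracted from HotSpot: the grid layout, the single lateral resistance `r_lat` between edge-sharing cores, the vertical resistance `r_amb` to ambient, and all numeric values. HotSpot's own models also include more layers, such as the thermal interface material, heat spreader and heat sink.

```python
from typing import List, Sequence


def rc_step(temps: Sequence[float], powers: Sequence[float], rows: int, cols: int,
            c_th: float, r_lat: float, r_amb: float, t_amb: float,
            dt: float) -> List[float]:
    """One forward-Euler step of a single-layer lumped RC thermal grid.

    For every core i:
        C dT_i/dt = P_i - sum_j (T_i - T_j) / R_lat - (T_i - T_amb) / R_amb
    where j runs over the cores that share an edge with i.
    """
    new_temps: List[float] = []
    for i in range(rows * cols):
        r, c = divmod(i, cols)
        heat_in = powers[i] - (temps[i] - t_amb) / r_amb  # power minus vertical loss
        for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                heat_in -= (temps[i] - temps[nr * cols + nc]) / r_lat  # lateral flow
        new_temps.append(temps[i] + dt * heat_in / c_th)
    return new_temps


if __name__ == "__main__":
    # Hypothetical values: 4x4 grid, 1 W per core, ambient 45 degC.
    temps = [45.0] * 16
    for _ in range(1000):
        temps = rc_step(temps, [1.0] * 16, 4, 4, c_th=0.01, r_lat=2.0,
                        r_amb=1.0, t_amb=45.0, dt=0.001)
    print(round(temps[5], 3))  # 46.0, i.e. T_amb + P * r_amb
```

With uniform power the lateral flows cancel, so every core settles at T_amb + P·r_amb. Non-uniform power traces from a model like this are what step two would sample. Forward Euler is only stable for a small enough `dt`, which is why the example uses dt/C = 0.1.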
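The data preparation in step two can be sketched as follows. The sketch makes four assumptions:

- Each group is a list of time nodes, and each node holds a list of per-core powers and a list of per-core temperatures.
- Cores are indexed 0 to rows*cols-1 row by row from the top-left corner.
- A side that faces the chip boundary is filled with the ambient temperature. The patent only says that outer cores border the external environment.
- The 80/20 split ratio is hypothetical.

The function names are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

# One time node: (power of every core, temperature of every core)
TimeNode = Tuple[Sequence[float], Sequence[float]]


def neighbors_clockwise(core: int, rows: int, cols: int) -> List[Optional[int]]:
    """0-based neighbors of `core` in the order above, right, below, left;
    None where that side faces the chip boundary."""
    r, c = divmod(core, cols)
    out: List[Optional[int]] = []
    for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
        nr, nc = r + dr, c + dc
        out.append(nr * cols + nc if 0 <= nr < rows and 0 <= nc < cols else None)
    return out


def core_sequence(group: Sequence[TimeNode], core: int, rows: int, cols: int,
                  t_amb: float) -> Tuple[List[List[float]], List[float]]:
    """Inputs [P_i(k), T_above, T_right, T_below, T_left] and targets T_i(k)
    for one core over one group; boundary slots get the ambient temperature."""
    nbrs = neighbors_clockwise(core, rows, cols)
    xs: List[List[float]] = []
    ys: List[float] = []
    for powers, temps in group:
        near = [t_amb if j is None else temps[j] for j in nbrs]
        xs.append([powers[core]] + near)
        ys.append(temps[core])
    return xs, ys


def split_groups(groups: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Whole groups go to one side only, so verification data never enter training."""
    cut = int(len(groups) * train_fraction)
    return list(groups[:cut]), list(groups[cut:])
```

Splitting by whole groups, rather than by individual time nodes, keeps each verification sequence intact and unseen during training. This matches the requirement that verification data are not used for training.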
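The patent does not name the loss function used in step three. Assuming mean squared error over the K time nodes of a training sequence for core i, the objective can be written as below. Here $\hat{T}_i(k)$ is the network output, $T_i(k)$ is the temperature from the thermal model, and the minimization runs over the network's weight matrices (named $W_{ih}$, $W_{ho}$ and $W_{oh}$ in the description of Fig. 3):

```latex
\min_{W_{ih},\,W_{ho},\,W_{oh}} \; L = \frac{1}{K}\sum_{k=1}^{K}\bigl(\hat{T}_i(k) - T_i(k)\bigr)^2
```

A smaller $L$ means the predicted temperatures lie closer to the recorded ones, which is the relationship the step describes.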
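Step four, including model selection, can be sketched in pure Python. This is a minimal sketch with several assumptions:

- There is a single hidden layer, and only its neuron count is varied. The patent also varies the number of hidden layers.
- The recurrence runs from the output back to the hidden layer, and all weights are packed into one flat vector.
- The loss is mean squared error.
- Central-difference numerical gradients stand in for the analytic partial derivatives that backpropagation through time would compute.

The function names, learning rate and iteration limit are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# One sequence: (initial temperature, powers P_i(k), neighbor temps T_near(k), targets T_i(k))
Seq = Tuple[float, List[float], List[List[float]], List[float]]


def predict(theta: Sequence[float], n_near: int, n_hidden: int, seq: Seq) -> List[float]:
    """Per hidden unit: [w_p, w_t (n_near values), w_fb, b_h, w_out]; theta[-1] is the output bias."""
    t_init, powers, near_temps, _ = seq
    stride = n_near + 4
    t_prev, out = t_init, []
    for p, near in zip(powers, near_temps):
        y = theta[-1]
        for j in range(n_hidden):
            w = theta[j * stride:(j + 1) * stride]
            a = (w[0] * p + sum(wi * ti for wi, ti in zip(w[1:1 + n_near], near))
                 + w[1 + n_near] * t_prev + w[2 + n_near])
            y += w[3 + n_near] * math.tanh(a)
        t_prev = y  # the output is fed back into the hidden layer at the next step
        out.append(y)
    return out


def mse(theta: Sequence[float], n_near: int, n_hidden: int, data: List[Seq]) -> float:
    total, count = 0.0, 0
    for seq in data:
        for y, target in zip(predict(theta, n_near, n_hidden, seq), seq[3]):
            total += (y - target) ** 2
            count += 1
    return total / count


def numerical_grad(theta, n_near, n_hidden, data, eps=1e-5) -> List[float]:
    """Central-difference estimate of dL/dtheta (a stand-in for backpropagation)."""
    grad = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((mse(plus, n_near, n_hidden, data)
                     - mse(minus, n_near, n_hidden, data)) / (2 * eps))
    return grad


def train(data, n_near, n_hidden, lr=0.05, max_iter=200, seed=0):
    """Gradient descent; stops when the loss stops decreasing or max_iter is reached."""
    rng = random.Random(seed)
    theta = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden * (n_near + 4) + 1)]
    loss = mse(theta, n_near, n_hidden, data)
    for _ in range(max_iter):
        grad = numerical_grad(theta, n_near, n_hidden, data)
        trial = [t - lr * g for t, g in zip(theta, grad)]
        trial_loss = mse(trial, n_near, n_hidden, data)
        if trial_loss >= loss:
            break  # loss no longer decreases
        theta, loss = trial, trial_loss
    return theta, loss  # loss is the training error


def select_model(train_set, val_set, n_near, hidden_sizes):
    """Train one model per hidden size; keep the one with the lowest verification error."""
    best = None
    for n_hidden in hidden_sizes:
        theta, train_err = train(train_set, n_near, n_hidden)
        val_err = mse(theta, n_near, n_hidden, val_set)  # no updates on the verification set
        if best is None or val_err < best[3]:
            best = (n_hidden, theta, train_err, val_err)
    return best
```

Numerical gradients cost two loss evaluations per parameter, so they only suit toy model sizes. A practical implementation would compute the partial derivatives with backpropagation through time in a deep-learning framework.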
Compared with the prior art, the invention has the following beneficial effect: because a recurrent neural network can effectively fit nonlinear functions, the many-core chip distributed thermal model built with it can simulate the temperature characteristics of the many-core chip with high accuracy and fast response.
Drawings
The invention is further illustrated with reference to the following figures and examples.
Fig. 1 shows the layout and numbering of a 16-core chip and the position of the 6th core.
Fig. 2 shows the thermal model structure of the 6th core and its positional relationship with the adjacent cores.
Fig. 3 is a diagram of a recurrent neural network whose recurrent connection runs from the output layer to the hidden layer.
Fig. 4 compares the predicted and actual temperature values of the 6th core's thermal model.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the examples of the present invention are described clearly and completely below with reference to the accompanying drawings. The described examples are only some, not all, of the examples of the present invention. All other examples obtained by a person skilled in the art from these examples without inventive effort fall within the scope of the present invention.
Fig. 1 shows the layout and numbering of a 16-core chip and the position of the 6th core.
In this example of the invention, a 16-core many-core chip is provided and numbered as shown in Fig. 1. Each core is modeled separately, and the per-core models are combined to form the distributed thermal model of the whole many-core chip. The 6th core is highlighted, and the model-building process is explained using the 6th core as an example.
Fig. 2 shows the thermal model structure of the 6th core and its positional relationship with the adjacent cores.
The figure depicts the thermal model of the 6th core and its position relative to the adjacent cores. Core 6 is adjacent to cores 2, 5, 7, and 10 and is connected to each of them through a thermal resistance. It also has a grounded thermal capacitance and an externally input power.
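Reading Fig. 2 as a standard lumped RC circuit gives the node equation for core 6; this is an interpretation, since the patent states no equations. The core's input power is balanced against the heat leaving through the four thermal resistances and the heat stored in its grounded capacitance. The symbols are assumptions: $C_6$ is the thermal capacitance, $R_{6,j}$ is the resistance to neighbor $j$, and $\Delta t$ is a forward-Euler time step. A full chip model would normally also include a path to the heat sink, which is omitted here because the figure description lists none.

```latex
C_6\,\frac{dT_6(t)}{dt} = P_6(t) - \sum_{j\in\{2,5,7,10\}} \frac{T_6(t) - T_j(t)}{R_{6,j}}

T_6(k+1) = T_6(k) + \frac{\Delta t}{C_6}\Bigl[P_6(k) - \sum_{j\in\{2,5,7,10\}} \frac{T_6(k) - T_j(k)}{R_{6,j}}\Bigr]
```

The discrete form shows why a per-core model only needs the core's own power and the temperatures of its neighbors. These are the same quantities the recurrent network takes as input.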
Fig. 3 is a structural diagram of the recurrent neural network. To capture nonlinear effects, the invention builds the thermal model of each core with a neural network. The temperature takes a series of values over time, forming a vector sequence. A recurrent neural network is specialized for sequence modeling and handles vector-sequence data well, so it is used to build the thermal model of each core. The inputs $P_i(k)$ and $T_{i,\mathrm{near}}(k)$ denote, respectively, the power of the $i$th core and the temperatures of the cores adjacent to the $i$th core at time $k$. The state $H_i(k)$, also called the hidden layer of the recurrent neural network, represents the state of the $i$th core at time $k$. The output $T_i(k)$ is the temperature of the $i$th core at time $k$. $W_{ih}$ is the weight matrix from the input layer to the hidden layer, $W_{ho}$ is the weight matrix from the hidden layer to the output layer, and $W_{oh}$ is the weight matrix from the output layer back to the hidden layer. To let the recurrent neural network fit the function better, the hidden layer may consist of multiple layers; the figure shows a recurrent neural network with a single hidden layer.
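A minimal forward pass for the structure in Fig. 3 is sketched below. The split of $W_{ih}$ into a power part and a neighbor-temperature part follows the disclosure. Several details are assumptions, not stated in the patent: the tanh hidden activation, the linear output, the biases, the uniform random initialization, and feeding back the previous predicted temperature $T_i(k-1)$ starting from a given initial temperature. The class and attribute names are hypothetical. Inputs would normally be normalized first, since raw temperatures of tens of degrees would saturate tanh.

```python
import math
import random
from typing import List, Sequence


class OutputFeedbackRNN:
    """One-hidden-layer RNN with output-to-hidden recurrence (Jordan-style).

    The input weight matrix W_ih is split into a power part (w_ih_p) and a
    neighbor-temperature part (w_ih_t). tanh, the biases and the
    initialization range are illustrative assumptions.
    """

    def __init__(self, n_near: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def u() -> float:
            return rng.uniform(-0.5, 0.5)

        self.w_ih_p = [u() for _ in range(n_hidden)]                           # P_i(k) -> hidden
        self.w_ih_t = [[u() for _ in range(n_near)] for _ in range(n_hidden)]  # T_near(k) -> hidden
        self.w_oh = [u() for _ in range(n_hidden)]                             # T_i(k-1) -> hidden
        self.b_h = [0.0] * n_hidden
        self.w_ho = [u() for _ in range(n_hidden)]                             # hidden -> T_i(k)
        self.b_o = 0.0

    def run(self, powers: Sequence[float], near_temps: Sequence[Sequence[float]],
            t_init: float) -> List[float]:
        """Predict T_i(k) for every time step k, feeding each prediction back."""
        outputs: List[float] = []
        t_prev = t_init
        for p, near in zip(powers, near_temps):
            hidden = [
                math.tanh(self.w_ih_p[j] * p
                          + sum(w * t for w, t in zip(self.w_ih_t[j], near))
                          + self.w_oh[j] * t_prev
                          + self.b_h[j])
                for j in range(len(self.b_h))
            ]
            t_prev = sum(w * h for w, h in zip(self.w_ho, hidden)) + self.b_o
            outputs.append(t_prev)
        return outputs
```

Keeping `w_ih_p` and `w_ih_t` as separate arrays is equivalent to one input matrix acting on the concatenated vector [P_i(k), T_near(k)]. The split simply makes the power and temperature contributions explicit.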
Fig. 4 compares the predicted and actual temperature values of the 6th core's thermal model.
The trained recurrent-neural-network thermal model of the 6th core is used to predict the temperature of the 6th core. As the figure shows, the predicted temperature closely fits the actual temperature.
The many-core chip distributed thermal modeling method based on a recurrent neural network disclosed by the invention has been described in detail through the above examples, but the invention is not limited to them. The technical solutions described in the foregoing examples may still be modified, provided that such modifications do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the examples of the invention.
Claims (1)
1. A many-core chip distributed thermal modeling method based on a recurrent neural network, characterized in that: a thermal model is built for each core of the many-core chip, and the thermal models of the chip cores differ because the cores sit at different positions; given the position of a chip core, its temperature can be computed from its own power and the temperatures of the surrounding cores; the thermal model of each core is built with a recurrent neural network, which approximates a nonlinear function and processes vector-sequence data; the input-layer weights of the recurrent neural network are split into two parts, one applied to the power input and the other to the temperature inputs, which handles the fact that the input contains both power and temperature; for each core of the many-core chip, the order in which the surrounding temperatures are taken is fixed to reduce errors: instead of arranging the neighboring cores by ascending core number, the order starts from the core directly above and proceeds clockwise until the temperatures of all adjacent cores have been taken, so that errors are less likely when processing the outermost cores, which border the external environment and therefore have neighbors at different positions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910148729.6A CN109902807B (en) | 2019-02-27 | 2019-02-27 | Many-core chip distributed thermal modeling method based on recurrent neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910148729.6A CN109902807B (en) | 2019-02-27 | 2019-02-27 | Many-core chip distributed thermal modeling method based on recurrent neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109902807A | 2019-06-18 |
CN109902807B | 2022-07-05 |
Family
ID=66945767
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910148729.6A Expired - Fee Related CN109902807B (en) | 2019-02-27 | 2019-02-27 | Many-core chip distributed thermal modeling method based on recurrent neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902807B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902807B (en) * | 2019-02-27 | 2022-07-05 | 电子科技大学 | Many-core chip distributed thermal modeling method based on recurrent neural network |
CN113778655A (en) * | 2020-06-09 | 2021-12-10 | 北京灵汐科技有限公司 | Network precision quantification method and system |
CN113467590B (en) * | 2021-09-06 | 2021-12-17 | 南京大学 | Many-core chip temperature reconstruction method based on correlation and artificial neural network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9015093B1 (en) * | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
CN106681453A (en) * | 2016-11-24 | 2017-05-17 | 电子科技大学 | Dynamic heat treatment method of high-performance multi-core microprocessor |
GB201810314D0 (en) * | 2018-06-22 | 2018-08-08 | Moixa Energy Holdings Ltd | Systems for machine learning, optimising and managing local multi-asset flexibility of distributed energy storage resources |
CN108563949A (en) * | 2018-04-16 | 2018-09-21 | 电子科技大学 | Task mapping method for multi-core processor information security |
CN109685207A (en) * | 2018-11-27 | 2019-04-26 | 电子科技大学 | Multi-core chip thermal management algorithm based on recurrent neural network |
CN109902807A (en) * | 2019-02-27 | 2019-06-18 | 电子科技大学 | Many-core chip distributed thermal modeling method based on recurrent neural network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8571847B2 (en) * | 2010-01-08 | 2013-10-29 | International Business Machines Corporation | Efficiency of static core turn-off in a system-on-a-chip with variation |
US10248675B2 (en) * | 2013-10-16 | 2019-04-02 | University Of Tennessee Research Foundation | Method and apparatus for providing real-time monitoring of an artifical neural network |
2019
- 2019-02-27 CN201910148729.6A (CN109902807B): not active, Expired - Fee Related
Non-Patent Citations (7)
Title |
---|
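A Fast Leakage-Aware Full-Chip Transient Thermal Estimation Method; Hai Wang; IEEE Transactions on Computers; 2017-12-01; Vol. 67, No. 5; pp. 617-630 * |
Distributed model predictive control for dynamic thermal management of multi-core systems; Lei Xu et al.; 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT); 2017-08-03; pp. 1-3 * |
Research on distributed thermal modeling and thermal management technology for many-core chips (众核芯片分布式热建模与热管理技术研究); 胥雷; China Doctoral Dissertations and Master's Theses Full-text Database (Master's), Information Science and Technology; 2018-09-15 (No. 9); pp. I135-153 * |
Research on thermal modeling and power management technology for many-core chips (众核芯片热建模与功耗管理技术研究); 张明; China Master's Theses Full-text Database, Information Science and Technology; 2017-02-15 (No. 2); pp. I135-603 * |
Research on multi-core chip thermal management technology based on recurrent neural networks (基于循环神经网络的多核芯片热管理技术研究); 郭星星; China Doctoral Dissertations and Master's Theses Full-text Database (Master's), Information Science and Technology; 2020-01-15 (No. 1); pp. I135-390 * |
Research on distributed thermal management technology for 3D ICs based on deep learning (基于深度学习的3DIC分布式热管理技术研究); 肖涛; China Master's Theses Full-text Database, Information Science and Technology; 2020-01-15 (No. 1); pp. I135-389 * |
Dynamic thermal management method for multi-core processors based on hybrid optimization (基于混合优化的多核处理器动态热管理方法); 黄柯衡 et al.; Electronics Process Technology (电子工艺技术); 2018-03-18; Vol. 39, No. 2; pp. 71-75, 91 * |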
Also Published As
Publication number | Publication date |
---|---|
CN109902807A (en) | 2019-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
CF01 | Termination of patent right due to non-payment of annual fee ||
Granted publication date: 20220705 |