CN109902807B - Many-core chip distributed thermal modeling method based on recurrent neural network - Google Patents

Info

Publication number
CN109902807B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910148729.6A
Other languages
Chinese (zh)
Other versions
CN109902807A (en)
Inventor
王海
肖涛
唐迪娅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-02-27
Filing date
2019-02-27
Publication date
2022-07-05
Application filed by University of Electronic Science and Technology of China
Priority to CN201910148729.6A
Publication of CN109902807A
Application granted
Publication of CN109902807B
Legal status: Expired - Fee Related

Abstract

The invention belongs to the field of electronic design automation and discloses a distributed thermal modeling method for many-core chips based on a recurrent neural network. Dynamic thermal management can control the temperature of a many-core chip efficiently, and it depends on an accurate thermal model of the chip. In conventional lumped thermal modeling of many-core chips, however, the computational cost grows exponentially as the number of cores increases. To solve the excessive computational cost of the lumped thermal model, the invention provides a many-core chip distributed thermal modeling method based on a recurrent neural network. The method can simulate the temperature characteristics of a many-core chip at high speed and with high accuracy.

Description

Many-core chip distributed thermal modeling method based on recurrent neural network
Technical Field
The invention belongs to the field of electronic design automation, relates to the technical field of deep learning, and particularly relates to a many-core chip distributed thermal modeling method based on a recurrent neural network.
Background
As semiconductor process technology advances, chip feature sizes continue to shrink; commercial 7 nm chips entered volume production in 2018. Once chips entered the nanometer regime, leakage current made it difficult to keep raising the clock frequency, so high-performance chip development shifted from raising the clock frequency to increasing the number of cores, with notable success.
Increasing the number of cores greatly improves the performance of many-core chips, but it also brings serious thermal reliability problems, chiefly because high power density drives the chip temperature too high.
To address the thermal reliability of many-core chips, dynamic thermal management has been proposed as an efficient, low-cost solution. It is based on automatic control theory and obtains the desired temperature distribution through accurate estimation and real-time adjustment of power consumption. Dynamic thermal management can manage temperature efficiently at low performance overhead, especially when the number of cores is small. When the number of cores on the chip is large, however, the lumped thermal model becomes too large, and its computational cost grows exponentially with the number of cores, so the processor performance overhead caused by thermal management becomes excessive.
Distributed thermal modeling of many-core chips is therefore one of the problems that urgently needs to be solved.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a many-core chip distributed thermal modeling method based on a recurrent neural network. The modeling method decomposes the thermal model of the many-core chip into several small models; in the extreme case, a separate thermal model is established for each core, with limited information exchanged between cores. The method first builds a recurrent neural network model and then trains the network on offline temperature and power data. The trained recurrent neural network can predict the temperature of each core on the chip.
Thermal modeling is performed for each core of the many-core chip; because the cores occupy different positions, their thermal models differ. Given the position of a core, its temperature can be calculated from its own power and the temperatures of the surrounding cores. The thermal model of each core is established with a recurrent neural network, which can fit nonlinear functions and process data in the form of vector sequences. The input weight matrix of the recurrent neural network is split into two parts, one applied to the power input and the other to the temperature input, which handles the fact that the input contains both power and temperature.
For each core, the order in which the surrounding temperatures are taken is fixed in order to reduce errors. The adjacent cores are not ordered by ascending core number; instead, selection starts with the core directly above and proceeds clockwise until the temperatures of all adjacent cores have been taken. This makes errors less likely when handling the outermost cores, which border the external environment and therefore have a different set of adjacent positions.
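The fixed neighbor ordering described above can be illustrated with a short Python sketch. It assumes a 4 x 4 chip whose cores are numbered 1 to 16 in row-major order with row 0 at the top; the function name `clockwise_neighbors` and the use of `None` to mark a position that lies outside the chip are hypothetical choices for illustration, since the text does not say how the external environment is encoded.

```python
from typing import List, Optional

def clockwise_neighbors(core: int, rows: int = 4, cols: int = 4) -> List[Optional[int]]:
    """Return neighbor core numbers in the fixed order: above, right, below, left.

    Assumption of this sketch: cores are numbered 1..rows*cols in row-major
    order. A position outside the chip is returned as None, standing for
    the external environment.
    """
    if not 1 <= core <= rows * cols:
        raise ValueError("core number out of range")
    r, c = divmod(core - 1, cols)
    # Start directly above, then proceed clockwise.
    offsets = [(-1, 0), (0, 1), (1, 0), (0, -1)]
    result: List[Optional[int]] = []
    for dr, dc in offsets:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            result.append(nr * cols + nc + 1)
        else:
            result.append(None)
    return result

print(clockwise_neighbors(6))   # [2, 7, 10, 5]
print(clockwise_neighbors(1))   # [None, 2, 5, None]
```

Because the order depends only on position, every core presents its neighbor temperatures to its model in the same slots, and an edge or corner core simply has some slots that correspond to the environment.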
The invention adopts the following technical scheme to solve the problems:
Step one: extract the thermal model parameters of the many-core chip, mainly the thermal capacitance and thermal resistance parameters of the whole chip, from HotSpot, and establish the many-core chip thermal model.
Step two: use the thermal model to generate multiple groups of data (multiple time nodes, each containing the power and temperature of every core), then divide the data into a training set and a verification set. The training set is used to train the recurrent neural network; the verification set is used only to verify the trained network and is never used for training.
Step three: feed the training set into an untrained recurrent neural network model (whose weight matrices are initialized randomly) to obtain the network output. Because the network has not yet been trained, there is a large gap between its output and the true output. To obtain an accurate model, the weight matrices are adjusted so that the temperature output of the network is as close as possible to the temperatures in the training set. The goal thus becomes minimizing a loss function by adjusting the weight matrices: the smaller the loss, the closer the network output is to the true output.
Step four: optimize the loss function with a gradient descent algorithm. Set a learning rate, compute the partial derivatives of the loss function with respect to each weight matrix, and use them to update the weight matrices iteratively. Training ends when the loss no longer decreases or the set maximum number of iterations is reached, and the loss value at that point is recorded as the training error. The verification set is then fed to the trained network; no updates are performed on the verification set, and only the loss value is recorded as the verification error. Next, the number of hidden layers and the number of neurons per hidden layer are changed, a new model is retrained, and its training and verification errors are recorded. Finally, the model with the smallest verification error is selected as the thermal model of the core. A thermal model is trained in this way for each core of the many-core chip, and the per-core models are combined to form the many-core chip distributed thermal model.
Compared with the prior art, the beneficial effects of the invention are as follows: because a recurrent neural network can effectively fit nonlinear functions, the many-core chip distributed thermal model built with it can simulate the temperature characteristics of the many-core chip with high accuracy and fast response.
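The training and model-selection procedure of steps two to four can be sketched in Python with NumPy as follows. This is a simplified illustration, not the patented implementation: it uses teacher forcing (the true previous temperature is fed back instead of the network's own output), a mean-squared-error loss, a tanh hidden layer, and it varies only the number of hidden neurons, not the number of hidden layers. The data are synthetic stand-ins generated from a made-up recursion, not HotSpot output, and all function names and hyperparameters are hypothetical.

```python
import numpy as np

def init_params(n_in, n_h, rng):
    """Randomly initialize the weight matrices (biases start at zero)."""
    return {
        "w_ih": rng.normal(scale=0.1, size=(n_h, n_in)),
        "b_h": np.zeros(n_h),
        "w_ho": rng.normal(scale=0.1, size=n_h),
        "b_o": 0.0,
    }

def forward(p, x):
    h = np.tanh(x @ p["w_ih"].T + p["b_h"])   # hidden state, shape (N, n_h)
    return h, h @ p["w_ho"] + p["b_o"]        # predicted temperature, shape (N,)

def mse(p, x, y):
    return float(np.mean((forward(p, x)[1] - y) ** 2))

def train(x, y, n_h, lr=0.05, max_iter=2000, tol=1e-10, seed=0):
    p = init_params(x.shape[1], n_h, np.random.default_rng(seed))
    prev_loss = np.inf
    for _ in range(max_iter):
        h, y_hat = forward(p, x)
        err = y_hat - y
        loss = float(np.mean(err ** 2))
        if prev_loss - loss < tol:            # loss has stopped decreasing
            break
        prev_loss = loss
        g_out = 2.0 * err / len(y)            # dLoss/dy_hat
        g_h = np.outer(g_out, p["w_ho"]) * (1.0 - h ** 2)  # back through tanh
        grads = {
            "w_ih": g_h.T @ x,
            "b_h": g_h.sum(axis=0),
            "w_ho": h.T @ g_out,
            "b_o": float(g_out.sum()),
        }
        for name in p:
            p[name] = p[name] - lr * grads[name]   # gradient descent update
    return p

# Synthetic stand-in data (NOT HotSpot output): a made-up linear recursion.
rng = np.random.default_rng(1)
K = 400
power = rng.uniform(0.0, 1.0, K)
t_near = rng.uniform(0.0, 1.0, (K, 4))
temp = np.zeros(K)
for k in range(1, K):
    temp[k] = 0.8 * temp[k - 1] + 0.1 * power[k] + 0.025 * t_near[k].sum()

# Inputs at time k: own power, neighbor temperatures, previous own temperature.
x = np.column_stack([power[1:], t_near[1:], temp[:-1]])
y = temp[1:]
x_train, y_train, x_val, y_val = x[:300], y[:300], x[300:], y[300:]

results = {}
for n_h in (2, 4, 8):                          # candidate hidden sizes
    p = train(x_train, y_train, n_h)
    results[n_h] = {"train": mse(p, x_train, y_train), "val": mse(p, x_val, y_val)}

best_n_h = min(results, key=lambda n: results[n]["val"])
print("selected hidden size:", best_n_h)
```

The verification set is only evaluated, never used for updates, and the candidate with the smallest verification error is kept, mirroring the selection rule in step four. Training through the network's own fed-back output would require backpropagation through time, which this sketch omits for brevity.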
Drawings
The invention is further illustrated with reference to the following figures and examples.
Fig. 1 shows the layout and numbering of a 16-core chip and the position of the 6th core.
Fig. 2 shows the thermal model structure of the 6th core and its positional relationship to the adjacent cores.
Fig. 3 is a structural diagram of a recurrent neural network whose recurrent connection runs from the output layer to the hidden layer.
Fig. 4 compares the predicted temperature and the actual temperature for the thermal model of the 6th core.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the examples of the present invention will be clearly and completely described below with reference to the accompanying drawings in the examples of the present invention, and it is obvious that the described examples are a part of examples of the present invention, but not all examples. All other examples, which can be obtained by a person skilled in the art without inventive step based on the examples of the present invention, are within the scope of the present invention.
Fig. 1 shows the layout and numbering of a 16-core chip and the position of the 6th core.
The example of the invention uses a 16-core many-core chip numbered as shown in Fig. 1. Each core is modeled separately, and the per-core models are then combined to form the distributed thermal model of the whole chip. The 6th core is highlighted, and the model-building process is explained using the 6th core as an example.
Fig. 2 shows the thermal model structure of the 6th core and its positional relationship to the adjacent cores.
Fig. 2 depicts the thermal model of the 6th core: the 6th core is adjacent to cores 2, 5, 7, and 10 and is connected to each of them by a thermal resistor; it also has a grounded thermal capacitor and an externally input power.
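The single-core circuit of Fig. 2 (thermal resistors to the adjacent cores, a grounded thermal capacitor, and an input power) corresponds, under the usual thermal-electrical analogy, to the node equation C * dT/dt = P + sum_j (T_j - T) / R_j. The Python sketch below advances that equation by one forward-Euler step. It is a simplified illustration only: the function name `rc_step` and all numeric values are hypothetical and are not HotSpot parameters, and a full HotSpot model contains more nodes than this single-node view.

```python
def rc_step(t_core, t_neighbors, power, r_neighbors, capacitance, dt):
    """Advance one core temperature by one forward-Euler step.

    Implements C * dT/dt = P + sum_j (T_j - T) / R_j, where temperatures
    are measured relative to the thermal ground.
    """
    inflow = sum((t_j - t_core) / r_j for t_j, r_j in zip(t_neighbors, r_neighbors))
    return t_core + dt / capacitance * (power + inflow)

# Hypothetical values for core 6 with neighbors 2, 7, 10, 5 (not HotSpot parameters).
t6 = 10.0
t6 = rc_step(t6, [10.0, 10.0, 10.0, 10.0], power=2.0,
             r_neighbors=[1.0, 1.0, 1.0, 1.0], capacitance=1.0, dt=0.1)
print(t6)
```

Repeating such steps for every core while varying the power inputs is one way to produce the time series of power and temperature that the training and verification sets require; forward Euler needs a time step that is small relative to the RC time constants to remain stable.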
Fig. 3 is a structural diagram of the recurrent neural network. To account for nonlinear effects, the invention uses a neural network to build the thermal model of each core. Because the temperature takes a series of values over time and thus forms a vector sequence, and a recurrent neural network is a neural network specialized for sequence modeling that handles such data well, a recurrent neural network is used to build the thermal model of each core. The inputs P_i(k) and T_i_near(k) are, respectively, the power of the i-th core and the temperatures of the cores adjacent to the i-th core at time k. The state H_i(k), also called the hidden layer of the recurrent neural network, represents the state of the i-th core at time k. The output T_i(k) is the temperature of the i-th core at time k. W_ih is the weight matrix from the input layer to the hidden layer, W_ho is the weight matrix from the hidden layer to the output layer, and W_oh is the weight matrix from the output layer to the hidden layer. To let the recurrent neural network fit the function better, there may be multiple hidden layers; Fig. 3 shows a recurrent neural network with only one hidden layer.
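A forward pass through a network of this shape can be sketched in Python with NumPy. Several details are assumptions of the sketch rather than statements from the text: the tanh activation, the bias terms `b_h` and `b_o`, the initial fed-back temperature `t_init`, and the choice to feed the output of time k - 1 into the hidden layer at time k. The input weight matrix W_ih is written as two parts, `w_p` for the power and `w_t` for the neighbor temperatures, following the weight split described in the disclosure.

```python
import numpy as np

def rnn_forward(power, t_near, w_p, w_t, w_oh, w_ho, b_h, b_o, t_init=0.0):
    """Predict T_i(k) for k = 0..K-1 with a one-hidden-layer recurrent network.

    power: shape (K,)          P_i(k)
    t_near: shape (K, n_near)  T_i_near(k), in a fixed neighbor order
    w_p, w_t: the two parts of the split input weight matrix W_ih
    w_oh: output-to-hidden weights, w_ho: hidden-to-output weights
    """
    temps = np.empty(power.shape[0])
    t_prev = t_init
    for k in range(power.shape[0]):
        # W_ih @ [P; T_near] written as a power part plus a temperature part.
        h = np.tanh(w_p * power[k] + w_t @ t_near[k] + w_oh * t_prev + b_h)
        t_prev = float(w_ho @ h + b_o)   # output T_i(k), fed back at k + 1
        temps[k] = t_prev
    return temps

rng = np.random.default_rng(0)
n_h, n_near, K = 8, 4, 5                 # hypothetical sizes
w_p, w_oh, w_ho = rng.normal(size=(3, n_h))
w_t = rng.normal(size=(n_h, n_near))
b_h, b_o = np.zeros(n_h), 0.0
power = rng.uniform(size=K)
t_near = rng.uniform(size=(K, n_near))

temps = rnn_forward(power, t_near, w_p, w_t, w_oh, w_ho, b_h, b_o)
print(temps.shape)   # (5,)

# The split form equals one matrix applied to the stacked input.
w_ih = np.column_stack([w_p, w_t])
x0 = np.concatenate([[power[0]], t_near[0]])
print(np.allclose(w_ih @ x0, w_p * power[0] + w_t @ t_near[0]))   # True
```

The last check shows that splitting W_ih does not change what the network can represent; it only keeps the power weights and the temperature weights as separate blocks.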
Fig. 4 compares the predicted temperature and the actual temperature for the thermal model of the 6th core.
The trained recurrent-neural-network thermal model of the 6th core is used to predict the temperature of the 6th core; the predicted temperature closely follows the real temperature.
The many-core chip distributed thermal modeling method based on a recurrent neural network has been described in detail in the above examples, but the invention is not limited to them. The technical schemes described in the examples may be modified, provided that the essence of the modified schemes does not depart from the spirit and scope of the technical schemes of the examples of the invention.

Claims (1)

1. A many-core chip distributed thermal modeling method based on a recurrent neural network, characterized in that: thermal modeling is performed for each core of the many-core chip, and because the cores occupy different positions, their thermal models differ; given the position of a core, its temperature can be calculated from its own power and the temperatures of the surrounding cores; the thermal model of each core is established with a recurrent neural network, which fits a nonlinear function and processes data in the form of vector sequences; the input weight matrix of the recurrent neural network is split into two parts, one applied to the power input and the other to the temperature input, which handles the fact that the input contains both power and temperature; for each core of the many-core chip, the order in which the surrounding temperatures are taken is fixed in order to reduce errors: the adjacent cores are not ordered by ascending core number, but selection starts with the core directly above and proceeds clockwise until the temperatures of all adjacent cores have been taken, so that errors are less likely when handling the outermost cores, which border the external environment and therefore have a different set of adjacent positions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910148729.6A CN109902807B (en) 2019-02-27 2019-02-27 Many-core chip distributed thermal modeling method based on recurrent neural network

Publications (2)

Publication Number Publication Date
CN109902807A CN109902807A (en) 2019-06-18
CN109902807B true CN109902807B (en) 2022-07-05

Family

ID=66945767

Country Status (1)

Country Link
CN (1) CN109902807B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902807B (en) * 2019-02-27 2022-07-05 电子科技大学 Many-core chip distributed thermal modeling method based on recurrent neural network
CN113778655A (en) * 2020-06-09 2021-12-10 北京灵汐科技有限公司 Network precision quantification method and system
CN113467590B (en) * 2021-09-06 2021-12-17 南京大学 Many-core chip temperature reconstruction method based on correlation and artificial neural network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9015093B1 (en) * 2010-10-26 2015-04-21 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
CN106681453A (en) * 2016-11-24 2017-05-17 电子科技大学 Dynamic heat treatment method of high-performance multi-core microprocessor
GB201810314D0 (en) * 2018-06-22 2018-08-08 Moixa Energy Holdings Ltd Systems for machine learning, optimising and managing local multi-asset flexibility of distributed energy storage resources
CN108563949A (en) * 2018-04-16 2018-09-21 电子科技大学 For the duty mapping method of multi-core processor information security
CN109685207A (en) * 2018-11-27 2019-04-26 电子科技大学 A kind of multi core chip thermal management algorithm based on Recognition with Recurrent Neural Network
CN109902807A (en) * 2019-02-27 2019-06-18 电子科技大学 A kind of hot modeling method of many-core chip distribution formula based on Recognition with Recurrent Neural Network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8571847B2 (en) * 2010-01-08 2013-10-29 International Business Machines Corporation Efficiency of static core turn-off in a system-on-a-chip with variation
US10248675B2 (en) * 2013-10-16 2019-04-02 University Of Tennessee Research Foundation Method and apparatus for providing real-time monitoring of an artifical neural network

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
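A Fast Leakage-Aware Full-Chip Transient Thermal Estimation Method;Hai Wang;《IEEE Transactions on Computers》;20171201;Vol. 67 (No. 5);pp. 617-630 *
Distributed model predictive control for dynamic thermal management of multi-core systems;Lei Xu et al.;《2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)》;20170803;pp. 1-3 *
Research on Distributed Thermal Modeling and Thermal Management Techniques for Many-Core Chips;胥雷;《China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series》;20180915 (No. 9);pp. I135-153 *
Research on Thermal Modeling and Power Management Techniques for Many-Core Chips;张明;《China Master's Theses Full-text Database, Information Science and Technology Series》;20170215 (No. 2);pp. I135-603 *
Research on Thermal Management Techniques for Multi-Core Chips Based on Recurrent Neural Networks;郭星星;《China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series》;20200115 (No. 1);pp. I135-390 *
Research on Distributed Thermal Management Techniques for 3D ICs Based on Deep Learning;肖涛;《China Master's Theses Full-text Database, Information Science and Technology Series》;20200115 (No. 1);pp. I135-389 *
Dynamic Thermal Management Method for Multi-Core Processors Based on Hybrid Optimization;黄柯衡 et al.;《Electronics Process Technology》;20180318;Vol. 39 (No. 2);pp. 71-75+91 *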

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220705