CN116225181A - Temperature adjusting method and electronic equipment


Publication number
CN116225181A
Authority
CN
China
Prior art keywords
memory bank
triggering
frequency
controller
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211605086.1A
Other languages
Chinese (zh)
Inventor
唐亚军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16 Constructional details or arrangements
    • G06F1/20 Cooling means
    • G06F1/206 Cooling means comprising thermal management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Control Of Temperature (AREA)

Abstract

The application provides a temperature adjusting method and electronic equipment. The method comprises: acquiring the frequency at which each memory bank in the electronic device triggers correctable errors; and, when a memory bank whose trigger frequency exceeds a threshold is detected, controlling the heat dissipation unit to perform cooling treatment. Because the method uses the frequency at which a memory bank triggers correctable errors as the basis for temperature adjustment, rather than the actual temperature of the memory bank, it represents the current working state of the memory bank more directly and accurately. This helps select appropriate cooling treatment, reduces the probability of correctable errors, and thereby reduces the impact of error handling and correction on the running performance of the electronic device. It also helps avoid uncorrectable errors caused by an excessive number of correctable errors, which reduces the probability of downtime or restart and improves the operating reliability of the electronic device.

Description

Temperature adjusting method and electronic equipment
Technical Field
The application relates to the technical field of electronic equipment, in particular to a temperature adjusting method and electronic equipment.
Background
As the integration density of memory in electronic devices increases, the data retention capability of the capacitors in memory cells becomes more sensitive to temperature. At high temperatures, data at random addresses is more likely to flip, that is, the probability of correctable errors (Correctable Error, CE for short) increases; at low temperatures, the probability of such errors is greatly reduced. It is therefore necessary to keep the memory temperature within a reasonable range.
Currently, memory temperature is controlled as follows: the controller of the electronic device obtains the actual working temperature of each memory bank through a temperature sensor; a temperature threshold is pre-stored in the controller, and when the actual working temperature exceeds this threshold, the controller cools the memory bank. Because different memory banks have different probabilities of triggering CEs at the same working temperature, this method can leave some memory banks in a poor working state and reduce the running performance of the electronic device.
Therefore, it is necessary to provide a temperature control method to ensure the working state of each memory bank and improve the operation performance of the electronic device.
Disclosure of Invention
The application provides a temperature adjusting method and electronic equipment, which are used for guaranteeing the working state of each memory bank and improving the operation performance of the electronic equipment.
In a first aspect, the present application provides a temperature adjustment method applied to an electronic device, including: acquiring the frequency at which each memory bank in the electronic device triggers correctable errors; and, when a memory bank whose trigger frequency exceeds a threshold is detected, controlling the heat dissipation unit to perform cooling treatment.
The frequency at which a memory bank triggers correctable errors is used as the basis for temperature regulation. Cooling a memory bank whose trigger frequency exceeds the threshold reduces the probability that it generates correctable errors, and thus reduces the impact of error handling and correction on the instantaneous performance of the system. It also helps avoid uncorrectable errors (Uncorrectable Error, UCE for short) caused by an excessive number of correctable errors, reducing the probability of downtime or restart of the electronic device. Compared with a method that uses the actual temperature of the memory bank as the basis for temperature regulation, the correctable-error trigger frequency represents the current working state of the memory bank more directly and accurately, which helps select appropriate cooling treatment.
In some embodiments, acquiring the frequency at which each memory bank in the electronic device triggers correctable errors includes: acquiring, through the controller, the frequency at which each memory bank triggers correctable errors.
Using the controller already configured in the electronic device to obtain the memory bank trigger frequency avoids adding an extra processing chip and helps improve the integration level of the electronic device.
In some embodiments, acquiring the frequency at which each memory bank in the electronic device triggers correctable errors includes: reading the correctable-error trigger frequency of each memory bank recorded in a register of the central processing unit, where the trigger frequency is obtained by the central processing unit monitoring each memory bank.
This embodiment obtains the memory bank trigger frequency through cooperation between the central processing unit and the controller configured in the electronic device, with the trigger frequency obtained by the central processing unit. No additional processing chip is needed, which helps improve the integration level of the electronic device. The reading interval is not specifically limited and can be set according to the actual application, which allows flexible monitoring of the working state of the memory bank.
In some embodiments, acquiring the frequency at which each memory bank triggers correctable errors includes: reading the number of correctable errors triggered by each memory bank, as recorded in a register of the central processing unit, where the trigger count is obtained by the central processing unit monitoring each memory bank; and acquiring the frequency at which each memory bank triggers correctable errors according to the increment of the trigger count within the time interval between two adjacent readings.
This is another embodiment that obtains the memory bank trigger frequency through cooperation between the central processing unit and the controller configured in the electronic device: the central processing unit obtains the trigger count, and the controller calculates the trigger frequency. No additional processing chip is needed, which helps improve the integration level of the electronic device. The reading interval is not specifically limited and can be set according to the actual application, which allows flexible monitoring of the working state of the memory bank.
On the basis of the above embodiment, the time interval between any two adjacent readings is the same and equals a preset duration.
This further restricts the controller to reading at a fixed preset interval, that is, it continuously monitors the CEs reported by the memory banks, which provides stable monitoring of their working state.
In some embodiments, the electronic device includes a plurality of heat dissipation units, and at least one memory bank is located in the temperature adjustment area of a heat dissipation unit. Controlling the heat dissipation unit to perform cooling treatment includes: acquiring slot information of the memory bank whose trigger frequency exceeds the threshold, where the slot information represents the position of the memory bank; determining, based on that position, the heat dissipation unit of the area where the memory bank is located; and controlling that heat dissipation unit to cool all memory banks in its temperature adjustment area.
This further defines the temperature adjustment method when there are multiple heat dissipation units: the memory bank is located through its slot information, and the corresponding heat dissipation unit is controlled to cool the area where the memory bank is located. Compared with driving all heat dissipation units to cool, this helps save power.
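The slot-based selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the slot names, the zone names, the `ZONE_SLOTS` table, and the function `units_to_boost` are all hypothetical and do not come from the application.

```python
# Hypothetical layout: which memory slots sit in each heat dissipation unit's
# temperature adjustment area. A real device would take this from its board design.
ZONE_SLOTS = {
    "fan_zone_a": ["DIMM000", "DIMM001", "DIMM002"],
    "fan_zone_b": ["DIMM010", "DIMM011", "DIMM012"],
}


def units_to_boost(ce_frequency_by_slot, threshold):
    """Return the heat dissipation units whose area holds an over-threshold memory bank."""
    # Slot information identifies the position of each offending memory bank.
    hot_slots = {slot for slot, freq in ce_frequency_by_slot.items() if freq > threshold}
    # Only units whose area contains at least one offending slot are selected.
    return sorted(
        unit for unit, slots in ZONE_SLOTS.items() if hot_slots.intersection(slots)
    )


selected = units_to_boost(
    {"DIMM000": 3.0, "DIMM001": 18.0, "DIMM011": 4.5}, threshold=10.0
)
print(selected)  # ['fan_zone_a']
```

Only the unit covering the offending slot is selected, so the other unit keeps its current setting, which is the power-saving point made above.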
In some embodiments, the heat dissipation unit includes an air cooling unit; controlling the heat dissipation unit to perform cooling treatment includes: increasing the fan speed of the air cooling unit to cool the memory bank.
This provides a temperature adjustment method for electronic devices that use air cooling.
In some embodiments, the heat dissipation unit includes a liquid cooling unit; controlling the heat dissipation unit to perform cooling treatment includes: increasing the refrigerant flow rate of the liquid cooling unit to cool the memory bank.
This provides a temperature adjustment method for electronic devices that use liquid cooling.
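The two cooling actions can be modelled with a common interface, as in the sketch below. The class names, units, step sizes, and upper limits are assumptions made for illustration; the application only states that the fan speed or the refrigerant flow rate is increased.

```python
from dataclasses import dataclass


@dataclass
class AirCoolingUnit:
    fan_rpm: int            # current fan speed (assumed unit: RPM)
    max_rpm: int = 12000    # assumed hardware limit

    def boost(self, step: int = 1000) -> None:
        # Raise the fan speed, never beyond the assumed limit.
        self.fan_rpm = min(self.fan_rpm + step, self.max_rpm)


@dataclass
class LiquidCoolingUnit:
    flow_lpm: float             # current refrigerant flow (assumed unit: litres per minute)
    max_flow_lpm: float = 8.0   # assumed hardware limit

    def boost(self, step: float = 0.5) -> None:
        # Raise the refrigerant flow rate, never beyond the assumed limit.
        self.flow_lpm = min(self.flow_lpm + step, self.max_flow_lpm)


def cool_down(unit) -> None:
    """Apply one cooling step to whichever kind of heat dissipation unit is given."""
    unit.boost()


fan = AirCoolingUnit(fan_rpm=6000)
pump = LiquidCoolingUnit(flow_lpm=7.8)
cool_down(fan)
cool_down(pump)
print(fan.fan_rpm, pump.flow_lpm)  # 7000 8.0
```

Keeping both units behind one `boost` method lets the controller logic stay the same whether the device is air cooled or liquid cooled.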
The following is an electronic device corresponding to the temperature adjustment method provided in the first aspect of the present application, and the content and technical effects thereof may refer to the first aspect and will not be described in detail.
In a second aspect, the present application provides an electronic device, including a controller and a heat dissipation unit; the controller is in communication connection with the heat dissipation unit; the controller is used for acquiring the frequency at which each memory bank in the electronic device triggers correctable errors; the controller is also used for controlling the heat dissipation unit to perform cooling treatment when a memory bank whose trigger frequency exceeds the threshold is detected.
In some embodiments, the controller is communicatively coupled to the central processing unit; the controller is also used for reading the correctable-error trigger frequency of each memory bank recorded in a register of the central processing unit, where the trigger frequency is obtained by the central processing unit monitoring each memory bank.
In some embodiments, the controller is communicatively coupled to the central processing unit; the controller is also used for reading the number of correctable errors triggered by each memory bank, as recorded in a register of the central processing unit, where the trigger count is obtained by the central processing unit monitoring each memory bank; the controller is also used for acquiring the frequency at which each memory bank triggers correctable errors according to the time interval between two adjacent readings and the increment of the trigger count.
In some embodiments, the time interval between any two adjacent reads is the same, which is a predetermined duration.
In some embodiments, the electronic device includes a plurality of heat dissipating units, and at least one memory bank is disposed in a temperature adjusting area of the heat dissipating units; the controller is specifically used for acquiring slot information of the memory bank with the triggering frequency exceeding a threshold value, wherein the slot information represents the position of the memory bank; the controller is specifically further configured to determine a heat dissipation unit in an area where the memory bank is located based on a position of the memory bank, and control the heat dissipation unit to perform cooling treatment on all the memory banks in the temperature adjustment area.
In some embodiments, the heat dissipating unit comprises an air cooling unit; the controller is particularly used for increasing the rotating speed of a fan of the air cooling unit so as to cool the memory bank.
In some embodiments, the heat dissipation unit comprises a liquid cooling unit; the controller is particularly used for improving the flow rate of the refrigerant of the liquid cooling unit so as to cool the memory bank.
The application provides a temperature adjusting method and electronic equipment, wherein the method comprises: acquiring the frequency at which each memory bank in the electronic device triggers correctable errors; and, when a memory bank whose trigger frequency exceeds a threshold is detected, controlling the heat dissipation unit to perform cooling treatment. Using the frequency at which a memory bank triggers correctable errors as the basis for temperature regulation, rather than the actual temperature of the memory bank, represents the current working state of the memory bank more directly and accurately. This helps select appropriate cooling treatment, reduces the probability of correctable errors, and thereby reduces the impact of error handling and correction on the running performance of the electronic device. It also helps avoid uncorrectable errors caused by an excessive number of correctable errors, which reduces the probability of downtime or restart and improves the operating reliability of the electronic device.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is an application scenario provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of an exemplary temperature regulation method;
Fig. 3 is a flowchart of a temperature adjustment method according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a server according to an embodiment of the present application;
Fig. 5 is a second schematic structural diagram of a server according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a temperature adjusting device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with aspects of the present application.
The method and the device are suitable for electronic devices provided with memory banks. Optionally, the electronic device is a server, a computer, or the like. Fig. 1 shows an application scenario provided in the embodiments of the present application, taking a server as an example of the electronic device. A plurality of memory banks are mounted on the motherboard. Each memory bank carries a plurality of memory chips and auxiliary chips. The memory chips are typically dynamic random access memory (Dynamic Random Access Memory, DRAM for short). An auxiliary chip, such as a register clock driver (Register Clock Driver, RCD for short), may be used to improve signal integrity, specifically by buffering address, command, or control signals from the memory controller. The slot interface types for server memory banks include single inline memory modules (SIMMs) and dual inline memory modules (DIMMs).
It should be noted that the structure of the memory bank shown in Fig. 1 is only an example. The embodiments of the present application do not limit the number or positions of the memory chips on the memory bank, nor whether the memory bank is provided with an auxiliary chip.
As memory process nodes continue to shrink, the data retention capability of the internal capacitors of DRAM becomes more sensitive to temperature. At high temperatures, data at random addresses is more likely to flip, that is, the probability of correctable errors (Correctable Error, CE for short) increases; at low temperatures, the probability of such errors is greatly reduced. It is therefore necessary to keep the memory temperature within a reasonable range.
At present, the temperature of memory in an electronic device is adjusted as follows: the temperature detected by a standalone or integrated temperature sensor on the memory bank is monitored, and once it reaches a certain threshold, cooling of the electronic device is increased. Taking a server as an example of the electronic device, Fig. 2 is a schematic diagram of a typical architecture for this temperature adjustment method. As shown in Fig. 2, the server includes a memory bank on which a temperature sensor is provided, and the temperature sensor monitors the actual temperature of the memory bank. The controller is connected to the temperature sensor and acquires the actual working temperature of each memory bank; the controller is also connected to the heat dissipation unit and increases cooling when the actual working temperature of a memory bank exceeds a preset temperature threshold.
On the one hand, as memory technology develops, the working-temperature requirements change with each generation of memory bank. To ensure the working state of each generation, the temperature threshold must be adjusted to suit the type of memory in the server, which creates a large amount of tedious work. On the other hand, differences in service time or memory model cause different memory banks to be in different working states at the same working temperature, so they report CEs at different frequencies at the same temperature. If the method shown in Fig. 2 applies the same temperature threshold to every memory bank, some memory banks may be left in a poor working state and the running performance of the server may be reduced.
Therefore, it is necessary to provide a temperature control method that ensures the working state of each memory bank and improves the operating performance of the electronic device. The temperature adjustment method and electronic device of the present application use the frequency at which a memory bank triggers correctable errors as the basis for temperature regulation. Compared with using the actual temperature of the memory bank, this represents the current working state of the memory bank more directly and accurately, suits every generation of memory bank, and helps select appropriate cooling treatment to reduce the probability of correctable errors.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example 1
Fig. 3 is a flowchart of a temperature adjustment method according to an embodiment of the present disclosure. The method is applied to an electronic device, which may be any device provided with memory banks, such as a server or a computer. The method may be executed by a temperature adjusting device. The temperature adjusting device may be implemented as a computer program, for example application software; it may be a medium storing the related computer program, for example a USB drive or a cloud disk; or it may be implemented as a physical device, for example a chip, into which the related computer program is integrated or installed. The following description takes a temperature adjusting device as the execution subject.
As shown in fig. 3, the method may include the steps of:
S100, acquiring the frequency at which each memory bank in the electronic device triggers correctable errors;
S200, when a memory bank whose trigger frequency exceeds the threshold is detected, controlling the heat dissipation unit to perform cooling treatment.
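Steps S100 and S200 amount to one pass of a monitoring loop, which can be sketched as below. The function `regulate_once`, its callback parameters, the slot names, and the threshold value are hypothetical; how the frequencies are read and how cooling is driven depend on the embodiment.

```python
def regulate_once(read_frequencies, cool, threshold):
    """Run S100 then S200 once; return the memory banks that exceeded the threshold."""
    # S100: obtain the correctable-error trigger frequency of every memory bank.
    frequencies = read_frequencies()
    offenders = sorted(slot for slot, freq in frequencies.items() if freq > threshold)
    # S200: request cooling only if at least one memory bank is over the threshold.
    if offenders:
        cool(offenders)
    return offenders


cooling_requests = []
result = regulate_once(
    read_frequencies=lambda: {"DIMM0": 2.0, "DIMM1": 25.0},  # stand-in for real readings
    cool=cooling_requests.append,                            # stand-in for the cooling action
    threshold=20.0,                                          # assumed threshold value
)
print(result, cooling_requests)  # ['DIMM1'] [['DIMM1']]
```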
Specifically, in step S100, a correctable error (CE) is an error that can be corrected, as opposed to an uncorrectable error (UCE). The error here takes the form of a data flip at a random address. Taking the error checking and correcting (ECC) technique as an example, it can correct a 1-bit data flip error, and can detect but not correct a 2-bit data flip error. In other words, a 1-bit data flip error is a correctable error (CE), and a 2-bit data flip error is an uncorrectable error (UCE).
In practice, on the X86 architecture, if a 2-bit error occurs in the 64 bits of data transferred at the same time, the system cannot correct it, and downtime or a restart occurs. Reducing the probability of a 1-bit error in the 64 bits of data transferred at the same time also reduces the probability of a 2-bit error in that data. That is, reducing the probability of CEs in the memory reduces the probability of UCEs, and ultimately reduces the probability of downtime or restart of the electronic device.
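The link between 1-bit and 2-bit error probabilities can be made concrete with a small calculation. The sketch below assumes that each of the 64 bits flips independently with the same probability `p`; this independence model and the numeric values of `p` are illustrative assumptions, not figures from the application.

```python
from math import comb


def error_probabilities(p, width=64):
    """Return (P[exactly 1 flipped bit], P[exactly 2 flipped bits]) in a word of `width` bits."""
    p_one = comb(width, 1) * p * (1 - p) ** (width - 1)
    p_two = comb(width, 2) * p ** 2 * (1 - p) ** (width - 2)
    return p_one, p_two


hot = error_probabilities(1e-6)     # assumed per-bit flip probability at high temperature
cooler = error_probabilities(5e-7)  # assumed per-bit flip probability after cooling

ratio_one = hot[0] / cooler[0]      # roughly 2: 1-bit errors scale with p
ratio_two = hot[1] / cooler[1]      # roughly 4: 2-bit errors scale with p squared
print(ratio_one, ratio_two)
```

Under this model, halving the per-bit flip probability roughly halves the 1-bit (CE) probability and cuts the 2-bit (UCE) probability to about a quarter, which is consistent with the argument above that lowering the CE rate also lowers the UCE rate.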
It should be noted that the ECC technique is not effective for data flip errors of 3 bits or more. First, the probability of such errors is extremely low. Second, if they do occur, other measures are needed to avoid them, rather than relying on ECC for error correction.
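The "correct one bit, detect two bits" behaviour can be demonstrated with a toy code. The sketch below uses an extended Hamming (8,4) code, which protects 4 data bits; this is a swapped-in illustration chosen for brevity. The application does not specify which ECC code the memory uses, and ECC memory commonly protects 64 data bits with a longer code.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # Hamming(7,4) data positions; 1, 2, 4 hold parity bits


def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword; index 0 is the overall parity bit."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, nibble):
        code[pos] = bit
    for parity_pos in (1, 2, 4):
        # Each parity bit covers the positions whose index has that bit set.
        code[parity_pos] = sum(code[i] for i in range(1, 8) if i & parity_pos) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, data); data is None when the error cannot be corrected."""
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        return "ok", [code[i] for i in DATA_POSITIONS]
    if not parity_ok:
        # Odd number of flips: assume one flip; the syndrome is its index (0 = parity bit).
        fixed = list(code)
        fixed[syndrome] ^= 1
        return "corrected", [fixed[i] for i in DATA_POSITIONS]
    # Even number of flips with a non-zero syndrome: detected but not correctable.
    return "uncorrectable", None


word = encode([1, 0, 1, 1])
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(word)
two_flips[2] ^= 1
two_flips[6] ^= 1
print(decode(one_flip))   # ('corrected', [1, 0, 1, 1])
print(decode(two_flips))  # ('uncorrectable', None)
```

In this toy decoder, three flipped bits produce odd overall parity and are therefore treated as a single flip and miscorrected, which illustrates why such a scheme is not effective for errors of 3 bits or more.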
Further, in step S100, the trigger frequency is the number of times a memory bank reports a CE per unit time. After the memory reports a CE, repairing the error occupies part of the resources of the central processing unit (CPU); the more CEs occur per unit time, the greater the impact on the instantaneous performance of the system.
Further, in step S200, the frequency at which the memory reports CEs is used as an input to temperature regulation. This frequency reflects the current working state of the memory bank more directly, and working temperature is an important factor in that state: the higher the temperature, the more CEs the memory generates in the same period. When the CE count reaches a certain value (the reporting threshold), a CE is reported to the operating system (OS). With the reporting threshold unchanged, a higher temperature therefore means a higher CE reporting frequency. On this basis, the temperature can be adjusted according to the CE reporting frequency so as to keep that frequency within a preset threshold and keep the memory bank in a good working state.
Specifically, suppose the monitoring time is 2 h and the first threshold for a memory bank to report a CE is 500, that is, a report is sent to the OS when the CE count reaches 500. At temperature T1, each time the memory bank generates a CE, the register records it; when the count reaches 500, a report is sent to the OS and the register is cleared. If the memory bank reports CEs 30 times within 2 h, its CE reporting frequency is 30/2 = 15 reports per hour. When the temperature rises to T2, the memory bank generates more CEs in the same period. With the first threshold unchanged at 500, suppose the memory bank reports CEs 50 times within 2 h at T2; its CE reporting frequency is then 50/2 = 25 reports per hour. The running state of the memory bank can thus be monitored through its CE reporting frequency: when that frequency exceeds a second threshold, the temperature can be considered too high and needs to be adjusted promptly.
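The arithmetic of this example can be reproduced directly. In the sketch below, the first threshold, the monitoring time, and the report counts come from the text; the value of `SECOND_THRESHOLD` is a hypothetical choice, since the text does not give one.

```python
FIRST_THRESHOLD = 500     # CEs accumulated before one report is sent to the OS (from the text)
MONITORING_HOURS = 2      # monitoring time in hours (from the text)
SECOND_THRESHOLD = 20.0   # reports per hour; hypothetical value for illustration


def reporting_frequency(reports, hours=MONITORING_HOURS):
    """CE reports per hour over the monitoring time."""
    return reports / hours


freq_t1 = reporting_frequency(30)    # at temperature T1
freq_t2 = reporting_frequency(50)    # at the higher temperature T2

# Each report stands for FIRST_THRESHOLD recorded CEs, so at T1 the memory bank
# generated at least this many CEs (a partial count may remain in the register).
ce_events_t1 = 30 * FIRST_THRESHOLD

print(freq_t1, freq_t2, freq_t2 > SECOND_THRESHOLD)  # 15.0 25.0 True
```

With the assumed second threshold of 20 reports per hour, the memory bank is within limits at T1 and would trigger cooling at T2.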
In one embodiment, the threshold value of the memory bank reporting CE may be set to 1, that is, the memory bank reports to the operating system every time the memory bank generates a CE, and the operation state of the memory bank may be monitored by monitoring the frequency of the memory bank reporting CE.
Compared with a method taking the actual temperature of the memory bank as the basis of temperature regulation, the embodiment of the invention can correct the false triggering frequency to more directly and truly represent the current working state of the memory bank, and is beneficial to making corresponding reasonable cooling treatment. Further, the probability of correctable errors of the memory bank exceeding the threshold value is reduced through cooling treatment, so that the influence on the instantaneous performance of the system caused by processing error correction can be reduced; further, the method is beneficial to avoiding uncorrectable errors (Uncorrectable Error, UCE for short) caused by excessive triggering times of correctable errors, and can reduce the probability of downtime or restarting of electronic equipment.
In some embodiments, step S100 obtains a trigger frequency of a correctable error of triggering each memory bank in the electronic device, including: and acquiring the trigger frequency of the correctable error of the trigger of each memory bank in the electronic equipment through the controller. For the electronic equipment configured with the controller, the controller can be adopted to record the number of times of reporting the CE to each memory bank, so as to obtain the number of times of triggering the CE by each memory bank in unit time, namely the memory bank triggering frequency. The controller configured by the electronic equipment is beneficial to avoiding adding additional processing chips and improving the integration level of the electronic equipment.
The electronic device may be a server; the controller may be an out-of-band management controller, and may specifically be a baseboard management controller (Baseboard Management Controller, abbreviated as BMC). For the server provided with the baseboard management controller BMC, the BMC can be used for obtaining the memory bank triggering frequency, so that the new addition of an additional processing chip is avoided, and the integration level of the server is improved.
In other embodiments, a special processing chip may be added to perform step S100, which is suitable for an electronic device without a controller. Optionally, the processing chip may be a complex programmable logic device (Complex Programmable Logic Device, abbreviated as CPLD) or a field programmable gate array (Field Programmable Gate Array, abbreviated as FPGA). In addition, using a special processing chip makes it possible to handle the emergency in which temperature adjustment cannot be performed because the controller has failed.
In some embodiments, step S100 obtains a trigger frequency of a correctable error of triggering each memory bank in the electronic device, which specifically includes:
S101, reading the triggering times of the correctable errors triggered by each memory bank, as recorded in a register of a central processing unit, wherein the triggering times are obtained by the central processing unit monitoring each memory bank;
S102, acquiring the triggering frequency of the triggering correctable errors of each memory bank according to the time interval of two adjacent readings and the increment of the triggering times in the adjacent time interval.
Optionally, step S101 and step S102 may be performed by a controller that the electronic device itself is configured with, wherein the triggering times are monitored and obtained by the central processing unit CPU, and the trigger frequency is calculated by the controller from the triggering times and the time interval. No additional processing chip is needed, which is beneficial to improving the integration level of the electronic equipment. Alternatively, the steps can be executed by an additional special processing chip, wherein the processing chip can be a complex programmable logic device CPLD. Using a special processing chip also makes it possible to handle the emergency in which temperature adjustment cannot be performed because the controller has failed.
In other embodiments, step S100 obtains a trigger frequency of a correctable error of triggering each memory bank in the electronic device, which specifically includes:
S103, reading the trigger frequency of the correctable errors triggered by each memory bank, as recorded in the register of the central processing unit, wherein the trigger frequency is obtained by the central processing unit monitoring each memory bank.
Optionally, step S103 may be performed by a controller that the electronic device itself is configured with. In this case the central processing unit CPU monitors and obtains the number of CE triggers and records it in its register; the CPU also calculates the CE trigger frequency of each memory bank from the time interval between two adjacent reads and the increment of the number of triggers within that interval, and records the trigger frequency of each memory bank in the MR register, so that the controller can directly read the trigger frequency recorded by the CPU. No additional processing chip is needed, which is beneficial to improving the integration level of the electronic equipment. Alternatively, the step can be executed by an additional special processing chip, wherein the processing chip can be a complex programmable logic device CPLD. Using a special processing chip also makes it possible to handle the emergency in which temperature adjustment cannot be performed because the controller has failed.
In some embodiments, the electronic device is exemplified by a server, and the controller is exemplified by a baseboard management controller, BMC. Specifically, fig. 4 is a schematic structural diagram of a server provided in the embodiment of the present application, which illustrates an implementation manner in which a central processor of the server cooperates with a baseboard management controller BMC to obtain a memory bank trigger frequency.
As shown in fig. 4, the server includes a plurality of memory banks: memory bank 1, memory bank 2, … …, memory bank N, central processing unit CPU, baseboard management controller BMC, and heat dissipation unit. The BMC is connected with a CPU and a heat dissipation unit, and the CPU is connected with a plurality of memory strips. The CPU is used for monitoring each memory bank, obtaining the triggering times of the correctable error CE triggered by each memory bank, and storing the triggering times in the register. The BMC reads the register of the CPU to obtain the triggering times of each memory bank.
In a possible implementation, the register continuously accumulates the trigger times after power-up, i.e. the register records the accumulated value of the trigger times. After the BMC obtains the trigger times, the newly increased times of the trigger times of the current reading compared with the last reading are further calculated, and then the newly increased times are divided by the interval time of the two readings, so that the frequency of reporting the CE by the memory bank in the time is obtained. In another possible implementation, the BMC clears the register after obtaining the trigger number of each memory bank per read. Thus, the register records the newly increased number of times after the last reading, and the BMC can directly divide the trigger number of each reading by the interval duration of the two readings to obtain the frequency of reporting the CE by the memory bank in the time.
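The two register-handling variants described above can be sketched as follows. This is an illustrative sketch only: the function names, the bank identifiers, and the sample readings are hypothetical, and reading the CPU register itself is outside the scope of the sketch.

```python
from typing import Dict


def frequencies_from_cumulative(
    prev_counts: Dict[str, int],
    curr_counts: Dict[str, int],
    interval_s: float,
) -> Dict[str, float]:
    """Accumulating register: frequency = (current - previous) / interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        bank: (count - prev_counts.get(bank, 0)) / interval_s
        for bank, count in curr_counts.items()
    }


def frequencies_from_clear_on_read(
    counts_since_last_read: Dict[str, int],
    interval_s: float,
) -> Dict[str, float]:
    """Clear-on-read register: each value is already the new increment."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        bank: count / interval_s
        for bank, count in counts_since_last_read.items()
    }


# Hypothetical readings taken 60 s apart (CE triggers per memory bank).
previous = {"bank1": 100, "bank2": 40}
current = {"bank1": 160, "bank2": 40}
cumulative_result = frequencies_from_cumulative(previous, current, 60.0)

# The same period seen through a register that is cleared after each read.
clear_result = frequencies_from_clear_on_read({"bank1": 60, "bank2": 0}, 60.0)
```

Both variants yield the same frequencies for the same period; they differ only in whether the subtraction is done by the reader or made unnecessary by clearing the register.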
The above-mentioned baseboard management controller BMC configured by the server itself executes steps S101 and S102, and uses the CPU of the server itself to execute the monitoring of the memory bank, without adding additional processing chips, which is beneficial to improving the integration level of the server.
On the basis of the above embodiment, the time interval between any two adjacent reads is the same and equals a preset duration. That is, the BMC periodically reads the number of times each memory bank has reported CEs, as recorded in the register of the CPU. When calculating the trigger frequency, the increment of the triggering times is divided by the preset time interval, and the time interval between two adjacent reads does not need to be measured, which saves operation steps. This also enables continuous and stable monitoring of the working state of the memory bank.
In still other embodiments, fig. 5 is a schematic diagram of a second structure of a server according to an embodiment of the present application, which illustrates an implementation in which temperature and the correctable error trigger frequency are both used as the basis for temperature regulation. As shown in fig. 5, taking one memory bank as an example, a temperature sensor is disposed on the memory bank, the temperature sensor is connected with the baseboard management controller BMC, and the BMC monitors the actual working temperature of the memory bank through the temperature sensor.
The server also includes a Mode Register (MR) of the central processing unit, i.e., a CPU MR Register shown in fig. 5, which stores the number of triggers for triggering the correctable errors per memory bank. The baseboard management controller BMC is connected with the CPU MR register and is used for reading the triggering times of the memory bank triggering correctable errors and obtaining the triggering frequency of the memory bank triggering correctable errors according to the triggering times.
Correctable errors are more likely to occur at high temperature and less likely to occur at low temperature. Optionally, therefore, the time interval at which the BMC reads the CPU MR register is adjusted according to the current temperature, which can reduce energy consumption to a certain extent. In a possible implementation, a temperature threshold and a trigger frequency threshold are both set, and when the temperature reaches the temperature threshold or the trigger frequency reaches the trigger frequency threshold, the BMC controls the heat dissipation unit to enhance cooling. Setting two judgment conditions for temperature regulation in this way can be more reliable than using a single judgment condition.
On the basis of the above embodiment, the method of temperature regulation according to temperature and the method of temperature regulation according to the correctable error trigger frequency may be combined more deeply. For example, the BMC read interval is adjusted according to the current temperature. Specifically, when the current temperature is low, the time interval between two adjacent BMC reads can be lengthened: because correctable errors are less likely at low temperature, the BMC does not need to monitor the working state of the memory bank frequently. When the current temperature is high, the time interval between two adjacent BMC reads can be shortened: because correctable errors are more likely at high temperature, the BMC increases how often it monitors the working state of the memory bank. This arrangement is beneficial to improving the reliability of the temperature regulation method; adjusting the read interval according to temperature is also beneficial to reducing power consumption.
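A minimal sketch of the combined scheme is given below, assuming a linear mapping from temperature to read interval. The mapping, the temperature bounds (40 and 70 degrees Celsius), the interval bounds (60 s and 5 s), and all function names are assumptions for illustration; the application only states that the interval is lengthened at low temperature and shortened at high temperature.

```python
def next_read_interval(
    temp_c: float,
    low_c: float = 40.0,    # assumed: at or below this, use the long interval
    high_c: float = 70.0,   # assumed: at or above this, use the short interval
    long_s: float = 60.0,   # assumed read interval at low temperature
    short_s: float = 5.0,   # assumed read interval at high temperature
) -> float:
    """Pick the BMC read interval: longer when cool, shorter when hot."""
    if temp_c <= low_c:
        return long_s
    if temp_c >= high_c:
        return short_s
    ratio = (temp_c - low_c) / (high_c - low_c)
    return long_s - ratio * (long_s - short_s)


def should_enhance_cooling(
    temp_c: float,
    ce_frequency: float,
    temp_threshold_c: float,
    frequency_threshold: float,
) -> bool:
    """Either condition reaching its threshold triggers stronger cooling."""
    return temp_c >= temp_threshold_c or ce_frequency >= frequency_threshold


interval_cool = next_read_interval(35.0)  # long interval
interval_mid = next_read_interval(55.0)   # halfway between the two bounds
interval_hot = next_read_interval(80.0)   # short interval
```

A stepwise lookup table would satisfy the description equally well; the linear form is used here only because it is short.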
In some embodiments, the electronic device is cooled by air cooling, i.e., the heat dissipation unit includes an air cooling unit. In step S200, the cooling process performed by the heat dissipation unit is controlled, including: and (3) increasing the rotating speed of a fan of the air cooling unit so as to cool the memory bank.
In still other embodiments, the electronic device employs liquid cooling, i.e., the heat sink unit includes a liquid cooling unit. In step S200, the cooling process performed by the heat dissipation unit is controlled, including: the flow rate of the refrigerant (such as working medium water) of the liquid cooling unit is increased or the temperature of the refrigerant is further reduced so as to cool the memory bank.
In addition, as shown in fig. 5, if the electronic device has two cooling configurations, i.e., air cooling and liquid cooling, one cooling mode may be selected, or two cooling modes cooperate to cool.
In some embodiments, the electronic device includes a plurality of heat dissipating units, and at least one memory bank is disposed in a temperature adjusting area of the heat dissipating units; in step S200, the cooling process performed by the heat dissipation unit is controlled, including:
S201, acquiring slot information of a memory bank whose trigger frequency exceeds a threshold value, wherein the slot information represents the position of the memory bank;
S202, determining the heat dissipation unit of the area where the memory bank is located based on the position of the memory bank, and controlling the heat dissipation unit to cool down all the memory banks in its temperature adjustment area.
Specifically, a plurality of memory strips are distributed on the electronic equipment main board, and dividing areas are arranged according to the physical distribution of the memory strips, and each area is correspondingly provided with a heat dissipation unit for cooling. The arrangement can avoid the difference of cooling effects caused by the distance between the memory strip and the radiating unit, and ensure good and even cooling effects.
Optionally, the slot information in step S201 is used to characterize the location of the memory banks, where each slot corresponds to one memory bank. Specifically, a server and a baseboard management controller BMC are taken as examples. The BMC can acquire slot information through the CPU MR register, wherein the CPU MR register is recorded with the slot information and the correctable error triggering times of the memory bank corresponding to the slot information.
Further, the BMC periodically reads the CPU MR register, acquires the number of times the memory has reported CEs together with the slot information of the reporting memory, and resolves this information into the frequency at which the memory bank in each slot reports CEs per fixed period. The BMC then judges, slot by slot, whether the frequency at which the memory reports CEs exceeds a set threshold. When it detects that the CE reporting frequency of the memory in a certain slot exceeds the set threshold, the BMC controls the air cooling unit or the liquid cooling unit to increase the fan speed or the coolant flow rate in the area where that slot is located, so as to lower the temperature of the memory in that slot. By locating the memory bank through its slot information and controlling the corresponding heat dissipation unit to perform targeted cooling on the area where that memory bank is located, power consumption is reduced compared with controlling all heat dissipation units to perform cooling.
In addition, it can be understood that the cooling effect of the heat dissipating unit is regional, and when the heat dissipating unit performs cooling, all memory banks in the temperature adjusting area of the heat dissipating unit are cooled.
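Steps S201 and S202 can be sketched as a lookup from slot to heat dissipation unit. The slot names, unit names, mapping table, and threshold below are hypothetical; the application does not define how slot information is encoded in the CPU MR register.

```python
from typing import Dict, List

# Hypothetical layout: which heat dissipation unit covers which slot.
UNIT_BY_SLOT: Dict[str, str] = {
    "slot0": "fan_zone_a",
    "slot1": "fan_zone_a",
    "slot2": "fan_zone_b",
    "slot3": "fan_zone_b",
}


def units_to_boost(
    frequency_by_slot: Dict[str, float],
    threshold: float,
    unit_by_slot: Dict[str, str],
) -> List[str]:
    """Return the heat dissipation units whose area contains at least one
    memory bank with a CE trigger frequency above the threshold."""
    units = {
        unit_by_slot[slot]
        for slot, frequency in frequency_by_slot.items()
        if frequency > threshold
    }
    return sorted(units)


# Only slot1 exceeds the assumed threshold of 1.0, so only its zone is boosted.
frequencies = {"slot0": 0.2, "slot1": 1.5, "slot2": 0.1, "slot3": 0.0}
selected = units_to_boost(frequencies, 1.0, UNIT_BY_SLOT)
```

Boosting `fan_zone_a` also cools `slot0`, which matches the observation that a heat dissipation unit cools every memory bank in its temperature adjustment area.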
The temperature regulation method provided by the embodiment of the application comprises the following steps: acquiring the triggering frequency of the correctable error of the triggering of each memory bank in the electronic equipment; when detecting that the memory bank with the trigger frequency exceeding the threshold exists, controlling the heat radiating unit to perform cooling treatment. By taking the triggering frequency of the memory bank triggering correctable errors as the basis of temperature regulation, compared with the actual temperature of the memory bank as the basis of temperature regulation, the current working state of the memory bank can be directly and truly represented, and the corresponding reasonable cooling treatment is facilitated, so that the probability of the correctable errors is reduced, and the influence on the running performance of the electronic equipment caused by the treatment and correction is further reduced; further, uncorrectable errors caused by excessive triggering times of correctable errors can be avoided, the probability of downtime or restarting of the electronic equipment is reduced, and the operation reliability of the electronic equipment is improved.
Example two
The embodiment of the application provides electronic equipment, which comprises a controller and a heat dissipation unit; the controller is in communication connection with the heat dissipation unit; the controller is used for acquiring the triggering frequency of the triggering correctable errors of each memory bank in the electronic equipment; the controller is also used for controlling the heat dissipation unit to conduct cooling treatment when the memory bank with the triggering frequency exceeding the threshold value is detected.
In particular, a correctable error CE is an error that can be corrected, in contrast to an uncorrectable error UCE. The error here takes the form of a data flip at a random address. Taking the error checking and correcting (Error Checking and Correcting, abbreviated as ECC) technique as an example, the technique can correct a 1-bit data flip error, and can detect but not correct a 2-bit data flip error. A 1-bit data flip error can therefore be understood as a correctable error CE, and a 2-bit data flip error as an uncorrectable error UCE.
In practical applications, if a 2-bit error occurs in the 64 bits of data transmitted at the same time in the X86 architecture, the system cannot correct it, and downtime or a restart occurs. Reducing the probability of a 1-bit error in the 64 bits of data transmitted at the same time reduces the probability of a 2-bit error in that data; that is, reducing the probability of CEs in the memory reduces the probability of UCEs in the memory, and ultimately reduces the probability of downtime or restart of the electronic equipment.
It should be noted that the ECC technique is not effective for data flip errors of 3bits or more. The probability of data flip errors which are larger than or equal to 3bits is extremely low; secondly, if the data inversion error larger than or equal to 3bits occurs, relevant measures need to be taken to avoid the data inversion error larger than or equal to 3bits, instead of error correction by means of ECC technology.
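The correct-one, detect-two behaviour described above can be illustrated with a toy extended Hamming code over 4 data bits. This is a sketch of the general principle only: the 8-bit code, the function names, and the status labels are assumptions for illustration, and the codes used by actual memory controllers over 64-bit data are wider and are not described in this application.

```python
from typing import List, Optional, Tuple


def encode(data4: int) -> List[int]:
    """Encode 4 data bits into 8 bits: index 0 is the overall parity bit,
    indices 1..7 are Hamming positions (parity bits at 1, 2 and 4)."""
    d = [(data4 >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of ones even
    return code


def extract(code: List[int]) -> int:
    """Reassemble the 4 data bits from positions 3, 5, 6 and 7."""
    return code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return ("ok", data), ("CE", corrected data) or ("UCE", None)."""
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        return "ok", extract(code)
    if parity_odd:
        # Treated as a single flipped bit; syndrome 0 means the overall
        # parity bit itself flipped.
        fixed = list(code)
        fixed[syndrome] ^= 1
        return "CE", extract(fixed)
    # Non-zero syndrome with even overall parity: two bits flipped.
    return "UCE", None


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
results = (decode(word), decode(one_flip), decode(two_flips))
```

In this toy code a 3-bit flip has odd overall parity and would be treated as a single-bit error and miscorrected, which is consistent with the remark that ECC is not effective for flips of 3 bits or more.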
Further, the trigger frequency refers to the number of times the memory reports CE in a unit time. After the memory reports the CE, it occupies a part of resources of the central processing unit (central processing unit, CPU for short) to repair the error, and the more times CE occurs in unit time, the greater the impact on the instantaneous performance of the system.
Further, the frequency of the memory reporting CE is used as a temperature speed regulation item. The frequency of the memory reporting CE may more directly reflect the current working state of the memory bank, and the working temperature is an important factor affecting the working state: the higher the temperature, the higher the frequency with which the memory reports CE. Therefore, the temperature can be adjusted according to the frequency of the memory reporting CE, so that the frequency of the memory reporting CE is maintained within a preset range, and the memory bank is kept in a good working state.
According to the embodiment of the application, the triggering frequency capable of correcting errors is used as the basis of temperature regulation, so that the current working state of the memory bank can be more directly and truly represented, and corresponding reasonable cooling treatment is facilitated. Further, the probability of correctable errors of the memory bank exceeding the threshold value is reduced through cooling treatment, so that the influence on the instantaneous performance of the system caused by processing error correction can be reduced; further, the method is beneficial to avoiding uncorrectable errors (Uncorrectable Error, UCE for short) caused by excessive triggering times of correctable errors, and can reduce the probability of downtime or restarting of electronic equipment.
In some embodiments, the controller is communicatively coupled to the central processor; the controller is specifically used for reading the triggering times of the correctable errors of the triggering of each memory bank recorded in the register of the central processing unit, wherein the triggering times are obtained by monitoring each memory bank by the central processing unit; the controller is specifically further configured to obtain a trigger frequency of the correctable error of the trigger of each memory bank according to the time interval between two adjacent readings and the increment of the trigger number.
In the method, a Central Processing Unit (CPU) monitors and acquires the trigger times, and a controller calculates and acquires the trigger frequency according to the trigger times and time intervals. The BMC and the CPU configured by the electronic equipment are utilized to execute the operation, and no additional processing chip is needed, so that the integration level of the electronic equipment is improved.
In other embodiments, the controller is specifically configured to read a trigger frequency of a correctable error of triggering of each memory bank recorded in a register of the central processing unit, where the trigger frequency is obtained by monitoring each memory bank by the central processing unit.
In the method, the CPU monitors and acquires the trigger frequency, and the controller directly reads the trigger frequency recorded by the CPU. The BMC and the CPU configured by the electronic equipment are utilized to execute the operation, and no additional processing chip is needed, so that the integration level of the electronic equipment is improved.
In some embodiments, the electronic device is exemplified by a server, and the controller is exemplified by a baseboard management controller, BMC. Specifically, as shown in fig. 4, fig. 4 is a schematic structural diagram of a server provided in an embodiment of the present application, which illustrates an implementation manner in which a central processor of the server cooperates with a baseboard management controller BMC to obtain a memory bank trigger frequency. As shown in fig. 4, the server includes a plurality of memory banks: memory bank 1, memory bank 2, … …, memory bank N, central processing unit CPU, baseboard management controller BMC, and heat dissipation unit. The BMC is connected with a CPU and a heat dissipation unit, and the CPU is connected with a plurality of memory strips. The CPU is used for monitoring each memory bank, obtaining the triggering times of the correctable error CE triggered by each memory bank, and storing the triggering times in the register. The BMC reads the register of the CPU to obtain the triggering times of each memory bank.
In a possible implementation, the register continuously accumulates the trigger times after power-up, i.e. the register records the accumulated value of the trigger times. After the BMC obtains the trigger times, the newly increased times of the trigger times of the current reading compared with the last reading are further calculated, and then the newly increased times are divided by the interval time of the two readings, so that the frequency of reporting the CE by the memory bank in the time is obtained. In another possible implementation, the BMC clears the register after obtaining the trigger number of each memory bank per read. Thus, the register records the newly increased number of times after the last reading, and the BMC can directly divide the trigger number of each reading by the interval duration of the two readings to obtain the frequency of reporting the CE by the memory bank in the time.
On the basis of the above embodiment, the time interval between any two adjacent reads is the same and equals a preset duration. That is, the BMC periodically reads the number of times each memory bank has reported CEs, as recorded in the register of the CPU. When calculating the trigger frequency, the increment of the triggering times is divided by the preset time interval, and the time interval between two adjacent reads does not need to be measured, which saves operation steps. This also enables continuous and stable monitoring of the working state of the memory bank.
In still other embodiments, fig. 5 is a schematic diagram of a second structure of a server according to an embodiment of the present application. It illustrates an implementation in which temperature and the correctable error trigger frequency are both used as the basis for temperature regulation. As shown in fig. 5, taking one memory bank as an example, a temperature sensor is disposed on the memory bank, the temperature sensor is connected with the baseboard management controller BMC, and the BMC monitors the actual working temperature of the memory bank through the temperature sensor.
The server also includes a Mode Register (MR) of the central processing unit, i.e., a CPU MR Register shown in fig. 5, which stores the number of triggers for triggering the correctable errors per memory bank. The baseboard management controller BMC is connected with the CPU MR register and is used for reading the triggering times of the memory bank triggering correctable errors and obtaining the triggering frequency of the memory bank triggering correctable errors according to the triggering times.
Correctable errors are more likely to occur at high temperature and less likely to occur at low temperature. Optionally, therefore, the time interval at which the BMC reads the CPU MR register is adjusted according to the current temperature, which can reduce energy consumption to a certain extent. In a possible implementation, a temperature threshold and a trigger frequency threshold are both set, and when the temperature reaches the temperature threshold or the trigger frequency reaches the trigger frequency threshold, the BMC controls the heat dissipation unit to enhance cooling. Setting two judgment conditions for temperature regulation in this way can be more reliable than using a single judgment condition.
On the basis of the above embodiment, the method of temperature regulation according to temperature and the method of temperature regulation according to the correctable error trigger frequency may be combined more deeply. For example, the BMC read interval is adjusted according to the current temperature. Specifically, when the current temperature is low, the time interval between two adjacent BMC reads can be lengthened: because correctable errors are less likely at low temperature, the BMC does not need to monitor the working state of the memory bank frequently. When the current temperature is high, the time interval between two adjacent BMC reads can be shortened: because correctable errors are more likely at high temperature, the BMC increases how often it monitors the working state of the memory bank. This arrangement is beneficial to improving the reliability of the temperature regulation method; adjusting the read interval according to temperature is also beneficial to reducing power consumption.
In some embodiments, the heat dissipating unit comprises an air cooling unit; the baseboard management controller BMC is specifically used for increasing the fan rotating speed of the air cooling unit so as to cool the memory bank.
In still other embodiments, the heat dissipation unit comprises a liquid cooling unit; the baseboard management controller BMC is specifically configured to increase the flow rate of the refrigerant of the liquid cooling unit so as to cool the memory bank.
In addition, as shown in fig. 5, if the electronic device has two cooling configurations, i.e., air cooling and liquid cooling, one cooling mode may be selected, or two cooling modes cooperate to cool.
In some embodiments, the electronic device includes a plurality of heat dissipating units, and at least one memory bank is disposed in a temperature adjusting area of the heat dissipating units; the controller is specifically used for acquiring slot information of the memory bank with the triggering frequency exceeding a threshold value, wherein the slot information represents the position of the memory bank; the controller is specifically further configured to determine a heat dissipation unit in an area where the memory bank is located based on a position of the memory bank, and control the heat dissipation unit to perform cooling treatment on all the memory banks in the temperature adjustment area.
Specifically, a plurality of memory strips are distributed on the electronic equipment main board, and dividing areas are arranged according to the physical distribution of the memory strips, and each area is correspondingly provided with a heat dissipation unit for cooling. The arrangement can avoid the difference of cooling effects caused by the distance between the memory strip and the radiating unit, and ensure good and even cooling effects.
Optionally, the slot information is used to characterize the location of the memory banks, and each slot corresponds to one memory bank. Specifically, a server and a baseboard management controller BMC are taken as examples. The BMC can acquire slot information through the CPU MR register, wherein the CPU MR register is recorded with the slot information and the correctable error triggering times of the memory bank corresponding to the slot information.
Further, the BMC periodically reads the CPU MR register, acquires the number of times the memory has reported CEs together with the slot information of the reporting memory, and resolves this information into the frequency at which the memory bank in each slot reports CEs per fixed period. The BMC then judges, slot by slot, whether the frequency at which the memory reports CEs exceeds a set threshold. When it detects that the CE reporting frequency of the memory in a certain slot exceeds the set threshold, the BMC controls the air cooling unit or the liquid cooling unit to increase the fan speed or the coolant flow rate in the area where that slot is located, so as to lower the temperature of the memory in that slot. By locating the memory bank through its slot information and controlling the corresponding heat dissipation unit to perform targeted cooling on the area where that memory bank is located, power consumption is reduced compared with controlling all heat dissipation units to perform cooling.
In addition, it can be understood that the cooling effect of the heat dissipating unit is regional, and when the heat dissipating unit performs cooling, all memory banks in the temperature adjusting area of the heat dissipating unit are cooled.
The electronic device provided by the embodiment of the application comprises a controller and a heat dissipation unit; the controller is in communication connection with the heat dissipation unit; the controller is used for acquiring the trigger frequency at which each memory bank in the electronic equipment triggers correctable errors; the controller is also used for controlling the heat dissipation unit to perform cooling treatment when a memory bank whose trigger frequency exceeds the threshold value is detected. According to the embodiment of the application, the trigger frequency at which the memory bank triggers correctable errors is used as the basis of temperature adjustment. Compared with using the actual temperature of the memory bank as the basis, this more directly and truly represents the current working state of the memory bank and is beneficial to making corresponding reasonable cooling treatment, so as to reduce the probability of correctable errors and thereby reduce the influence of error handling and correction on the running performance of the server; further, it helps avoid uncorrectable errors caused by an excessive number of correctable error triggers, reduces the probability of downtime or restart of the server, and improves the running reliability of the server.
Example three
The following is a temperature adjusting device corresponding to the temperature adjusting method provided in the first embodiment of the present application; for the content and effect of the temperature adjusting device, reference may be made to the first embodiment, and they will not be described again.
Fig. 6 is a schematic structural diagram of a temperature adjusting device provided in an embodiment of the present application, which is applied to an electronic device, where the electronic device may be a device or apparatus configured with a memory bank, such as a server, a computer, or the like. As shown in fig. 6, the electronic device is exemplified by a server. The temperature adjusting device includes:
an acquiring module 10, configured to acquire the trigger frequency of correctable errors of each memory bank in the electronic device;
a processing module 20, configured to control the heat dissipation unit to perform cooling treatment when detecting that a memory bank with a trigger frequency exceeding the threshold exists.
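The division of work between the acquiring module and the processing module can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the application: the function names, the slot labels such as `DIMM0`, and the `acquire` and `cool` callables are hypothetical stand-ins for the register read and the heat dissipation control.

```python
from typing import Callable, Dict, List


def banks_over_threshold(frequencies: Dict[str, float], threshold: float) -> List[str]:
    """Return the memory banks whose correctable-error trigger frequency exceeds the threshold."""
    return sorted(slot for slot, freq in frequencies.items() if freq > threshold)


def adjust_temperature(
    acquire: Callable[[], Dict[str, float]],
    cool: Callable[[List[str]], None],
    threshold: float,
) -> List[str]:
    """One pass: acquire the trigger frequencies, then request cooling if any bank exceeds the threshold."""
    hot = banks_over_threshold(acquire(), threshold)
    if hot:
        # Hypothetical hook standing in for control of the heat dissipation unit.
        cool(hot)
    return hot
```

Passing the acquisition and cooling steps in as callables keeps the threshold decision independent of how the frequencies are read and of which kind of heat dissipation unit is fitted.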
In some embodiments, the acquiring module 10 is specifically configured to acquire, through the controller, the trigger frequency of correctable errors of each memory bank in the electronic device.
In some embodiments, the acquiring module 10 is specifically configured to read the trigger frequency of correctable errors of each memory bank recorded in a register of the central processing unit, where the trigger frequency is obtained by the central processing unit monitoring each memory bank.
In other embodiments, the acquiring module 10 is specifically configured to read the triggering times of the correctable errors of the triggering of each memory bank recorded in the register of the central processing unit, where the triggering times are obtained by monitoring each memory bank by the central processing unit;
the acquiring module 10 is specifically further configured to obtain the trigger frequency of correctable errors of each memory bank according to the time interval between two adjacent readings and the increment of the triggering times within that time interval.
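As a hedged illustration of this calculation, the sketch below divides the increment of the triggering times between two adjacent readings by the time interval. The function name, the slot labels, the unit of errors per second, and the handling of a missing or reset counter are assumptions made for the example; the application does not specify them.

```python
from typing import Dict


def trigger_frequencies(
    previous: Dict[str, int], current: Dict[str, int], interval_s: float
) -> Dict[str, float]:
    """Errors per second for each bank, from the count increment between two adjacent readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    freqs: Dict[str, float] = {}
    for slot, count in current.items():
        # Assumption: a bank absent from the previous reading starts from zero.
        delta = count - previous.get(slot, 0)
        # Assumption: a negative increment means the counter was reset,
        # so the current count is used as the increment.
        if delta < 0:
            delta = count
        freqs[slot] = delta / interval_s
    return freqs
```

For example, if the recorded count of one bank rises from 100 to 160 over a 60-second interval, the resulting trigger frequency is 1.0 error per second.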
In some embodiments, the time interval between any two adjacent reads is the same, which is a predetermined duration.
In some embodiments, the server includes a plurality of heat dissipating units, and at least one memory bank is disposed in a temperature adjusting area of the heat dissipating units; the processing module 20 is specifically configured to obtain slot information of a memory bank with a trigger frequency exceeding a threshold, where the slot information characterizes a location of the memory bank; the processing module 20 is specifically further configured to determine a heat dissipation unit in an area where the memory bank is located based on the location of the memory bank, and control the heat dissipation unit to perform cooling processing on all the memory banks in the temperature adjustment area.
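The lookup from slot information to the responsible heat dissipation unit might look like the following sketch. The `ZONES` layout, the unit names, and the slot labels are purely hypothetical; a real server would take this mapping from its own hardware description.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical layout: the memory slots inside each heat dissipation unit's
# temperature adjusting area.
ZONES: Dict[str, List[str]] = {
    "unit_A": ["DIMM0", "DIMM1"],
    "unit_B": ["DIMM2", "DIMM3"],
}


def units_to_activate(
    hot_slots: Iterable[str], zones: Dict[str, List[str]] = ZONES
) -> Set[str]:
    """Map the slots of over-threshold memory banks to the units covering them."""
    slot_to_unit = {slot: unit for unit, slots in zones.items() for slot in slots}
    # Slots that do not appear in the layout are ignored (assumption).
    return {slot_to_unit[slot] for slot in hot_slots if slot in slot_to_unit}
```

Because a unit cools its whole temperature adjusting area, returning a set means two over-threshold banks in the same area trigger that unit only once.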
In some embodiments, the heat dissipation unit comprises an air cooling unit; the processing module 20 is specifically configured to increase the fan speed of the air cooling unit to perform cooling processing on the memory bank.
In some embodiments, the heat dissipation unit comprises a liquid cooling unit; the processing module 20 is specifically configured to increase the flow rate of the refrigerant of the liquid cooling unit to perform cooling processing on the memory bank.
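Both variants amount to raising one output quantity of the heat dissipation unit. A minimal sketch follows, assuming a fixed step and an upper limit, neither of which is specified in the application; the function name and parameters are hypothetical.

```python
def raise_cooling_output(current: float, step: float, maximum: float) -> float:
    """Return the fan speed (or refrigerant flow rate) increased by `step`, capped at `maximum`."""
    if step < 0:
        raise ValueError("step must be non-negative")
    # Assumption: the unit has a hard upper limit that must not be exceeded.
    return min(current + step, maximum)
```

The same function serves an air cooling unit (values in revolutions per minute) and a liquid cooling unit (values in a flow-rate unit), since only the meaning of the number changes.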
According to the embodiment of the application, the triggering frequency of the memory bank triggering the correctable errors is used as the basis of temperature adjustment, and compared with the fact that the actual temperature of the memory bank is used as the basis of temperature adjustment, the current working state of the memory bank can be directly and truly represented, and accordingly reasonable cooling treatment is facilitated, the probability of the correctable errors is reduced, and the influence on the running performance of the electronic equipment caused by treatment and error correction is further reduced; further, uncorrectable errors caused by excessive triggering times of correctable errors can be avoided, the probability of downtime or restarting of the electronic equipment is reduced, and the operation reliability of the electronic equipment is improved.
Example IV
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where, as shown in fig. 7, the electronic device includes:
a processor 291, the electronic device further comprising a memory 292; a communication interface (Communication Interface) 293 and bus 294 may also be included. The processor 291, the memory 292, and the communication interface 293 may communicate with each other via the bus 294. Communication interface 293 may be used for information transfer. The processor 291 may call logic instructions in the memory 292 to perform the methods of the method embodiments described above. Specifically, the electronic device may be a controller, a control chip, or a device or apparatus such as a server, a computer, or the like including the controller.
Further, the logic instructions in memory 292 described above may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product.
The memory 292 is a computer readable storage medium, and may be used to store a software program, a computer executable program, and program instructions/modules corresponding to the methods in the embodiments of the present application. The processor 291 executes functional applications and data processing by running software programs, instructions and modules stored in the memory 292, i.e., implements the methods of the method embodiments described above.
Memory 292 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for a function; the data storage area may store data created according to the use of the terminal device, and the like. Further, memory 292 may include high-speed random access memory, and may also include non-volatile memory.
The present application provides a computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of the foregoing method embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. A temperature adjustment method applied to an electronic device, comprising:
acquiring the triggering frequency of the correctable error of the triggering of each memory bank in the electronic equipment;
and when the memory bank with the triggering frequency exceeding the threshold value is detected, controlling the heat radiating unit to perform cooling treatment.
2. The method of claim 1, wherein the obtaining a trigger frequency of a trigger-correctable error for each memory bank in the electronic device comprises:
acquiring, through the controller, the triggering frequency of the correctable errors of each memory bank in the electronic equipment.
3. The method according to claim 1 or 2, wherein the obtaining a trigger frequency of the trigger-correctable error of each memory bank in the electronic device includes:
and reading the triggering frequency of the correctable errors of the triggering of each memory bank recorded in a register of the central processing unit, wherein the triggering frequency is obtained by monitoring each memory bank by the central processing unit.
4. The method according to claim 1 or 2, wherein the obtaining a trigger frequency of the trigger-correctable error of each memory bank in the electronic device includes:
reading the triggering times of the correctable errors of the triggering of each memory bank recorded in a register of a central processing unit, wherein the triggering times are obtained by monitoring each memory bank by the central processing unit;
and acquiring the triggering frequency of the triggering correctable errors of each memory bank according to the time interval of two adjacent readings and the increment of the triggering times in the time interval.
5. The method of claim 4, wherein the time interval between any two adjacent reads is the same, and is a predetermined duration.
6. The method of claim 1, wherein the electronic device comprises a plurality of the heat dissipating units, and at least one memory bank is disposed in a temperature adjustment area of the heat dissipating units;
the controlling the heat dissipation unit to perform cooling treatment comprises:
acquiring slot position information of the memory bank with the triggering frequency exceeding the threshold value, wherein the slot position information represents the position of the memory bank;
and determining the heat radiating unit of the area where the memory bank is located based on the position of the memory bank, and controlling the heat radiating unit to cool down all the memory banks in the temperature adjusting area.
7. The method of any one of claims 1-6, wherein the heat dissipating unit comprises an air cooling unit;
the controlling the heat dissipation unit to perform cooling treatment comprises:
and increasing the rotating speed of a fan of the air cooling unit so as to cool the memory bank.
8. The method of any one of claims 1-6, wherein the heat dissipation unit comprises a liquid cooling unit;
the controlling the heat dissipation unit to perform cooling treatment comprises:
and increasing the flow rate of the refrigerant of the liquid cooling unit so as to cool the memory bank.
9. An electronic device, characterized by comprising a controller and a heat dissipation unit; the controller is in communication connection with the heat dissipation unit;
the controller is used for acquiring the triggering frequency of the triggering correctable errors of each memory bank in the electronic equipment;
and the controller is also used for controlling the heat dissipation unit to conduct cooling treatment when the memory bank with the trigger frequency exceeding a threshold value is detected.
10. The electronic device of claim 9, wherein the controller is communicatively coupled to a central processor;
the controller is specifically configured to read the trigger frequency of the correctable error of the trigger of each memory bank recorded in the register of the central processing unit, where the trigger frequency is obtained by monitoring each memory bank by the central processing unit.
11. The electronic device of claim 9, wherein the controller is communicatively coupled to a central processor;
the controller is specifically configured to read the triggering times of the correctable errors of triggering of each memory bank recorded in the register of the central processing unit, where the triggering times are obtained by monitoring each memory bank by the central processing unit;
the controller is specifically further configured to obtain the trigger frequency of the correctable errors of each memory bank according to the time interval between two adjacent readings and the increment of the triggering times in the time interval.
12. The electronic device of claim 11, wherein the time interval between any two adjacent readings is the same, and is a predetermined duration.
13. The electronic device of claim 9, wherein the electronic device comprises a plurality of the heat dissipating units, and at least one memory bank is disposed in a temperature adjusting area of the heat dissipating units;
the controller is specifically configured to obtain slot information of the memory bank with the trigger frequency exceeding the threshold, where the slot information characterizes a position of the memory bank;
the controller is specifically further configured to determine the heat dissipation unit in the area where the memory bank is located based on the position of the memory bank, and control the heat dissipation unit to perform cooling treatment on all the memory banks in the temperature adjustment area.
14. The electronic device of any one of claims 9-13, wherein the heat dissipation unit comprises an air cooling unit;
the controller is specifically configured to increase a rotational speed of a fan of the air cooling unit to cool the memory bank.
15. The electronic device of any one of claims 9-13, wherein the heat dissipation unit comprises a liquid cooling unit;
the controller is specifically configured to increase a flow rate of the refrigerant of the liquid cooling unit to cool the memory bank.
CN202211605086.1A 2022-12-14 2022-12-14 Temperature adjusting method and electronic equipment Pending CN116225181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211605086.1A CN116225181A (en) 2022-12-14 2022-12-14 Temperature adjusting method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211605086.1A CN116225181A (en) 2022-12-14 2022-12-14 Temperature adjusting method and electronic equipment

Publications (1)

Publication Number Publication Date
CN116225181A true CN116225181A (en) 2023-06-06

Family

ID=86575709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211605086.1A Pending CN116225181A (en) 2022-12-14 2022-12-14 Temperature adjusting method and electronic equipment

Country Status (1)

Country Link
CN (1) CN116225181A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination