CN112965866B - Method for automatically screening temperature resistance range of memory bank - Google Patents

Method for automatically screening temperature resistance range of memory bank

Info

Publication number
CN112965866B
Authority
CN
China
Prior art keywords
temperature
memory bank
management controller
platform management
counter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110239120.7A
Other languages
Chinese (zh)
Other versions
CN112965866A (en)
Inventor
徐智亨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yingxin Computer Technology Co Ltd
Original Assignee
Shandong Yingxin Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yingxin Computer Technology Co Ltd
Priority to CN202110239120.7A
Publication of CN112965866A
Application granted
Publication of CN112965866B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The invention relates to a method for automatically screening the temperature resistance range of memory banks, which classifies memory banks according to the temperature range they tolerate. The method comprises the following steps: the BMC starts the memory bank shutdown temperature control module and confirms that the memory bank has reached the set temperature; the BMC starts the X86 system, starts a POST counter, and starts the memory bank startup temperature control module; the BMC judges whether it receives an instruction to clear the POST counter within a set time; if it does, the BMC judges whether it receives an instruction to clear the OS counter within a set time, and if it does not, the BMC judges whether the current test temperature is the lowest overtemperature test temperature; if the instruction to clear the OS counter is received, the current temperature is recorded as the maximum operable temperature of the memory bank, and if it is not received, the BMC judges whether the current test temperature is the lowest overtemperature test temperature; if the current test temperature is the lowest overtemperature test temperature, the memory bank is judged to be unsuitable for overtemperature operation; otherwise, the BMC forcibly shuts down the X86 system, lowers the test temperature by one level, and waits for the next test cycle.

Description

Method for automatically screening temperature resistance range of memory bank
Technical Field
The invention belongs to the field of information technology, and particularly relates to a method for automatically screening the temperature resistance range of memory banks.
Background
A manufacturer defines a specification for each type of memory bank, including the operating temperature that the memory bank can withstand. If a user operates the memory bank in an environment that exceeds the rated temperature, normal operation cannot be guaranteed. A memory bank that can withstand higher temperatures, however, can be used in X86 systems deployed in more demanding environments.
Therefore, if a memory bank is to be operated at a higher temperature, its quality must be confirmed in advance. Two methods are commonly used. The first is to purchase memory banks that are rated for higher temperatures, which has the drawback of a higher price. The second is to classify memory banks manually, for example by manually adjusting the ambient test temperature and running a stress test in software. The manual method has the following drawbacks:
(1) The ambient temperature is set manually, so personnel can easily set it incorrectly, and memory banks may be missed or selected twice. Because a large number of memory banks usually has to be tested to find those of better quality, testing efficiency is low;
(2) If the whole X86 system is placed in a high-temperature environment, the CPU may be unable to withstand the heat and may malfunction; if hot air is blown directly at the memory bank, it may also affect the CPU or other components, so that the system cannot operate normally. These are the shortcomings of the prior art.
In view of the above, the present invention provides a method for automatically screening the temperature resistance range of a memory bank, which is needed to overcome the above shortcomings of the prior art.
Disclosure of Invention
The invention aims to provide a method for automatically screening the temperature resistance range of a memory bank, so as to solve the above technical problems.
A method for automatically screening a temperature resistance range of a memory bank specifically comprises the following steps:
S1. After the platform management controller is initialized and the BIOS firmware is loaded, the temperatures at which the memory bank is to be tested are set in an internal table of the platform management controller, and the highest temperature of the whole overtemperature test is set as the test temperature;
S2. The platform management controller starts the shutdown temperature control module of the memory bank, so that the memory bank reaches the temperature set for the test while the X86 system is shut down;
S3. The platform management controller powers on the X86 system through the power on/off link and starts a POST counter;
S4. The platform management controller starts the startup temperature control module of the memory bank;
S5. Judge whether the platform management controller receives an instruction to clear the POST counter;
if the platform management controller receives the instruction to clear the POST counter, go to step S6;
if the platform management controller does not receive the instruction to clear the POST counter, go to step S9;
S6. The platform management controller starts an OS counter, and the X86 system automatically executes a memory bank stress test program after entering the OS;
S7. Judge whether the platform management controller receives an instruction to clear the OS counter;
if the platform management controller receives the instruction to clear the OS counter, go to step S8;
if the platform management controller does not receive the instruction to clear the OS counter, go to step S9;
S8. The platform management controller records the current temperature and sets it as the maximum operable temperature of the memory bank; the screening method ends;
S9. Judge whether the temperature set for the memory bank of the X86 system is the lowest overtemperature test temperature;
if the set temperature is the lowest overtemperature test temperature, go to step S10; otherwise go to step S11;
S10. The platform management controller records that the memory bank cannot be used at overtemperature; the screening method ends;
S11. The platform management controller forcibly shuts down the X86 system through the power on/off link;
S12. After the X86 system is shut down, the platform management controller lowers the test temperature by one level and returns to step S2.
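The control flow of steps S1 to S12 can be sketched in Python as a descending search over the temperature table. This is only an illustrative sketch: the names `screen_memory_bank` and `run_cycle` are hypothetical, and `run_cycle` stands in for steps S2 to S7 at one temperature, returning True only when both the POST counter and the OS counter were cleared.

```python
from typing import Callable, Optional, Sequence


def screen_memory_bank(
    test_temps: Sequence[float],
    run_cycle: Callable[[float], bool],
) -> Optional[float]:
    """Return the maximum operable temperature, or None if every level fails.

    test_temps: the internal temperature table from S1 (any order).
    run_cycle:  hypothetical callback covering S2-S7 for one temperature.
    """
    # S1: begin with the highest temperature in the table.
    for temp in sorted(test_temps, reverse=True):
        if run_cycle(temp):
            # S8: both counters were cleared, so record this temperature.
            return temp
        # S9, S11, S12: the cycle failed; the system is shut down and the
        # loop moves on to the next lower level, if one remains.
    # S10: the lowest overtemperature test temperature also failed.
    return None
```

For example, with a table of 85, 90 and 95 degrees and a stand-in memory bank that passes only at 90 degrees or below, `screen_memory_bank([85, 90, 95], lambda t: t <= 90)` returns 90.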
Preferably, the platform management controller is a BMC.
Preferably, a communication link, a power on/off link, and a memory bank reading link are provided between the platform management controller and the X86 system.
Preferably, the platform management controller is further connected to the heat source module through a control link.
Preferably, the heat source module is located upwind of the memory bank, so that the hot air generated by the heat source module directly influences the operating temperature of the memory bank.
Preferably, structural fan covers are provided on both sides of the heat source module and the memory bank, isolating the airflow around the memory bank from the surrounding components, so that the hot air generated by the heat source module is blown entirely onto the memory bank and other components are not affected.
Preferably, the control link comprises a power supply, a plurality of MOSFETs and a plurality of current-limiting resistors, and the temperature is controlled by controlling the number of MOSFETs that are turned on.
The advantage of the invention is that, by using the X86 system together with the fan cover and a BMC or another chip with the same function, a production line can efficiently screen the temperature range that each memory bank tolerates without manually setting the ambient temperature. If a customer requires a system for a high-temperature environment, the system can be shipped with memory banks known to tolerate high temperature, which improves product competitiveness.
Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of an X86 system with a platform management controller.
FIG. 3 is a schematic diagram of the link through which the BMC reads the memory bank EEPROM over SMBus.
FIG. 4 is a schematic diagram of the BMC controlling the heat source module through the GPIO.
FIG. 5 is a schematic view of heat source airflow control.
In the figures: 1 is the platform management controller (also denoted BMC), 2 is the X86 system, 3 is the heat source module, 4 is the communication link, 5 is the power on/off link, 6 is the memory bank reading link, 7 is the control link, 8 is the memory bank, 9 is the structural fan cover, 10 is the fan, 11 is the SMBus multiplexer, 12 is the memory bank SMBus, 13 is the CPU SMBus, 14 is the BMC SMBus, 15 is the power supply, 16 is the current-limiting resistor, 17 is the MOSFET.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings by way of specific examples. The examples are illustrative, and the present invention is not limited to the following embodiments.
As shown in FIG. 1, a method for automatically screening the temperature resistance range of a memory bank comprises the following steps. After the platform management controller 1 is initialized and the BIOS firmware is loaded, the user can set, in a table held in the platform management controller 1, the temperatures at which the memory bank 8 is to be tested; the highest temperature of the whole overtemperature test is set as the test temperature.
The platform management controller 1 starts the shutdown temperature control module of the memory bank so that the memory bank 8 reaches the temperature set for the test while the X86 system 2 is shut down. With the X86 system 2 shut down, the BMC 1 obtains the temperature of the memory bank through the memory bank reading link 6. If the temperature is higher than the set temperature, the BMC 1 lowers the temperature of the heat source module 3 through the control link 7; if it is lower than the set temperature, the BMC 1 raises the temperature of the heat source module 3 through the control link 7. If the memory bank temperature cannot be brought to the set temperature within a certain duration, the control link 7 is judged to be abnormal and the test is interrupted.
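The regulation described above (raise the heat when the memory bank is too cold, lower it when too hot, abort if the target is never reached) can be sketched as a simple stepwise loop. This is a sketch under stated assumptions rather than the patented firmware: `read_temp` and `set_stages` are hypothetical callbacks standing in for the memory bank reading link 6 and the control link 7, and the tolerance and the poll limit (used here in place of the unspecified "certain duration") are illustrative values.

```python
from typing import Callable


def regulate_temperature(
    read_temp: Callable[[], float],
    set_stages: Callable[[int], None],
    target: float,
    max_stages: int,
    tolerance: float = 1.0,
    max_polls: int = 100,
) -> bool:
    """Step the heater up or down until the memory bank reaches the target.

    Returns True once the reading is within `tolerance` of `target`, or
    False if that never happens within `max_polls` readings, in which case
    the caller would treat the control link as abnormal and stop the test.
    """
    stages = 0
    set_stages(stages)
    for _ in range(max_polls):
        temp = read_temp()
        if abs(temp - target) <= tolerance:
            return True
        if temp < target and stages < max_stages:
            stages += 1  # too cold: switch on one more heating stage
        elif temp > target and stages > 0:
            stages -= 1  # too hot: switch off one heating stage
        set_stages(stages)
    return False
```

A stepwise controller is chosen here because the heat source is described as a set of discrete switched stages; a real implementation would also need a settling delay between readings.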
The platform management controller 1 powers on the X86 system 2 through the power on/off link 5 and starts the POST counter, which establishes a watchdog mechanism; it then waits for a subsequent instruction to clear the POST counter.
After the X86 system 2 is powered on, it actively monitors the temperature of the memory bank and therefore occupies the memory bank SMBus 12, so the platform management controller 1 can no longer use the memory bank reading link 6. To read the temperature of the memory bank 8 in this state, the platform management controller 1 must start the startup temperature control module of the memory bank, which queries the memory bank temperature from the X86 system 2 through the communication link 4. If the temperature is higher than the set temperature, the temperature of the heat source module 3 is lowered through the control link 7; if it is lower than the set temperature, the temperature of the heat source module 3 is raised through the control link 7. If the memory bank temperature cannot be brought to the set temperature within a certain duration, the control link 7 is judged to be abnormal and the test is interrupted.
It is then judged whether the platform management controller 1 receives an instruction to clear the POST counter. If the platform management controller 1 receives this instruction through the communication link 4, the X86 system 2 is operating normally; after clearing the POST counter, the platform management controller 1 starts an OS counter, establishing a watchdog mechanism again. After entering the OS, the X86 system 2 automatically executes the memory bank stress test program, and once the test program has finished it is judged whether the platform management controller 1 receives an instruction to clear the OS counter.
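The two-stage watchdog described above can be sketched as follows. The class and function names are hypothetical, the timeout values are placeholders (the description does not state them), and time is passed in explicitly so that the behaviour is deterministic; this is an illustration of the mechanism, not the BMC's actual firmware interface.

```python
from typing import Optional


class WatchdogCounter:
    """Timeout counter that must be cleared before it runs out."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.started_at: Optional[float] = None
        self.cleared = False

    def start(self, now: float) -> None:
        self.started_at = now
        self.cleared = False

    def clear(self, now: float) -> bool:
        """Accept a clear instruction only if it arrives within the timeout."""
        if self.started_at is not None and now - self.started_at <= self.timeout_s:
            self.cleared = True
        return self.cleared


def cycle_passed(
    post_clear_at: Optional[float],
    os_clear_at: Optional[float],
    post_timeout_s: float = 120.0,   # assumed value
    os_timeout_s: float = 3600.0,    # assumed value
) -> bool:
    """True only if the POST counter and then the OS counter are cleared in time.

    The arguments are the times (seconds after power-on) at which the clear
    instructions arrive, or None if an instruction never arrives.
    """
    post = WatchdogCounter(post_timeout_s)
    post.start(0.0)
    if post_clear_at is None or not post.clear(post_clear_at):
        return False  # POST did not complete: fall through to the S9 check
    os_counter = WatchdogCounter(os_timeout_s)
    os_counter.start(post_clear_at)  # started once the POST counter is cleared
    if os_clear_at is None:
        return False  # stress test hung or failed: no clear instruction sent
    return os_counter.clear(os_clear_at)
```

With these assumed timeouts, `cycle_passed(45.0, 1800.0)` is True, while `cycle_passed(45.0, None)` and `cycle_passed(200.0, 300.0)` are both False.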
If the platform management controller 1 receives the instruction to clear the OS counter, it records the current temperature and sets it as the maximum operable temperature of the memory bank. If the platform management controller 1 does not receive the instruction to clear the OS counter, it is judged whether the temperature set for the memory bank in the X86 system 2 is the lowest overtemperature test temperature. The instruction to clear the OS counter is not issued in two cases: the memory bank fails to operate during the test because it cannot withstand the high temperature and produces erroneous data, or the memory bank stress test program completes but determines that the result is a failure.
If the temperature set for the memory bank of the X86 system 2 is the lowest overtemperature test temperature, the platform management controller 1 records that the memory bank 8 cannot be used at overtemperature. If the set temperature is not the lowest overtemperature test temperature, the platform management controller 1 forcibly shuts down the X86 system 2 through the power on/off link 5; after the X86 system 2 is shut down, the platform management controller 1 lowers the test temperature by one level, and the X86 system 2 enters the next test cycle.
If the platform management controller 1 does not receive the instruction to clear the POST counter, it is likewise judged whether the temperature set for the memory bank of the X86 system 2 is the lowest overtemperature test temperature. If it is, the platform management controller 1 records that the memory bank cannot be used at overtemperature. If it is not, the platform management controller 1 forcibly shuts down the X86 system 2 through the power on/off link 5; after the X86 system 2 is shut down, the platform management controller 1 lowers the test temperature by one level, and the X86 system 2 enters the next test cycle.
As shown in FIG. 2, the method uses an X86 system 2 (the X86 architecture is an instruction set executed by a microprocessor) together with a platform management controller 1, which is preferably a BMC (Baseboard Management Controller) 1. At least a communication link 4, a power on/off link 5, and a memory bank reading link 6 are required between the platform management controller 1 and the X86 system 2.
The communication link 4 between the X86 system 2 and the BMC 1 uses an LPC (Low Pin Count) bus or eSPI (Enhanced Serial Peripheral Interface) signals. For the power on/off link 5, the BMC 1 controls the power button trigger pin of the X86 system 2 through a general-purpose input/output (GPIO) pin of the BMC 1.
As shown in FIG. 3, an SMBus (System Management Bus) multiplexer 11 is connected to the memory bank 8 via the memory bank SMBus 12, to the X86 system 2 via the CPU SMBus 13, and to the BMC 1 via the BMC SMBus 14. For the memory bank reading link 6 between the BMC 1 and the X86 system 2, the BMC SMBus 14 reads the information of the memory bank EEPROM (Electrically Erasable Programmable Read-Only Memory) through the SMBus multiplexer 11. If the BMC 1 is to read the memory bank EEPROM while the X86 system 2 is powered off, the SMBus multiplexer 11 is switched to the BMC path; at all other times, the SMBus multiplexer 11 maintains communication between the X86 system 2 and the memory bank 8.
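The arbitration rule for the SMBus multiplexer, together with the resulting choice of where the BMC obtains the memory bank temperature, can be summarised in a few lines of Python. The enum and function names are hypothetical and only restate the rule in the text; they do not correspond to any real BMC API.

```python
from enum import Enum


class MuxPath(Enum):
    CPU = "CPU SMBus 13"
    BMC = "BMC SMBus 14"


def select_mux_path(x86_powered_on: bool, bmc_wants_read: bool) -> MuxPath:
    """Switch to the BMC path only for a BMC read while the X86 system is off."""
    if not x86_powered_on and bmc_wants_read:
        return MuxPath.BMC
    # At all other times the X86 system keeps its path to the memory bank.
    return MuxPath.CPU


def temperature_source(x86_powered_on: bool) -> str:
    """Name the link the BMC uses to obtain the memory bank temperature."""
    if x86_powered_on:
        # The X86 system occupies the memory bank SMBus, so the BMC asks it.
        return "communication link 4"
    return "memory bank reading link 6"
```

Keeping the CPU path as the default reflects the statement that the multiplexer maintains communication between the X86 system and the memory bank except during a powered-off BMC read.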
As shown in FIG. 4, the control link 7 between the BMC 1 and the heat source module 3 includes a power supply 15, a plurality of MOSFETs (Metal-Oxide-Semiconductor Field-Effect Transistors) 17, and a plurality of current-limiting resistors 16. The power supply 15 is connected to a first end of each current-limiting resistor 16, a second end of each current-limiting resistor 16 is connected to a first end of a MOSFET 17, a second end of the MOSFET 17 is connected to a GPIO of the BMC 1, and the third ends of all MOSFETs 17 are grounded. The BMC 1 controls the number of MOSFETs 17 that are turned on through its GPIOs. When a MOSFET 17 is turned on, current flows, the current-limiting resistor 16 in series with it generates heat, and the heat is conducted to the air through the heat sink in contact with the current-limiting resistors 16. To generate a higher temperature, more MOSFETs 17 are turned on through the GPIOs; to lower the temperature of the heat source module 3, the number of turned-on MOSFETs 17 is reduced through the GPIOs.
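As a rough illustration of why the number of conducting MOSFETs sets the heat output: if each MOSFET is treated as an ideal switch, each conducting branch dissipates V²/R in its current-limiting resistor, so n branches dissipate n·V²/R. The sketch below applies that relation. The supply voltage and resistance are made-up example values (the description gives no component values), and the function names are hypothetical.

```python
import math
from typing import List


def heater_power_w(stages_on: int, supply_v: float, resistor_ohm: float) -> float:
    """Total dissipated power with `stages_on` branches conducting.

    Assumes ideal switches: MOSFET on-resistance and wiring losses are ignored.
    """
    return stages_on * supply_v ** 2 / resistor_ohm


def stages_for_power(target_w: float, supply_v: float, resistor_ohm: float,
                     total_stages: int) -> int:
    """Smallest number of branches that reaches `target_w`, capped at the total."""
    per_stage_w = supply_v ** 2 / resistor_ohm
    return min(total_stages, math.ceil(target_w / per_stage_w))


def gpio_levels(stages_on: int, total_stages: int) -> List[bool]:
    """GPIO pattern that turns on the first `stages_on` MOSFETs."""
    return [i < stages_on for i in range(total_stages)]
```

With an assumed 12 V supply and 24 Ω resistors, each branch contributes 6 W, so `heater_power_w(3, 12.0, 24.0)` gives 18.0 and `stages_for_power(15.0, 12.0, 24.0, 8)` gives 3.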
As shown in FIG. 5, the heat source module 3 is located between the fan 10 and the memory bank 8, that is, upwind of the memory bank 8, so that the hot air generated by the heat source module 3 directly influences the operating temperature of the memory bank 8. Structural fan covers 9 are provided on both sides of the heat source module 3 and the memory bank 8; they isolate the airflow around the memory bank 8 from the peripheral components, so that the hot air generated by the heat source module 3 is blown entirely onto the memory bank 8 and other components are not affected.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present invention, and the present invention is not limited thereto, and any modifications and variations which can be made by those skilled in the art without departing from the spirit of the present invention shall fall within the scope of the present invention.

Claims (7)

1. A method for automatically screening the temperature resistance range of a memory bank is characterized by comprising the following steps:
S1. After the platform management controller is initialized and the BIOS firmware is loaded, the temperatures at which the memory bank is to be tested are set in an internal table of the platform management controller, and the highest temperature of the whole overtemperature test is set as the test temperature;
S2. The platform management controller starts the shutdown temperature control module of the memory bank, so that the memory bank reaches the temperature set for the test while the X86 system is shut down;
S3. The platform management controller powers on the X86 system through the power on/off link and starts a POST counter;
S4. The platform management controller starts the startup temperature control module of the memory bank;
S5. Judge whether the platform management controller receives an instruction to clear the POST counter;
if the platform management controller receives the instruction to clear the POST counter, go to step S6;
if the platform management controller does not receive the instruction to clear the POST counter, go to step S9;
S6. The platform management controller starts an OS counter, and the X86 system automatically executes a memory bank stress test program after entering the OS;
S7. Judge whether the platform management controller receives an instruction to clear the OS counter;
if the platform management controller receives the instruction to clear the OS counter, go to step S8;
if the platform management controller does not receive the instruction to clear the OS counter, go to step S9;
S8. The platform management controller records the current temperature and sets it as the maximum operable temperature of the memory bank; the screening method ends;
S9. Judge whether the temperature set for the memory bank of the X86 system is the lowest overtemperature test temperature;
if the set temperature is the lowest overtemperature test temperature, go to step S10; otherwise go to step S11;
S10. The platform management controller records that the memory bank cannot be used at overtemperature; the screening method ends;
S11. The platform management controller forcibly shuts down the X86 system through the power on/off link;
S12. After the X86 system is shut down, the platform management controller lowers the test temperature by one level and returns to step S2.
2. The method as claimed in claim 1, wherein the platform management controller is a baseboard management controller.
3. The method of claim 1, wherein a communication link, a power on/off link, and a memory bank reading link are provided between the platform management controller and the X86 system.
4. The method of claim 1, wherein the platform management controller is further connected to the heat source module via a control link.
5. The method of claim 4, wherein the heat source module is located upwind of the memory bank.
6. The method of claim 4, wherein structural fan covers are provided on both sides of the heat source module and the memory bank.
7. The method of claim 4, wherein the control link comprises a power source, a plurality of MOSFETs, and a plurality of current limiting resistors.
CN202110239120.7A 2021-03-04 2021-03-04 Method for automatically screening temperature resistance range of memory bank Active CN112965866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110239120.7A CN112965866B (en) 2021-03-04 2021-03-04 Method for automatically screening temperature resistance range of memory bank

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110239120.7A CN112965866B (en) 2021-03-04 2021-03-04 Method for automatically screening temperature resistance range of memory bank

Publications (2)

Publication Number Publication Date
CN112965866A CN112965866A (en) 2021-06-15
CN112965866B CN112965866B (en) 2023-01-10

Family

ID=76276425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110239120.7A Active CN112965866B (en) 2021-03-04 2021-03-04 Method for automatically screening temperature resistance range of memory bank

Country Status (1)

Country Link
CN (1) CN112965866B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109361533A (en) * 2018-09-17 2019-02-19 视联动力信息技术股份有限公司 Heatproof test method and device
CN110736545A (en) * 2018-07-20 2020-01-31 大族激光科技产业集团股份有限公司 laser head temperature monitoring device and method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7318173B1 (en) * 2002-06-03 2008-01-08 National Semiconductor Corporation Embedded controller based BIOS boot ROM select
CN100498715C (en) * 2006-05-20 2009-06-10 技嘉科技股份有限公司 Method for simulating IPMI by BIOS
US9760136B2 (en) * 2014-08-15 2017-09-12 Intel Corporation Controlling temperature of a system memory
CN109324945A (en) * 2018-09-07 2019-02-12 郑州云海信息技术有限公司 A kind of BMC reads RAID card temperature-time automatic obtaining method and system
CN109931285B (en) * 2019-03-06 2021-10-26 郑州云海信息技术有限公司 Fan speed regulation method and device and electronic equipment
CN111949463B (en) * 2020-08-28 2022-07-08 苏州浪潮智能科技有限公司 Method and system for screening out over-frequency range of multiple memory banks
CN112349342B (en) * 2020-11-05 2024-03-22 海光信息技术股份有限公司 Maintenance device, method, equipment and storage medium for maintaining DDR5 memory subsystem

Also Published As

Publication number Publication date
CN112965866A (en) 2021-06-15

Similar Documents

Publication Publication Date Title
CN101311899B (en) Information processing apparatus prewarming control system and its control method
TWI291652B (en) Debugging device using a LPC interface capable of recovering functions of BIOS, and debugging method therefor
CN106055440B (en) A kind of test method and system for realizing server exception power-off by BMC
CN101901180B (en) Heating protection circuit, electronic device and heating protection method thereof
US7930534B2 (en) Motherboard and start-up method utilizing a BIOS bin file and GPIO pins
CA2820563A1 (en) Programmed triggering of diagnostics for a space conditioning system
CN101727368A (en) On/off test method and on/off test system
US20090100287A1 (en) Monitoring Apparatus and a Monitoring Method Thereof
US11573619B2 (en) Information processing apparatus and method
CN105468114A (en) Design method for optimizing heat dissipation noise of server board card
EP1712993A2 (en) Information processing apparatus
CN101604281A (en) Computer installation and temperature control method thereof
CN100472467C (en) Method and device for monitoring status of computer power supply fan
CN112965866B (en) Method for automatically screening temperature resistance range of memory bank
JP2019133653A (en) Power supply unit fan recovery process
CN104597983A (en) Regulation method of revolving speed of computer and mainboard system
CN117289963A (en) Method and equipment for online updating target area of server platform service firmware
KR20090037223A (en) Method and system for power-on self testing after system off, and booting method the same
CN107132468A (en) Mainboard test device and method of testing
US9619355B2 (en) Booting verification method of computer and electronic device
WO2012039711A1 (en) Method and system for performing system maintenance in a computing device
US6691242B1 (en) System test and method for checking processor over-clocking by retrieving an assigned speed from an register internal to the processor, comparing with running speed, and displaying caution message to user
US20050097371A1 (en) CPU chip having registers therein for reporting maximum CPU power and temperature ratings
CN110109789B (en) Novel OTP MCU test method
Intel Intel® Desktop Board DP43BFL Technical Product Specification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant