CN109188247B - Electronic system abnormal state detection system and method - Google Patents

Electronic system abnormal state detection system and method

Info

Publication number
CN109188247B
Authority
CN
China
Prior art keywords
abnormal
management unit
chip
health management
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811058466.1A
Other languages
Chinese (zh)
Other versions
CN109188247A (en)
Inventor
罗禹铭
罗禹城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wangyu Safety Technology Shenzhen Co ltd
Original Assignee
Wangyu Safety Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wangyu Safety Technology Shenzhen Co ltd
Priority to CN201811058466.1A
Publication of CN109188247A
Application granted
Publication of CN109188247B
Legal status: Active

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01R: MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00: Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/28: Testing of electronic circuits, e.g. by signal tracer
    • G01R31/2801: Testing of printed circuits, backplanes, motherboards, hybrid circuits or carriers for multichip packages [MCP]
    • G01R31/281: Specific types of tests or tests for a specific type of fault, e.g. thermal mapping, shorts testing
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01R: MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00: Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/28: Testing of electronic circuits, e.g. by signal tracer
    • G01R31/2801: Testing of printed circuits, backplanes, motherboards, hybrid circuits or carriers for multichip packages [MCP]
    • G01R31/2806: Apparatus therefor, e.g. test stations, drivers, analysers, conveyors
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01R: MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00: Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/28: Testing of electronic circuits, e.g. by signal tracer
    • G01R31/2801: Testing of printed circuits, backplanes, motherboards, hybrid circuits or carriers for multichip packages [MCP]
    • G01R31/281: Specific types of tests or tests for a specific type of fault, e.g. thermal mapping, shorts testing
    • G01R31/2812: Checking for open circuits or shorts, e.g. solder bridges; Testing conductivity, resistivity or impedance

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a system and a method for detecting the abnormal state of an electronic system. A board card receives a health detection request sent by a system health management unit and forwards the request to each chip on the board card through the board card health management unit. The chip health management unit reads the information of each parameter, performs the adjustment corresponding to each abnormal parameter, and reports the abnormal information to the board card health management unit when a parameter has not recovered after processing. After the board card health management unit performs adjustment processing according to the abnormal information, if the chip is still abnormal, the power supply of the abnormal chip is cut off and the abnormal information is reported to the system health management unit. After the system health management unit performs adjustment processing according to the chip abnormal information, if the board card is still abnormal, the power supply of the abnormal board card is cut off and the abnormal information is reported to an equipment administrator. By restoring abnormal chips to normal operation, the invention ensures that the whole electronic system can operate safely and reliably.

Description

Electronic system abnormal state detection system and method
Technical Field
The invention relates to the technical field of computer application, in particular to an electronic system abnormal state detection system and method.
Background
A physical entity that is composed of electronic components or parts and is capable of generating, transmitting, collecting or processing electrical signals and information is generally called an electronic system. An electronic system generally consists of three major parts (input, output and information processing) and is used to process information or to control or drive a load, for example in a server or a vehicle-mounted system.
A plurality of chips form a board card, and a plurality of board cards form a complete machine (the electronic system). A board card is a printed circuit board (PCB for short) manufactured with a plug connector so that it can be inserted into a slot of a computer's main circuit board (mainboard) to control the operation of hardware such as a display or an acquisition card; the corresponding hardware functions are available once a driver is installed. Because an electronic system contains a large number of chips, some chips may become overheated or overloaded, or may suffer program run-off (after the system is disturbed, the value of the program counter PC deviates from its intended course and program execution leaves its normal path), among other abnormalities. If the electronic system cannot detect the abnormal chip in time, the whole system cannot operate safely and reliably. The prior art can monitor and handle one particular abnormality (for example, overheating), but cannot detect multiple kinds of abnormality at the same time and provide a corresponding solution for each.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a system and a method for detecting the abnormal state of an electronic system. The health state of each chip is transmitted to the system through a dedicated interface of the chip; the system judges the current health state of the chip from the detected information and, as needed, performs operations such as frequency reduction, reset, restart or power-off on the chip. These operations either restore the chip to a normal working state or temporarily stop the chip, so that the overall state of the system is not affected and the whole system operates safely and reliably.
The technical scheme adopted by the invention for solving the technical problem is as follows:
An electronic system abnormal state detection system, comprising:
an electronic system composed of a plurality of board cards, each board card being composed of a plurality of chips;
a system health management unit, which is connected to the electronic system and controls the plurality of board cards through an analog switch, and which is used for sending health detection requests to the board cards through an SPI (Serial Peripheral Interface) and for receiving and processing abnormal information reported by the board cards;
a board card health management unit provided on each board card, which is connected to and controls the plurality of chips through an analog switch, and which is used for sending health detection requests to the chips on the board card and for receiving and processing abnormal information reported by the chips;
a chip health management unit provided on each chip, which is used for reading the information of each parameter on the chip, judging whether each parameter is within its normal range, and processing abnormal parameter information.
In the electronic system abnormal state detection system, the parameter information read by the chip health management unit comprises voltage, current, temperature, and watchdog information.
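As a minimal sketch of the range check performed by the chip health management unit, the following Python fragment compares one reading of the four parameters against limits. The class names, field names and numeric limits are hypothetical; the source does not give concrete thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical normal-range limits; the source gives no numbers."""
    min_voltage_v: float = 0.95
    max_current_a: float = 2.0
    max_temperature_c: float = 85.0


@dataclass(frozen=True)
class Reading:
    """One snapshot of the four monitored parameters."""
    voltage_v: float
    current_a: float
    temperature_c: float
    watchdog_ok: bool


def find_abnormal(reading: Reading, limits: Limits = Limits()) -> list:
    """Return the names of the parameters that are outside the normal range."""
    abnormal = []
    if reading.voltage_v < limits.min_voltage_v:
        abnormal.append("voltage")        # voltage too low
    if reading.current_a > limits.max_current_a:
        abnormal.append("current")        # current too high
    if reading.temperature_c > limits.max_temperature_c:
        abnormal.append("temperature")    # temperature too high
    if not reading.watchdog_ok:
        abnormal.append("watchdog")       # watchdog reports an abnormality
    return abnormal
```

An empty list means that every parameter is within its normal range and nothing needs to be adjusted or reported.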
An electronic system abnormal state detection method, comprising the following steps:
the board card receives a health detection request sent by the system health management unit and forwards the request to each chip on the board card through the board card health management unit;
the chip health management unit reads the information of each parameter, performs the adjustment corresponding to each abnormal parameter, and reports the abnormal information to the board card health management unit when a parameter has not recovered after processing;
after the board card health management unit performs adjustment processing according to the abnormal information, if the chip is still abnormal, the power supply of the abnormal chip is cut off and the abnormal information is reported to the system health management unit;
after the system health management unit performs adjustment processing according to the chip abnormal information, if the board card is still abnormal, the power supply of the abnormal board card is cut off and the abnormal information is reported to an equipment administrator.
In the electronic system abnormal state detection method, the step in which the board card receives a health detection request sent by the system health management unit and forwards the request to each chip on the board card through the board card health management unit specifically includes:
the system health management unit sends a health detection request to each board card through the SPI interface;
after the board card receives the health detection request sent by the system health management unit, the board card health management unit sends the health detection request to each chip on the board card.
In the electronic system abnormal state detection method, the step in which the chip health management unit reads the information of each parameter, performs the adjustment corresponding to each abnormal parameter, and reports the abnormal information to the board card health management unit when a parameter has not recovered after processing specifically includes:
after a chip on the board card receives the health detection request, reading each item of parameter information in the chip through the chip health management unit;
judging whether the read parameter information is within the normal range, and performing the adjustment corresponding to the abnormal parameter when it is not;
when the processed parameter is still abnormal, reporting the abnormal information to the board card health management unit through the SPI interface.
In the electronic system abnormal state detection method, after a chip on the board card receives the health detection request, the voltage, current, temperature and watchdog information in the chip are read through the chip health management unit;
if the voltage is too low, the current is too high or the temperature is too high, an active frequency reduction operation is executed inside the chip;
if the watchdog information is abnormal, the chip is reset.
In the electronic system abnormal state detection method, the step in which, after the board card health management unit performs adjustment processing according to the abnormal information, the power supply of the abnormal chip is cut off and the abnormal information is reported to the system health management unit if the chip is still abnormal specifically includes:
the board card health management unit receives the abnormal information sent by the chip, performs adjustment processing, and sends a health detection request to the chip again;
if the chip is detected to be still abnormal, the board card health management unit cuts off the power supply of the abnormal chip and reports the abnormal information to the system health management unit through the SPI interface.
In the electronic system abnormal state detection method, when the board card health management unit performs adjustment processing after receiving the abnormal information sent by the chip, if the voltage is too low, the current is too high or the temperature is too high, an active frequency reduction operation of the board card is executed.
In the electronic system abnormal state detection method, the step in which, after the system health management unit performs adjustment processing according to the chip abnormal information, the power supply of the abnormal board card is cut off and the abnormal information is reported to an equipment administrator if the board card is still abnormal specifically includes:
the system health management unit receives the chip abnormal information, performs adjustment processing, and sends a health detection request to the board card again;
if the board card is detected to be still abnormal, the system health management unit cuts off the power supply of the abnormal board card and reports the abnormal information to an equipment administrator through the SPI interface.
In the electronic system abnormal state detection method, when the system health management unit performs adjustment processing after receiving the chip abnormal information, if the voltage is too low, the current is too high or the temperature is too high, an active frequency reduction operation of the system is executed.
The invention discloses a system and a method for detecting the abnormal state of an electronic system. A board card receives a health detection request sent by a system health management unit and forwards the request to each chip on the board card through the board card health management unit. The chip health management unit reads the information of each parameter, performs the adjustment corresponding to each abnormal parameter, and reports the abnormal information to the board card health management unit when a parameter has not recovered after processing. After the board card health management unit performs adjustment processing according to the abnormal information, if the chip is still abnormal, the power supply of the abnormal chip is cut off and the abnormal information is reported to the system health management unit. After the system health management unit performs adjustment processing according to the chip abnormal information, if the board card is still abnormal, the power supply of the abnormal board card is cut off and the abnormal information is reported to an equipment administrator. By detecting abnormal conditions while the electronic system is running and performing the corresponding operation to restore the abnormal chip, the invention ensures that the whole electronic system can operate safely and reliably.
Drawings
FIG. 1 is a schematic diagram of an abnormal state detection system of an electronic system according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of the method for detecting abnormal states of an electronic system according to the present invention;
FIG. 3 is a flowchart illustrating the step S10 of the abnormal status detection method of the electronic system according to the present invention;
FIG. 4 is a flowchart illustrating the step S20 of the abnormal status detection method of the electronic system according to the present invention;
FIG. 5 is a flowchart illustrating the step S30 of the abnormal status detection method of the electronic system according to the present invention;
FIG. 6 is a flowchart illustrating the step S40 of the abnormal status detection method of the electronic system according to the present invention;
FIG. 7 is a flowchart illustrating the detection and processing of program run-off exceptions in the preferred embodiment of the method for detecting an abnormal state of an electronic system according to the present invention;
FIG. 8 is a flow chart illustrating the chip temperature anomaly detection and processing according to the method for detecting the anomaly status of the electronic system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, the detection system of the present invention is divided into three levels, from largest to smallest: the electronic system (system health management unit), the board card (board card health management unit), and the chip (chip health management unit). The electronic system is composed of a plurality of (N) board cards, and each board card is composed of a plurality of (N) chips. The system health management unit is connected to and controls the plurality of board cards through an analog switch, and is also connected to the power supply, reset and clock management unit of the electronic system. Each board card is provided with a board card health management unit, which is connected to the power supply, reset and clock management unit of the board card and is connected to and controls the plurality of chips through an analog switch. The analog switch is connected through the SPI interface to the chip health management unit (each chip is provided with one chip health management unit). The chip health management unit monitors four parameters simultaneously: temperature, voltage, current, and watchdog. The power supply, reset and clock management unit of each board card controls the clock, reset and frequency of every chip on that board card.
Specifically, the system health management unit is connected with and controls a plurality of board cards through an analog switch, and the system health management unit is used for sending health detection requests to the board cards through an SPI (serial peripheral interface) and receiving and processing abnormal information reported by the board cards; the board health management unit is used for sending a health detection request to each chip on the board, and receiving and processing abnormal information reported by the chips; the chip health management unit is used for reading information of each parameter on the chip, judging whether each parameter is in a normal range, and processing abnormal information of the parameter.
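The three-level structure described above can be pictured as a simple tree. The following Python sketch is only an illustration; the class names, the `powered` flag and the naming scheme are assumptions, not part of the source.

```python
from dataclasses import dataclass, field


@dataclass
class Chip:
    name: str
    powered: bool = True          # a board card may cut this chip's power


@dataclass
class BoardCard:
    name: str
    chips: list = field(default_factory=list)
    powered: bool = True          # the system may cut this board card's power


@dataclass
class ElectronicSystem:
    boards: list = field(default_factory=list)


def build_system(n_boards: int, chips_per_board: int) -> ElectronicSystem:
    """Build a system of n_boards board cards with chips_per_board chips each."""
    boards = [
        BoardCard(
            name=f"board{b}",
            chips=[Chip(name=f"board{b}/chip{c}") for c in range(chips_per_board)],
        )
        for b in range(n_boards)
    ]
    return ElectronicSystem(boards=boards)
```

Because each chip and each board card carries its own power flag, powering off one abnormal chip leaves the rest of the tree untouched, which matches the isolation goal stated in the description.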
Based on the above system for detecting abnormal status of an electronic system, the method for detecting abnormal status of an electronic system according to the preferred embodiment of the present invention, as shown in fig. 2, comprises the following steps:
Step S10: the board card receives the health detection request sent by the system health management unit and forwards the request to each chip on the board card through the board card health management unit.
Specifically, a watchdog and voltage, current and temperature detection circuits are built into each chip. Through a dedicated SPI (Serial Peripheral Interface) interface, the system can query the working state of each board card and chip level by level and perform control processing level by level; an abnormal state that remains after processing at the current level is reported to the health management unit of the next higher level for processing.
Therefore, the system health management unit first sends a health detection request to each board card through the SPI interface; when a board card receives the health detection request sent by the system through the SPI interface, the board card health management unit sends the health detection request to each chip on the board card.
Please refer to fig. 3, which is a flowchart of step S10 in the electronic system abnormal state detection method according to the present invention.
As shown in fig. 3, the step S10 includes:
S11, the system health management unit sends a health detection request to each board card through the SPI interface;
S12, after the board card receives the health detection request sent by the system health management unit, the board card health management unit sends the health detection request to each chip on the board card.
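Steps S11 and S12 amount to a two-stage fan-out of the request. The sketch below records the order in which requests would be delivered; the dictionary layout and the interleaved delivery order are assumptions, since the source does not specify whether the board cards are polled one after another or in parallel.

```python
def fan_out_requests(system: dict) -> list:
    """Return (sender, receiver) pairs in the order the requests are delivered.

    `system` maps a board card name to the list of chip names on that board card.
    """
    delivered = []
    for board, chips in system.items():
        delivered.append(("system", board))   # S11: system to board card over SPI
        for chip in chips:
            delivered.append((board, chip))   # S12: board card to each of its chips
    return delivered
```

For example, a system with `board0` carrying two chips and `board1` carrying one chip yields five deliveries: two from the system and three from the board cards.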
Step S20: the chip health management unit reads the information of each parameter, performs the adjustment corresponding to each abnormal parameter, and reports the abnormal information to the board card health management unit when a parameter has not recovered after processing.
Please refer to fig. 4, which is a flowchart of step S20 in the electronic system abnormal state detection method according to the present invention.
As shown in fig. 4, the step S20 includes:
S21, after the chip on the board card receives the health detection request, reading each item of parameter information in the chip through the chip health management unit;
S22, judging whether the read parameter information is within the normal range, and performing the adjustment corresponding to the abnormal parameter when it is not;
S23, when the processed parameter is still abnormal, reporting the abnormal information to the board card health management unit through the SPI interface.
Specifically, after a chip on the board card receives the health detection request, the chip health management unit reads the voltage, current, temperature and watchdog (WatchDog) information inside the chip and evaluates each of them. If the voltage is too low, the current is too high or the temperature is too high, an active frequency reduction operation is executed inside the chip. Frequency reduction means lowering the working frequency of the chip by configuring its phase-locked loop (PLL), the circuit that unifies the clock signals so that high-frequency devices, for example memory data access, work normally; lowering the frequency reduces the power consumption and temperature of the chip. If the watchdog information is abnormal, the chip is reset; a reset restores the chip to its power-on initial state so that the chip can recover from the abnormality. After this in-chip health detection and processing, if the voltage, current, temperature or watchdog information of the chip is still abnormal, the abnormal information is reported to the board card health management unit through the SPI interface.
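The chip-level handling of step S20 can be sketched with a small simulated chip. Everything numeric here is invented for illustration (the 85 degree limit, the halved clock, the 15 degree drop), and only temperature and watchdog are modelled to keep the example short; voltage and current would follow the same frequency reduction branch.

```python
from dataclasses import dataclass

MAX_TEMP_C = 85.0          # hypothetical temperature limit
NOMINAL_CLOCK_MHZ = 1000   # hypothetical power-on clock frequency


@dataclass
class SimulatedChip:
    temperature_c: float
    watchdog_ok: bool = True
    clock_mhz: int = NOMINAL_CLOCK_MHZ
    reset_count: int = 0

    def downclock(self) -> None:
        # Stand-in for reprogramming the PLL; the cooling effect is invented.
        self.clock_mhz //= 2
        self.temperature_c -= 15.0

    def reset(self) -> None:
        # Restore the power-on initial state.
        self.clock_mhz = NOMINAL_CLOCK_MHZ
        self.watchdog_ok = True
        self.reset_count += 1


def chip_health_check(chip: SimulatedChip) -> list:
    """Adjust the chip, then return the parameters that are still abnormal."""
    if not chip.watchdog_ok:
        chip.reset()                      # watchdog abnormal: reset the chip
    if chip.temperature_c > MAX_TEMP_C:
        chip.downclock()                  # too hot: active frequency reduction
    still_abnormal = []
    if not chip.watchdog_ok:
        still_abnormal.append("watchdog")
    if chip.temperature_c > MAX_TEMP_C:
        still_abnormal.append("temperature")
    return still_abnormal                 # non-empty: report to the board card
```

A non-empty result corresponds to step S23, where the chip reports the remaining abnormality to the board card health management unit.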
Step S30: after the board card health management unit performs adjustment processing according to the abnormal information, if the chip is still abnormal, the power supply of the abnormal chip is cut off and the abnormal information is reported to the system health management unit.
Please refer to fig. 5, which is a flowchart of step S30 in the electronic system abnormal state detection method according to the present invention.
As shown in fig. 5, the step S30 includes:
S31, the board card health management unit receives the abnormal information sent by the chip, performs adjustment processing, and sends a health detection request to the chip again;
S32, if the chip is detected to be still abnormal, the board card health management unit cuts off the power supply of the abnormal chip and reports the abnormal information to the system health management unit through the SPI interface.
Specifically, the board card health management unit receives the health abnormality information of the chip and evaluates it. If the voltage is too low, the current is too high or the temperature is too high, the active frequency reduction operation of the board card is executed, and the health detection request is sent again to the chip that was previously abnormal. If the chip is still abnormal after the frequency of the board card has been reduced, the board card health management unit cuts off the power supply of the chip and reports the abnormal information to the system health management unit through the SPI interface.
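A hedged sketch of the board-level handling in steps S31 and S32 follows. The three callables are hypothetical hooks standing in for hardware actions; the sketch also assumes that the reported abnormality is one for which frequency reduction applies (voltage, current or temperature).

```python
from typing import Callable, Optional


def board_handle_report(
    chip_id: str,
    downclock_board: Callable[[], None],
    chip_still_abnormal: Callable[[str], bool],
    cut_chip_power: Callable[[str], None],
) -> Optional[str]:
    """Handle one abnormal report from a chip at the board card level.

    Returns the message to forward to the system health management unit,
    or None if the chip recovered after the board card was adjusted.
    """
    downclock_board()                     # S31: adjust at board card level
    if chip_still_abnormal(chip_id):      # S31: re-send the health detection request
        cut_chip_power(chip_id)           # S32: isolate the abnormal chip
        return f"{chip_id}: abnormal, power cut"
    return None
```

Passing the hardware actions in as callables keeps the decision logic testable without real hardware; in a real board card these would be register writes or power-switch controls.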
Step S40: after the system health management unit performs adjustment processing according to the chip abnormal information, if the board card is still abnormal, the power supply of the abnormal board card is cut off and the abnormal information is reported to an equipment administrator.
Please refer to fig. 6, which is a flowchart of step S40 in the electronic system abnormal state detection method according to the present invention.
As shown in fig. 6, the step S40 includes:
S41, the system health management unit receives the chip abnormal information, performs adjustment processing, and sends a health detection request to the board card again;
S42, if the board card is detected to be still abnormal, the system health management unit cuts off the power supply of the abnormal board card and reports the abnormal information to the equipment administrator through the SPI interface.
Specifically, the system health management unit receives the health abnormality information of the chip and evaluates it. If the voltage is too low, the current is too high or the temperature is too high, the active frequency reduction operation of the system is executed, and the health detection request is sent again to the board card that was previously abnormal. If the board card is still abnormal after the frequency of the system has been reduced, the system health management unit cuts off the power supply of the board card and reports the abnormality to the equipment administrator.
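Taken together, steps S20 to S40 form an escalation ladder in which each level acts only if the level below it failed to recover the fault. The following sketch lists the actions taken for a given outcome at each level; the function name, argument names and action strings are illustrative assumptions.

```python
def escalate(
    chip_ok_after_chip_adjust: bool,
    chip_ok_after_board_adjust: bool,
    board_ok_after_system_adjust: bool,
) -> list:
    """Return the ordered list of actions taken for one abnormal chip."""
    actions = ["chip: adjust (frequency reduction or reset)"]         # S20
    if chip_ok_after_chip_adjust:
        return actions
    actions.append("board card: reduce frequency, re-check chip")     # S31
    if chip_ok_after_board_adjust:
        return actions
    actions.append("board card: cut chip power, report to system")    # S32
    actions.append("system: reduce frequency, re-check board card")   # S41
    if board_ok_after_system_adjust:
        return actions
    actions.append("system: cut board card power, notify administrator")  # S42
    return actions
```

The shortest path has one action (the chip recovers by itself) and the longest has five, ending with the administrator being notified.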
Two specific examples are described below:
(1) Program run-off abnormality detection and handling
As shown in fig. 7, when the program of a chip on a board card in the electronic system runs off, detection and processing may be performed through the following process:
S101, the system health management unit sends a health detection request to each board card through the SPI (Serial Peripheral Interface);
S102, when the board card receives the health detection request sent by the system health management unit through the SPI interface, the board card health management unit sends the health detection request to each chip on the board card;
S103, after the chip on the board card receives the health detection request, the chip health management unit reads the voltage, current, temperature and WatchDog information in the chip;
S104, judging whether the WatchDog information is abnormal; if so, executing S105; if not, executing S109;
S105, because the program of the chip on the board card has run off, the chip health management unit detects that the WatchDog is abnormal and sends a reset to the clock reset frequency control unit of the chip;
S106, the chip is controlled to perform the reset operation, and the reset information is reported to the board card health management unit;
S107, after the processing is finished, the board card health management unit reports the reset information to the system health management unit;
S108, the system health management unit records the chip reset information of the board card and ends the health detection;
S109, if the WatchDog information is not abnormal, reporting no-abnormality information to the board card health management unit;
S110, the board card health management unit reports the no-abnormality information to the system health management unit;
S111, the system health management unit records the detection information and ends the health detection.
It should be noted that if the run-off and reset of this chip cause another chip on the same board card, or a chip on another board card in the system, to become abnormal, the detection and processing of that abnormality follow the same processing flow.
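The source does not describe how the watchdog itself works. As background, the sketch below models a conventional watchdog timer: a healthy program periodically "kicks" the timer, and a program that has run off stops kicking, so the timer expires. The class, the tick-based timing and the result strings are all assumptions made for illustration.

```python
class Watchdog:
    """Conventional watchdog timer counted in abstract ticks."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.ticks_since_kick = 0

    def kick(self) -> None:
        self.ticks_since_kick = 0         # a healthy program calls this regularly

    def tick(self) -> None:
        self.ticks_since_kick += 1        # one unit of time passes

    @property
    def expired(self) -> bool:
        return self.ticks_since_kick >= self.timeout_ticks


def run_program(watchdog: Watchdog, ticks: int, runaway_at=None) -> bool:
    """Simulate a program; after tick `runaway_at` it stops kicking the watchdog."""
    for t in range(ticks):
        if runaway_at is None or t < runaway_at:
            watchdog.kick()
        watchdog.tick()
    return watchdog.expired


def watchdog_check(watchdog: Watchdog) -> str:
    """Decision of S104: reset on an abnormal watchdog, otherwise report normal."""
    if watchdog.expired:
        watchdog.kick()                   # the reset restarts the program and the timer
        return "chip reset"               # S105 to S108
    return "no abnormality"               # S109 to S111
```

With a timeout of three ticks, a program that keeps kicking never expires the timer, while one that runs off at tick 2 of a ten-tick run does.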
(2) Chip temperature anomaly detection and handling
As shown in Fig. 8, when the temperature of a chip on a board card in the electronic system is abnormal, detection and handling may proceed as follows:
S201, the system health management unit sends a health detection request to each board card through an SPI (Serial Peripheral Interface) interface;
S202, when a board card receives the health detection request sent by the system health management unit through the SPI interface, the board card health management unit sends the health detection request to each chip on the board card;
S203, after a chip on the board card receives the health detection request, the chip health management unit reads the voltage, current, temperature and watchdog information inside the chip;
S204, judging whether the temperature is abnormal; if so, executing S205, and if not, executing S208;
S205, because the chip temperature is abnormal, the chip health management unit sends a frequency reduction command to the clock reset frequency control unit of the chip according to the current abnormal temperature value;
S206, the chip reduces its frequency according to the current temperature, and the chip health management unit continues to monitor the temperature inside the chip after the frequency reduction;
S207, judging again whether the temperature is abnormal; if so, executing S214, and if not, executing S211;
S208, reporting no-abnormality information to the board card health management unit;
S209, the board card health management unit reports the no-abnormality information to the system health management unit;
S210, the system health management unit records the detection information;
S211, if the temperature is no longer abnormal, reporting frequency reduction information to the board card health management unit;
S212, the board card health management unit reports the frequency reduction information to the system health management unit;
S213, the system health management unit records the detection information;
S214, if the chip temperature has not returned to normal within a specific time T after the frequency reduction, resetting the chip and reporting abnormal temperature information to the board card health management unit;
S215, after receiving the abnormal temperature information of the chip, the board card health management unit reduces the frequency of the board card and sends a health detection request to the chip again;
S216, judging whether the chip temperature is abnormal after the board card frequency reduction; if so, executing S217, and if not, executing S220;
S217, if the chip temperature is still abnormal after the board card frequency reduction, the board card health management unit cuts off the power supply of the chip;
S218, reporting the abnormal information to the system health management unit through the SPI interface;
S219, after receiving the abnormal temperature information and the chip power-off information from the board card, the system health management unit records the information and reports it to the system administrator;
S220, if the chip temperature is normal after the board card frequency reduction, the processing flow ends and frequency reduction information is reported to the system health management unit;
S221, the system health management unit records the detection information.
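The branching in S203 to S221 can be illustrated with a short sketch. It is not the patented implementation: the function name `temperature_flow`, the action and report strings, and the single upper `limit` used to decide "abnormal" are assumptions made for illustration, and the wait for the time T in S214 is modelled simply as taking the next temperature sample.

```python
from typing import Iterable, List, Tuple


def temperature_flow(samples: Iterable[float], limit: float) -> Tuple[str, List[str]]:
    """Walk the S203-S221 branches for one chip.

    `samples` yields the chip temperature read at S203, then the reading
    after the chip-level frequency reduction (S207), then the reading
    after the board-level frequency reduction (S216).
    `limit` is a hypothetical upper bound above which a reading is abnormal.
    Returns the final report and the list of actions taken.
    """
    readings = iter(samples)
    actions: List[str] = []

    # S203/S204: first reading after the health detection request.
    if next(readings) <= limit:
        return "no_abnormality", actions            # S208-S210

    # S205/S206: chip-level frequency reduction.
    actions.append("chip_downclock")
    if next(readings) <= limit:                     # S207
        return "frequency_reduced", actions         # S211-S213

    # S214: not recovered within time T, so reset the chip and escalate.
    actions.append("chip_reset")
    # S215: board-level frequency reduction, then a new health check.
    actions.append("board_downclock")
    if next(readings) <= limit:                     # S216
        return "frequency_reduced", actions         # S220-S221

    # S217-S219: cut the chip's power and report to the administrator.
    actions.append("chip_power_off")
    return "temperature_abnormal_power_off", actions
```

For example, `temperature_flow([90.0, 88.0, 80.0], limit=85.0)` follows the path S205, S214, S215 and ends at S220, because the third reading falls back under the assumed limit only after the board card has reduced its frequency.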
The invention detects and handles abnormalities level by level. The detected parameters comprise voltage, current, temperature and watchdog information, so the health detection of the electronic system is more complete. Because detection and handling proceed level by level, an abnormality in the electronic system can be handled without affecting the functions of other chips that are working normally.
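As a rough sketch of the chip-level response to these four parameters, the rule stated in claim 1 (frequency reduction for low voltage, high current or high temperature; chip reset for an abnormal watchdog) might be expressed as follows. The class names, field names, units and threshold values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChipReading:
    voltage: float       # volts (assumed unit)
    current: float       # amperes (assumed unit)
    temperature: float   # degrees Celsius (assumed unit)
    watchdog_ok: bool    # False when the watchdog information is abnormal


@dataclass
class Limits:
    # Hypothetical normal-range bounds; the patent gives no numeric values.
    min_voltage: float
    max_current: float
    max_temperature: float


def chip_actions(reading: ChipReading, limits: Limits) -> List[str]:
    """Return the chip-level actions for one health detection request."""
    actions: List[str] = []
    # Low voltage, high current or high temperature: reduce the chip frequency.
    if (reading.voltage < limits.min_voltage
            or reading.current > limits.max_current
            or reading.temperature > limits.max_temperature):
        actions.append("reduce_frequency")
    # Abnormal watchdog information: reset the chip.
    if not reading.watchdog_ok:
        actions.append("reset_chip")
    return actions
```

An empty result corresponds to the "no abnormality" report; any non-empty result would be followed by a re-check before information is reported to the board card health management unit.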
The electronic system may be an ordinary PC, a server for cloud computing and storage, an embedded vehicle-mounted electronic system, or the like.
The invention can detect states such as program runaway, excessively high chip temperature, abnormal chip working voltage and abnormal current during operation of the electronic system, and performs corresponding operations to recover the abnormal chip, ensuring that the whole system operates safely and reliably.
In summary, the present invention provides a system and a method for detecting an abnormal state of an electronic system. A board card receives a health detection request sent by the system health management unit and then sends the health detection request to each chip on the board card through the board card health management unit; the chip health management unit reads each item of parameter information, performs the adjustment corresponding to the abnormal parameter, and reports abnormal information to the board card health management unit when the parameter has not recovered after processing; after the board card health management unit performs adjustment processing according to the abnormal information, if the chip is still abnormal, the power supply of the abnormal chip is cut off and the abnormal information is reported to the system health management unit; and after the system health management unit performs adjustment processing according to the chip abnormal information, if the board card is still abnormal, the power supply of the abnormal board card is cut off and the abnormal information is reported to an equipment administrator. By detecting abnormal conditions during operation of the electronic system and performing corresponding operations to recover the abnormal chip, the invention ensures that the whole electronic system operates safely and reliably.
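The three-level escalation summarized above (chip, then board card, then system) can be sketched as a single function. This is only an illustration under stated assumptions: the function name `escalate`, the callback `still_abnormal_after` and the log strings are invented here, and the real units communicate over SPI rather than through function calls.

```python
from typing import Callable, List


def escalate(still_abnormal_after: Callable[[str], bool]) -> List[str]:
    """Return the event log for one chip that has shown an abnormality.

    `still_abnormal_after(level)` is a hypothetical callback that reports
    whether the abnormality persists after the adjustment made at
    `level`, which is "chip", "board" or "system".
    """
    log = ["chip: adjust"]
    if not still_abnormal_after("chip"):
        return log

    # The chip could not recover on its own: hand over to the board card.
    log += ["chip: report to board", "board: adjust"]
    if not still_abnormal_after("board"):
        return log

    # The board-level adjustment failed: isolate the chip and go up a level.
    log += ["board: cut chip power", "board: report to system", "system: adjust"]
    if not still_abnormal_after("system"):
        return log

    # The system-level adjustment failed: isolate the whole board card.
    log += ["system: cut board power", "system: report to administrator"]
    return log
```

Each level first tries the least disruptive adjustment and removes power only from the smallest unit that is still abnormal, which is how the scheme avoids disturbing chips that are working normally.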
Of course, those skilled in the art will understand that all or part of the processes of the method embodiments described above can be carried out by a computer program instructing relevant hardware (such as a processor or a controller); the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a memory, a magnetic disk, an optical disk, or the like.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (6)

1. An electronic system abnormal state detection method based on an electronic system abnormal state detection system, the electronic system abnormal state detection system comprising:
board cards each consisting of a plurality of chips, and an electronic system consisting of a plurality of the board cards;
a system health management unit, which is connected with the electronic system and controls the plurality of board cards through an analog switch, and which is used for sending health detection requests to the board cards through an SPI (Serial Peripheral Interface) interface and for receiving and processing abnormal information reported by the board cards;
wherein each board card is provided with a board card health management unit, which is connected with and controls a plurality of chips through an analog switch, and which is used for sending health detection requests to the chips on the board card and for receiving and processing abnormal information reported by the chips;
and each chip is provided with a chip health management unit, which is used for reading each item of parameter information on the chip, judging whether each parameter is within a normal range, and processing abnormal parameter information; the parameter information read on the chip by the chip health management unit comprises voltage, current, temperature and watchdog information;
the electronic system abnormal state detection method being characterized by comprising the following steps:
the board card receives a health detection request sent by the system health management unit, and then sends the health detection request to each chip on the board card through the board card health management unit;
the chip health management unit reads each item of parameter information, performs the adjustment corresponding to the abnormal parameter, and reports abnormal information to the board card health management unit when the parameter has not recovered after processing;
wherein the step in which the chip health management unit reads each item of parameter information, performs the adjustment corresponding to the abnormal parameter, and reports abnormal information to the board card health management unit when the parameter has not recovered after processing specifically comprises:
after a chip on the board card receives the health detection request, reading the parameter information in the chip through the chip health management unit;
judging whether the read parameter information is within the normal range, and performing the adjustment corresponding to the abnormal parameter when it is not;
when the parameter is still abnormal after processing, reporting abnormal information to the board card health management unit through the SPI interface;
after a chip on the board card receives the health detection request, reading the voltage, current, temperature and watchdog information in the chip through the chip health management unit;
if the present voltage is too low, the current is too high or the temperature is too high, executing an active frequency reduction operation inside the chip;
if the present watchdog information is abnormal, resetting the chip;
after the board card health management unit performs adjustment processing according to the abnormal information, if the chip is still abnormal, the power supply of the abnormal chip is cut off and the abnormal information is reported to the system health management unit;
and after the system health management unit performs adjustment processing according to the chip abnormal information, if the board card is still abnormal, the power supply of the abnormal board card is cut off and the abnormal information is reported to an equipment administrator.
2. The method for detecting the abnormal state of the electronic system according to claim 1, wherein the step in which the board card receives the health detection request sent by the system health management unit and then sends the health detection request to each chip on the board card through the board card health management unit specifically comprises:
the system health management unit sends a health detection request to each board card through the SPI interface;
and after the board card receives the health detection request sent by the system health management unit, the board card health management unit sends the health detection request to each chip on the board card.
3. The method for detecting an abnormal state of an electronic system according to claim 1, wherein the step in which, after the board card health management unit performs adjustment processing according to the abnormal information, if the chip is still abnormal, the board card health management unit cuts off the power supply of the abnormal chip and reports the abnormal information to the system health management unit specifically comprises:
the board card health management unit receives the abnormal information sent by the chip, performs adjustment processing, and sends a health detection request to the chip again;
if the chip is detected to be still abnormal, the board card health management unit cuts off the power supply of the abnormal chip and reports the abnormal information to the system health management unit through the SPI interface.
4. The method for detecting the abnormal state of the electronic system according to claim 3, wherein, after the board card health management unit receives the abnormal information sent by the chip and performs adjustment processing, if the present voltage is too low, the current is too high or the temperature is too high, an active frequency reduction operation of the board card is executed.
5. The method for detecting an abnormal state of an electronic system according to claim 1, wherein the step in which, after the system health management unit performs adjustment processing according to the chip abnormal information, if the board card is still abnormal, the system health management unit cuts off the power supply of the abnormal board card and reports the abnormal information to an equipment administrator specifically comprises:
the system health management unit receives the abnormal information sent by the chip, performs adjustment processing, and sends a health detection request to the board card again;
if the board card is detected to be still abnormal, the system health management unit cuts off the power supply of the abnormal board card and reports the abnormal information to an equipment administrator through the SPI interface.
6. The method for detecting the abnormal state of the electronic system according to claim 5, wherein, after the system health management unit receives the abnormal information sent by the chip and performs adjustment processing, if the present voltage is too low, the current is too high or the temperature is too high, an active frequency reduction operation of the system is executed.
CN201811058466.1A 2018-09-11 2018-09-11 Electronic system abnormal state detection system and method Active CN109188247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811058466.1A CN109188247B (en) 2018-09-11 2018-09-11 Electronic system abnormal state detection system and method

Publications (2)

Publication Number Publication Date
CN109188247A CN109188247A (en) 2019-01-11
CN109188247B true CN109188247B (en) 2020-04-14

Family

ID=64910423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811058466.1A Active CN109188247B (en) 2018-09-11 2018-09-11 Electronic system abnormal state detection system and method

Country Status (1)

Country Link
CN (1) CN109188247B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831024B (en) * 2019-04-19 2022-03-01 群联电子股份有限公司 Temperature control circuit, memory storage device and temperature control method
CN110907802A (en) * 2019-11-19 2020-03-24 北京东方逸腾数码医疗设备技术有限公司 State detection device
CN113051137B (en) * 2021-04-22 2024-03-26 北京计算机技术及应用研究所 Design method of extensible server remote health management system
CN113741656A (en) * 2021-09-15 2021-12-03 西安超越申泰信息科技有限公司 VPX architecture-based chassis management system and method
CN115389915B (en) * 2022-10-27 2023-03-17 北京东远润兴科技有限公司 Circuit health monitoring management system, monitoring method and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458758A (en) * 2007-12-10 2009-06-17 上海华虹Nec电子有限公司 Chip test system and method
CN103136147A (en) * 2011-12-03 2013-06-05 鸿富锦精密工业(深圳)有限公司 Signal collection system and method
JP2013174555A (en) * 2012-02-27 2013-09-05 Furukawa Electric Co Ltd:The Battery status detection apparatus
CN103793283A (en) * 2012-11-05 2014-05-14 重庆重邮信科通信技术有限公司 Terminal fault handling method and terminal fault handling device
CN103810070A (en) * 2013-11-29 2014-05-21 航天恒星科技有限公司 State monitoring system based on single-chip microcomputers
CN104316731A (en) * 2014-10-29 2015-01-28 上海华岭集成电路技术股份有限公司 Chip test board and chip test system
CN104639231A (en) * 2015-02-14 2015-05-20 苏州新海宜通信科技股份有限公司 Pass through system for power outage and circuit break protection of optical network ring
CN104951276A (en) * 2015-06-24 2015-09-30 福州瑞芯微电子有限公司 Detection method and system for failure of chip instruction cache memory
CN107023504A (en) * 2017-06-02 2017-08-08 郑州云海信息技术有限公司 A kind of fan control system and control method based on BMC
CN107634277A (en) * 2017-09-27 2018-01-26 深圳市聚马新能源汽车科技有限公司 A kind of automobile high in the clouds battery management system based on wireless telecommunications battery core
CN207798152U (en) * 2018-02-05 2018-08-31 东莞久久蜜蜂智能科技有限公司 A kind of temperature-humidity detecting device, temperature/humidity detection system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104571436A (en) * 2013-10-22 2015-04-29 成都爱信雅克科技有限公司 Computer overheat protecting circuit
CN107403798B (en) * 2017-08-11 2019-02-19 北京兆易创新科技股份有限公司 A kind of chip and its detection method
CN108121425A (en) * 2017-12-22 2018-06-05 广州小微电子技术有限公司 chip reset method, chip and consumable container

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
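Design and Implementation of a Computer Fault Diagnosis Instrument; Meng Yanmei (孟艳梅); China Master's Theses Full-text Database, Information Science and Technology; 20130315; pp. 5-17 *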

Also Published As

Publication number Publication date
CN109188247A (en) 2019-01-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant