CN112882901A - Intelligent health state monitor of distributed processing system - Google Patents
- Publication number
- CN112882901A
- Authority
- CN
- China
- Prior art keywords
- health
- node
- root node
- processing system
- distributed processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an intelligent health state monitor for a distributed processing system, comprising monitoring management nodes, a health monitoring data network switch and a health monitoring server. The number of monitoring management nodes corresponds to the number of monitored processors. Each monitoring management node collects health state information from the functional modules inside its corresponding processor and transmits it to the health monitoring server over a data communication network via the health monitoring data network switch. The health monitoring server analyzes the health monitoring data and makes decisions on it, diagnoses the cause of system faults, and allows work to resume in the shortest possible time. The monitor tracks in real time the working state of components of the distributed processing system such as the power supply, CPU (central processing unit), memory and solid-state storage, and assists the system administrator in quickly diagnosing the cause of system faults. It thereby improves the testability, maintainability and supportability of the system and increases its task-processing capacity.
Description
Technical Field
The invention belongs to the technical field of embedded computer system design, and particularly relates to a health state intelligent monitor of a distributed processing system.
Background
Airborne embedded systems contain an increasing variety and number of devices, and their processing systems are increasingly complex, so monitoring the health of the system is increasingly difficult. The traditional approach of relying solely on the built-in test (BIT) of the main processor cannot locate problems accurately; it also directly affects the task-processing function of the main processor and reduces the operating efficiency of the system's processing resources.
Disclosure of Invention
The invention aims to provide an intelligent health state monitor for a distributed processing system that meets the requirements of high-performance aircraft systems for the testability, maintainability and reliability of processing equipment.
To achieve this aim, the invention adopts the following technical solution:
An intelligent health state monitor for a distributed processing system comprises monitoring management nodes, a health monitoring data network switch and a health monitoring server. The number of monitoring management nodes corresponds to the number of monitored processors. Each monitoring management node collects health state information from the functional modules inside its corresponding processor and transmits it to the health monitoring server over a data communication network via the health monitoring data network switch. The health monitoring server analyzes the health monitoring data and makes decisions on it, diagnoses the cause of system faults, and allows work to resume in the shortest possible time.
Furthermore, the health monitoring data network switch carries out the exchange of monitoring data; the data network is FC (Fibre Channel), AFDX (Avionics Full-Duplex Switched Ethernet) or Ethernet, and the network communication rate is not lower than 1 Gbps.
Furthermore, the processor includes child nodes and root nodes. There are two root nodes; they are two independent circuits physically located in functional modules, and each serves as a backup for the other. A communication connection is maintained between the two root nodes at all times; one is the active root node and the other is the backup root node. The active root node monitors the health state of all functional modules, including the functional modules in which the active and backup root nodes themselves reside; detects power loss and removal of functional modules; reports events to the corresponding monitoring management node; receives control instructions from the monitoring management node; and performs the appropriate operations to schedule tasks among the functional modules and prevent system faults.
Furthermore, the data links of the two root nodes are physically carried on two I2C buses.
Furthermore, the child node is an independent circuit unit located in a functional module. It collects and uploads sensor data, CPU state data and self-test data from the functional module in which it resides, reads the module slot number and equipment number, and controls power-on, power-off and reset of the functional module.
Furthermore, the microcontroller in the child node runs the module function software, which receives commands from the external root node and uploads sensor data and CPU and software state data. The monitoring process and content of the child node comprise: checking the slot position before power-on (if the check passes, the functional module is powered on normally; if it fails, the fault is reported and the functional module is not powered); power-on and reset control of the functional module; measuring voltage, key-circuit current and temperature; checking the running state of core devices, the core devices comprising a CPU, a switch chip and a memory; checking the running state of key applications; and detecting the online and offline state of the switch chip's ports.
Furthermore, the child node monitors the voltage and temperature of its functional module and obtains the module's working state from the CPU through the universal serial bus. When the voltage, temperature or working state of the functional module is abnormal, the child node sends an alarm to the root node; it also responds to query commands from the root node by reporting the module's voltage, temperature, working state and similar information. The root node receives query requests from the monitoring management node over the network, issues queries for temperature, voltage, working state and the like to the child node of each functional module, automatically reports system alarm information to the monitoring management node, and records a system working log.
Furthermore, the monitoring management node, the root nodes and the child nodes are each powered by an independent power supply and are powered on before the functional circuits of the distributed processing system.
Compared with the prior art, the invention has the following technical characteristics:
In response to the growing demands that complex environments place on embedded systems, the invention provides an intelligent health state monitor for a distributed processing system. It monitors in real time the working state of components of the distributed processing system such as the power supply, CPU (central processing unit), memory and solid-state storage, assists the system administrator in quickly diagnosing the cause of system faults so that work can resume in the shortest possible time, and improves the testability, maintainability and supportability of the system as well as its task-processing capacity.
Drawings
FIG. 1 shows the intelligent health state monitor of the distributed processing system;
FIG. 2 shows the functional structure of health monitoring inside a processor.
Detailed Description
Referring to FIG. 1, the intelligent health state monitor for a distributed processing system according to the invention comprises monitoring management nodes, a health monitoring data network switch and a health monitoring server. Several monitoring management nodes can be provided according to system requirements, one for each monitored processor. Each monitoring management node collects health state information from the functional modules in its corresponding processor and transmits it to the health monitoring server over the data communication network via the health monitoring data network switch. The health monitoring server analyzes the health monitoring data and makes decisions on it, quickly diagnoses the cause of system faults, and allows work to resume in the shortest possible time. The health monitoring data network switch carries out the exchange of monitoring data; the data network can be FC, AFDX, Ethernet or the like, and the network communication rate is not lower than 1 Gbps.
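The data flow described above can be illustrated with a minimal Python sketch. It is not part of the disclosed design: the class names, the record fields and the `diagnose` rule (simply listing unhealthy records) are assumptions made for illustration, and the switch is modeled only as a check on network type and rate.

```python
from dataclasses import dataclass, field

# Network types named in the text; the text adds "or the like", so this set is
# illustrative rather than exhaustive.
ALLOWED_NETWORKS = {"FC", "AFDX", "Ethernet"}
MIN_RATE_GBPS = 1.0  # "not lower than 1 Gbps"


@dataclass
class HealthRecord:
    processor_id: int
    module: str
    healthy: bool
    detail: str = ""


@dataclass
class HealthMonitoringServer:
    records: list = field(default_factory=list)

    def receive(self, record):
        self.records.append(record)

    def diagnose(self):
        """Return (processor_id, module, detail) for every unhealthy record."""
        return [(r.processor_id, r.module, r.detail)
                for r in self.records if not r.healthy]


@dataclass
class MonitoringManagementNode:
    processor_id: int  # one node per monitored processor

    def collect_and_forward(self, module_states, server):
        # module_states maps module name -> (healthy, detail); hypothetical format
        for module, (healthy, detail) in module_states.items():
            server.receive(HealthRecord(self.processor_id, module, healthy, detail))


def network_is_acceptable(kind, rate_gbps):
    """Check the stated network constraints: known type and rate >= 1 Gbps."""
    return kind in ALLOWED_NETWORKS and rate_gbps >= MIN_RATE_GBPS
```

In this sketch the server only aggregates and filters records; a real diagnosis step would depend on fault models that the text does not specify.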
As shown in FIG. 2, the processor includes child nodes and root nodes. There are two root nodes; they are two independent circuits physically located in functional modules, and each serves as a backup for the other. A communication connection is maintained between the two root nodes at all times; one is the active root node and the other is the backup root node. The active root node monitors the health state of all functional modules, including the functional modules in which the active and backup root nodes themselves reside; detects power loss and removal of functional modules; reports events to the corresponding monitoring management node; receives control instructions from the monitoring management node; and performs the appropriate operations to schedule tasks among the functional modules and prevent system faults. The data links of the two root nodes are physically carried on two I2C buses. A functional module is a module that implements a particular function in the processor, such as a computing module or an output module.
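The active/backup arrangement of the two root nodes can be sketched as follows. The text states only that the two root nodes back each other up and keep a communication connection; the heartbeat counting and the `max_missed` takeover threshold below are assumptions introduced for illustration, not the disclosed switching mechanism.

```python
class RootNode:
    def __init__(self, name):
        self.name = name
        self.alive = True  # toggled by the caller to simulate a failure


class RootNodePair:
    """Two root nodes in mutual backup; exactly one is active at a time."""

    def __init__(self, first, second, max_missed=3):
        self.active, self.backup = first, second
        self.max_missed = max_missed  # hypothetical takeover threshold
        self.missed = 0

    def heartbeat_tick(self):
        """Called once per heartbeat period on the link between the root nodes.

        Returns the name of the root node that is active after this tick.
        """
        if self.active.alive:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed and self.backup.alive:
                # The backup takes over and the failed node becomes the backup.
                self.active, self.backup = self.backup, self.active
                self.missed = 0
        return self.active.name
```

Requiring several consecutive missed heartbeats before switching is one common way to avoid takeover on a single lost message; the appropriate threshold would depend on the bus timing of the real system.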
The child node is likewise an independent circuit unit located in a functional module and is powered by an independent power supply. It is mainly responsible for collecting and uploading sensor data, CPU state data and self-test data from the functional module in which it resides, reading the module slot number and equipment number, and controlling power-on, power-off and reset of the functional module. The microcontroller in the child node runs the module function software, which receives commands from the external root node and uploads sensor data and CPU and software state data. The monitoring process and content of the child node comprise: checking the slot position before power-on (if the check passes, the functional module is powered on normally; if it fails, the fault is reported and the functional module is not powered); power-on and reset control of the functional module; measuring voltage, key-circuit current and temperature; checking the running state of core devices, including the CPU, switch chip, memory and the like; checking the running state of key applications; and detecting the online and offline state of the switch chip's ports.
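Two of the child-node duties listed above, the slot check before power-on and the voltage, current and temperature checks, can be sketched in Python. The limit values, the return strings and the dictionary format of the readings are hypothetical; the text does not give thresholds or message formats.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    low: float
    high: float

    def ok(self, value):
        return self.low <= value <= self.high


class ChildNode:
    def __init__(self, expected_slot, limits):
        self.expected_slot = expected_slot
        self.limits = limits  # maps reading name -> Limits (hypothetical values)
        self.module_powered = False

    def power_on(self, detected_slot):
        """Slot check before power-on: power the module only if the slot matches."""
        if detected_slot != self.expected_slot:
            return "slot-error-reported"  # report the fault; module stays unpowered
        self.module_powered = True
        return "powered-on"

    def check(self, readings):
        """Return the sorted names of readings that fall outside their limits."""
        return sorted(
            name for name, value in readings.items()
            if name in self.limits and not self.limits[name].ok(value)
        )
```

A usage example under the same assumptions:

```python
limits = {
    "voltage": Limits(3.1, 3.5),
    "current": Limits(0.0, 2.0),
    "temperature": Limits(-40.0, 85.0),
}
node = ChildNode(expected_slot=4, limits=limits)
node.power_on(detected_slot=4)
abnormal = node.check({"voltage": 3.3, "current": 1.2, "temperature": 90.0})
```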
The monitoring management node, the root nodes and the child nodes are each powered by an independent power supply and are powered on before the functional circuits of the distributed processing system. The child nodes monitor the voltage and temperature of their functional modules and obtain the module's working state from the CPU through the universal serial bus. When the voltage, temperature or working state of a functional module is abnormal, the child node sends an alarm to the root node; it also responds to query commands from the root node by reporting the module's voltage, temperature, working state and similar information. The root node can receive query requests from the monitoring management node over the network, issue queries for temperature, voltage, working state and the like to the child node of each functional module, automatically report system alarm information to the monitoring management node, and record a system working log.
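The alarm and query paths between the child nodes, the root node and the monitoring management node can be sketched as follows. The method names, the tuple layout of log entries and the in-memory `reported` list standing in for the network link are all assumptions for illustration.

```python
class ChildNode:
    def __init__(self, module, status):
        self.module = module
        self.status = status  # e.g. {"voltage": ..., "temperature": ..., "state": ...}

    def query(self, item):
        """Answer a query command from the root node."""
        return self.status[item]


class RootNode:
    def __init__(self, children):
        self.children = {c.module: c for c in children}
        self.log = []       # system working log
        self.reported = []  # alarms forwarded to the monitoring management node

    def on_alarm(self, module, item, value):
        """Alarm pushed by a child node: log it and report it upstream."""
        entry = ("ALARM", module, item, value)
        self.log.append(entry)
        self.reported.append(entry)

    def handle_query(self, item):
        """Query from the monitoring management node: ask every child node."""
        result = {m: c.query(item) for m, c in self.children.items()}
        self.log.append(("QUERY", item))
        return result
```

The sketch keeps the two paths separate: alarms are pushed upward as soon as a child node detects an abnormal value, while queries are pulled on request, matching the two behaviours described in the text.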
The intelligent monitor operates automatically and independently of the functional components, so it saves processing resources and improves the system's task-processing capacity. It addresses the insufficient health monitoring information and inaccurate fault localization of complex processing systems, helps the system administrator diagnose system faults quickly and resume work in as short a time as possible, and improves the testability, maintainability and supportability of the system.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equally replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.
Claims (8)
1. A distributed processing system health state intelligent monitor is characterized by comprising a monitoring management node, a health monitoring data network switch and a health monitoring server; the number of the monitoring management nodes corresponds to the number of the monitored processors, each monitoring management node collects health state information of various functional modules inside the corresponding processor, and transmits the information to the health monitoring server through a data communication network and a health monitoring data network switch, and the health monitoring server analyzes and decides the health monitoring data, diagnoses the cause of system faults and resumes work in the shortest possible time.
2. The distributed processing system health status smart monitor of claim 1, wherein the health monitoring data network switch implements monitoring data exchange, the data network is FC, AFDX or ethernet, and the network communication rate is not less than 1 Gbps.
3. The distributed processing system health state intelligent monitor of claim 1, wherein the processor comprises child nodes and root nodes; there are two root nodes, which are two independent circuits physically located in functional modules and which serve as backups for each other; a communication connection is maintained between the two root nodes at all times, one being the active root node and the other being the backup root node; the active root node monitors the health state of all functional modules, including the functional modules in which the active root node and the backup root node reside, detects power loss and removal of functional modules, reports events to the corresponding monitoring management node, receives control instructions from the monitoring management node, and performs the appropriate operations to schedule tasks among the functional modules and prevent system faults.
4. The distributed processing system health state intelligent monitor of claim 3, wherein the physical bearers of the two root node data links are two I2C buses.
5. The distributed processing system health status intelligent monitor as claimed in claim 3, wherein the sub-node is an independent circuit unit located in the functional module, and the sub-node is used for collecting and uploading sensor data, CPU status data and self-checking data in the functional module, reading module slot number and device number, and controlling power-on and power-off and resetting of the functional module.
6. The distributed processing system health state intelligent monitor of claim 3, wherein the microcontroller in the child node runs module function software that receives commands from the external root node and uploads sensor data and CPU and software state data; the monitoring process and content of the child node comprise: checking the slot position before power-on, wherein if the check passes the functional module is powered on normally, and if it fails the fault is reported and the functional module is not powered; power-on and reset control of the functional module; measuring voltage, key-circuit current and temperature; checking the running state of core devices, the core devices comprising a CPU, a switch chip and a memory; checking the running state of key applications; and detecting the online and offline state of the switch chip's ports.
7. The distributed processing system health state intelligent monitor of claim 3, wherein the child node monitors the voltage and temperature of its functional module and obtains the module's working state from the CPU through a universal serial bus; when the voltage, temperature or working state of the functional module is abnormal, the child node sends an alarm to the root node, and it also responds to query commands from the root node by reporting the module's voltage, temperature, working state and similar information; the root node receives query requests from the monitoring management node over the network, issues queries for temperature, voltage, working state and the like to the child node of each functional module, automatically reports system alarm information to the monitoring management node, and records a system working log.
8. The distributed processing system health intelligent monitor of claim 3, wherein the monitoring management node, the root node, and the child nodes are powered by independent power supplies respectively, and are powered on before functional circuits of the distributed processing system are powered on.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110243326.7A CN112882901B (en) | 2021-03-04 | 2021-03-04 | Intelligent health state monitor of distributed processing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112882901A (en) | 2021-06-01 |
CN112882901B (en) | 2024-06-18 |
Family
ID=76055397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110243326.7A Active CN112882901B (en) | 2021-03-04 | 2021-03-04 | Intelligent health state monitor of distributed processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112882901B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2209304A1 (en) * | 1996-06-27 | 1997-12-27 | Bull S.A. | Process for monitoring numerous types of objects on numerous nodes from a management node in a computer system |
US20140223240A1 (en) * | 2013-02-01 | 2014-08-07 | International Business Machines Corporation | Selective monitoring of archive and backup storage |
US20140361978A1 (en) * | 2013-06-07 | 2014-12-11 | International Business Machines Corporation | Portable computer monitoring |
US20150154233A1 (en) * | 2013-12-02 | 2015-06-04 | Qbase, LLC | Dependency manager for databases |
CN106126407A (en) * | 2016-06-22 | 2016-11-16 | 西安交通大学 | A kind of performance monitoring Operation Optimization Systerm for distributed memory system and method |
CN109144802A (en) * | 2018-09-12 | 2019-01-04 | 杭州智享新电科技有限公司 | Internet of Things module health control diagnostic method |
CN109698775A (en) * | 2018-11-21 | 2019-04-30 | 中国航空工业集团公司洛阳电光设备研究所 | A kind of dual-machine redundancy backup system based on real-time status detection |
CN110011829A (en) * | 2019-02-28 | 2019-07-12 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Comprehensive airborne task system health control subsystem |
US20200145533A1 (en) * | 2018-11-05 | 2020-05-07 | Nice Ltd. | Method and system for creating a fragmented video recording of events on a screen using serverless computing |
CN111880997A (en) * | 2020-07-29 | 2020-11-03 | 曙光信息产业(北京)有限公司 | Distributed monitoring system, monitoring method and device |
Non-Patent Citations (1)
Title |
---|
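Xiao Zhiming (肖智明): "Research and Implementation of a Monitoring System for Industrial Manufacturing Equipment" (工业制造设备监控系统的研究与实现), China Excellent Master's Theses Full-text Database (中国优秀硕士学位论文全文数据库), 15 June 2020 (2020-06-15), pages 029 - 167 * |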
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113010379A (en) * | 2021-03-09 | 2021-06-22 | 爱瑟福信息科技(上海)有限公司 | Electronic equipment monitoring system |
CN113010379B (en) * | 2021-03-09 | 2024-03-15 | 爱瑟福信息科技(上海)有限公司 | Electronic equipment monitoring system |
CN113722012A (en) * | 2021-09-07 | 2021-11-30 | 超越科技股份有限公司 | Domestic system-level management system |
CN114172829A (en) * | 2022-02-10 | 2022-03-11 | 统信软件技术有限公司 | Server health monitoring method and system and computing equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112882901B (en) | 2024-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112882901B (en) | Intelligent health state monitor of distributed processing system | |
EP2093934B1 (en) | System, device, equipment and method for monitoring management | |
CN100565470C (en) | A kind of blog management method and device | |
CN111831488B (en) | TCMS-MPU control unit with safety level design | |
CN104035831A (en) | High-end fault-tolerant computer management system and method | |
CN107632907B (en) | BMC chip hosting system and control method thereof | |
CN111124981B (en) | Management system and method for server I2C equipment | |
EP3306422B1 (en) | Arithmetic device and control apparatus | |
CN117992270B (en) | Memory resource management system, method, device, equipment and storage medium | |
CN116126772A (en) | UART serial port management system and method applied to ARM server | |
CN111880999B (en) | High-availability monitoring management device for high-density blade server and redundancy switching method | |
CN107026759A (en) | The firmware and its development approach of a kind of remote management BBU modules based on BMC | |
CN103176516B (en) | The monitoring method of cabinet system and cabinet system | |
CN105471652A (en) | Big data all-in-one machine and redundancy management unit thereof | |
CN116483613B (en) | Processing method and device of fault memory bank, electronic equipment and storage medium | |
CN111628944B (en) | Switch and switch system | |
CN206460446U (en) | A kind of supervising device for ruggedized computer mainboard | |
CN106407081B (en) | Case management system and server | |
CN210222525U (en) | Flight parameter system main control module with health monitoring circuit and flight parameter system | |
CN116610430A (en) | Method for realizing electrified operation and maintenance of processor and server system | |
CN111984471A (en) | Cabinet power BMC redundancy management system and method | |
WO2023125702A1 (en) | Cloud management method and system for battery swapping station, server, and storage medium | |
CN115168141A (en) | Optical interface management system, method, device, programmable logic device and storage medium | |
CN101741654B (en) | Monitoring device and method of operating system | |
CN108388488A (en) | A kind of intelligent platform management system and fault handling method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||