CN112882901A - Intelligent health state monitor of distributed processing system - Google Patents

Info

Publication number
CN112882901A
Authority
CN
China
Prior art keywords
health
node
root node
processing system
distributed processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110243326.7A
Other languages
Chinese (zh)
Other versions
CN112882901B (en)
Inventor
李成文
韩强
张伟栋
陈国�
丰生磊
赵子杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202110243326.7A
Publication of CN112882901A
Application granted
Publication of CN112882901B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an intelligent health state monitor for a distributed processing system, comprising a monitoring management node, a health monitoring data network switch, and a health monitoring server. The number of monitoring management nodes corresponds to the number of monitored processors. Each monitoring management node collects health state information from the functional modules inside its corresponding processor and transmits it over a data communication network, through the health monitoring data network switch, to the health monitoring server. The health monitoring server analyzes the health monitoring data and makes decisions on it, diagnoses the cause of system faults, and restores operation in the shortest possible time. The monitor tracks in real time the working state of components of the distributed processing system such as the power supply, CPU (central processing unit), memory, and solid-state storage, and helps the system administrator quickly diagnose the cause of system faults. This effectively improves the testability, maintainability, and supportability of the system and greatly improves its task processing capacity.

Description

Intelligent health state monitor of distributed processing system
Technical Field
The invention belongs to the technical field of embedded computer system design, and in particular relates to an intelligent health state monitor for a distributed processing system.
Background
Airborne embedded systems contain a growing variety and number of devices, their processing systems are becoming more complex, and monitoring system health is becoming more difficult. The traditional approach relies solely on the main processor's built-in test (BIT), which cannot locate problems accurately, directly affects the main processor's task processing function, and reduces the operating efficiency of the system's processing resources.
Disclosure of Invention
The invention aims to provide an intelligent health state monitor for a distributed processing system that meets the requirements of high-performance aircraft systems for the testability, maintainability, and reliability of processing equipment.
To achieve this aim, the invention adopts the following technical scheme:
An intelligent health state monitor for a distributed processing system comprises a monitoring management node, a health monitoring data network switch, and a health monitoring server. The number of monitoring management nodes corresponds to the number of monitored processors. Each monitoring management node collects health state information from the functional modules inside its corresponding processor and transmits it over a data communication network, through the health monitoring data network switch, to the health monitoring server. The health monitoring server analyzes the health monitoring data and makes decisions on it, diagnoses the cause of system faults, and restores operation in the shortest possible time.
Further, the health monitoring data network switch performs monitoring data exchange; the data network is FC, AFDX, or Ethernet, and the network communication rate is not lower than 1 Gbps.
Further, the processor includes child nodes and root nodes. There are two root nodes, implemented as two independent circuits that are physically located in functional modules and back each other up. A communication connection is always maintained between the two root nodes; one is the active root node and the other is the backup root node. The active root node monitors the health state of all functional modules, including the modules that host the active and backup root nodes, detects power failure and removal of functional modules, reports events to the corresponding monitoring management node, receives control instructions from the monitoring management node, and takes appropriate action to schedule the tasks of the functional modules and prevent system faults.
Further, the data links of the two root nodes are physically carried by two I2C buses.
Further, each child node is an independent circuit unit located in a functional module. It collects and uploads the sensor data, CPU state data, and self-test data of the functional module in which it resides, reads the module slot number and equipment number, and controls power-on, power-off, and reset of the functional module.
Further, the microcontroller in the child node runs module function software and is responsible for receiving commands from the external root node and uploading sensor data, CPU state data, and software state data. The monitoring process and content of the child node comprise: checking the slot position before power-on, powering on the functional module normally if the check passes, and reporting the error if it fails, in which case the functional module is not powered normally; power-on and reset control of the functional module; monitoring voltage, key-circuit current, and temperature; monitoring the running state of core devices, including the CPU, the switch chip, and the memory; monitoring the running state of key applications; and detecting the online/offline state of the switch chip's ports.
Further, the child node monitors the voltage and temperature of the functional module and obtains the module's working state from the CPU through the universal serial bus. When the voltage, temperature, or working state of the functional module is abnormal, the child node sends an alarm to the root node; it also responds to query commands from the root node by reporting the module's voltage, temperature, working state, and similar information. The root node receives query requests from the monitoring management node over the network, issues queries for temperature, voltage, working state, and similar information to the child node of each functional module, automatically reports system alarm information to the monitoring management node, and records a system working log.
Further, the monitoring management node, the root nodes, and the child nodes are each powered by an independent power supply and are powered on before the functional circuits of the distributed processing system.
Compared with the prior art, the invention has the following technical characteristics:
To meet the greater demands that complex environments place on embedded systems, the invention provides an intelligent health state monitor for a distributed processing system. It monitors in real time the working state of components of the distributed processing system such as the power supply, CPU (central processing unit), memory, and solid-state storage, helps the system administrator quickly diagnose the cause of system faults and restore operation in the shortest possible time, effectively improves the testability, maintainability, and supportability of the system, and greatly improves its task processing capacity.
Drawings
FIG. 1 shows the intelligent health state monitor of the distributed processing system;
FIG. 2 shows the functional structure of the monitor inside a processor.
Detailed Description
Referring to FIG. 1, the intelligent health state monitor for a distributed processing system according to the invention includes monitoring management nodes, a health monitoring data network switch, and a health monitoring server. Multiple monitoring management nodes can be deployed according to system requirements, one for each monitored processor. Each monitoring management node collects health state information from the functional modules in its corresponding processor and transmits it over the data communication network, through the health monitoring data network switch, to the health monitoring server. The health monitoring server analyzes the health monitoring data and makes decisions on it, quickly diagnoses the cause of system faults, and restores operation in the shortest possible time. The health monitoring data network switch performs monitoring data exchange; the data network can be FC, AFDX, Ethernet, or the like, and the network communication rate is not lower than 1 Gbps.
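The data flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, and the in-process method call standing in for the FC/AFDX/Ethernet link are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleHealth:
    """Health record for one functional module (field names are illustrative)."""
    processor_id: str
    module_id: str
    healthy: bool
    detail: str = ""


@dataclass
class HealthMonitoringServer:
    """Collects records from all monitoring management nodes and lists faults."""
    records: list = field(default_factory=list)

    def receive(self, batch):
        # Stand-in for data arriving through the health monitoring switch.
        self.records.extend(batch)

    def diagnose(self):
        # Return (processor, module, detail) for every unhealthy module.
        return [(r.processor_id, r.module_id, r.detail)
                for r in self.records if not r.healthy]


@dataclass
class MonitoringManagementNode:
    """One node per monitored processor."""
    processor_id: str
    server: HealthMonitoringServer

    def collect_and_forward(self, module_states):
        # module_states maps module_id -> (healthy, detail).
        batch = [ModuleHealth(self.processor_id, module_id, ok, detail)
                 for module_id, (ok, detail) in module_states.items()]
        self.server.receive(batch)


server = HealthMonitoringServer()
MonitoringManagementNode("proc-1", server).collect_and_forward(
    {"compute": (True, ""), "output": (False, "over-temperature")})
MonitoringManagementNode("proc-2", server).collect_and_forward(
    {"compute": (True, "")})
faults = server.diagnose()
```

In this sketch the server sees three module records from two processors and singles out the one unhealthy module; a real diagnosis step would of course do more than filter on a flag.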
As shown in FIG. 2, the processor includes child nodes and root nodes. There are two root nodes, implemented as two independent circuits that are physically located in functional modules and back each other up. A communication connection is always maintained between the two root nodes; one is the active root node and the other is the backup root node. The active root node monitors the health state of all functional modules, including the modules that host the active and backup root nodes, detects power failure and removal of functional modules, reports events to the corresponding monitoring management node, receives control instructions from the monitoring management node, and takes appropriate action to schedule the tasks of the functional modules and prevent system faults. The data links of the two root nodes are physically carried by two I2C buses. A functional module is a module that implements a particular function in the processor, such as a computing module or an output module.
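The text states that the two root nodes keep a connection open and act as active and backup, but it does not say how a switchover is triggered. The sketch below assumes one common policy, a heartbeat timeout on the inter-root link; the `RootNode` class, the `tick` method, and the 3-second timeout are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RootNode:
    name: str
    role: str  # "active" or "backup"
    last_peer_heartbeat: float = 0.0

    def on_peer_heartbeat(self, now):
        # Called when a heartbeat arrives from the peer over the inter-root link.
        self.last_peer_heartbeat = now

    def tick(self, now, timeout=3.0):
        # Assumed policy: a backup promotes itself once the active peer has
        # been silent for longer than `timeout` seconds.
        if self.role == "backup" and now - self.last_peer_heartbeat > timeout:
            self.role = "active"
        return self.role


backup = RootNode("root-B", "backup")
backup.on_peer_heartbeat(now=10.0)
role_while_peer_alive = backup.tick(now=12.0)  # 2.0 s since last heartbeat
role_after_silence = backup.tick(now=14.5)     # 4.5 s since last heartbeat
```

Passing `now` explicitly keeps the example deterministic; a deployed version would read a monotonic clock instead.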
Each child node is likewise an independent circuit unit located in a functional module and is powered by an independent power supply. The child node is mainly responsible for collecting and uploading the sensor data, CPU state data, and self-test data of the functional module in which it resides, reading the module slot number and equipment number, and controlling power-on, power-off, and reset of the functional module. The microcontroller in the child node runs module function software and is responsible for receiving commands from the external root node and uploading sensor data, CPU state data, and software state data. The monitoring process and content of the child node comprise: checking the slot position before power-on, powering on the functional module normally if the check passes, and reporting the error if it fails, in which case the functional module is not powered normally; power-on and reset control of the functional module; monitoring voltage, key-circuit current, and temperature; monitoring the running state of core devices, including the CPU, the switch chip, the memory, and the like; monitoring the running state of key applications; and detecting the online/offline state of the switch chip's ports.
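The slot check that gates power-on can be illustrated as follows. The patent does not say how a slot is judged correct, so this sketch assumes a simple comparison against an expected slot number; `power_on_sequence` and its callback parameters are hypothetical names.

```python
def power_on_sequence(expected_slot, detected_slot, power_on, report):
    """Power the module only if the slot check passes; otherwise report."""
    if detected_slot == expected_slot:
        power_on()
        return True
    # Check failed: report upstream and leave the module unpowered.
    report(f"slot mismatch: expected {expected_slot}, detected {detected_slot}")
    return False


events = []
ok = power_on_sequence(3, 3, lambda: events.append("powered"), events.append)
bad = power_on_sequence(3, 5, lambda: events.append("powered"), events.append)
```

Injecting `power_on` and `report` as callbacks keeps the decision logic separate from the hardware access, which makes it easy to exercise without a board.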
The monitoring management node, the root nodes, and the child nodes are each powered by an independent power supply and are powered on before the functional circuits of the distributed processing system. The child node monitors the voltage and temperature of the functional module and obtains the module's working state from the CPU through the universal serial bus. When the voltage, temperature, or working state of the functional module is abnormal, the child node sends an alarm to the root node; it also responds to query commands from the root node by reporting the module's voltage, temperature, working state, and similar information. The root node can receive query requests from the monitoring management node over the network, issue queries for temperature, voltage, working state, and similar information to the child node of each functional module, automatically report system alarm information to the monitoring management node, and record a system working log.
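A minimal sketch of the alarm and query paths between child nodes and a root node is given below. The numeric limits, the class and method names, and the CPU state string are all assumptions for illustration; the patent gives no thresholds. For simplicity the root node here polls each child for alarms, whereas the text describes the child node pushing alarms to the root.

```python
from dataclasses import dataclass

# Illustrative limits only; the patent does not specify numeric thresholds.
LIMITS = {"voltage_v": (11.4, 12.6), "temperature_c": (-40.0, 85.0)}


@dataclass
class ChildNode:
    module_id: str
    readings: dict  # e.g. {"voltage_v": 12.0, "temperature_c": 45.0}
    cpu_state: str = "running"

    def check(self):
        """Return alarm strings for out-of-range readings or abnormal CPU state."""
        alarms = []
        for key, (low, high) in LIMITS.items():
            value = self.readings[key]
            if not low <= value <= high:
                alarms.append(f"{self.module_id}: {key}={value} out of range")
        if self.cpu_state != "running":
            alarms.append(f"{self.module_id}: cpu_state={self.cpu_state}")
        return alarms

    def answer_query(self):
        """Reply to a root-node query with current readings and state."""
        return {"module": self.module_id, **self.readings,
                "cpu_state": self.cpu_state}


class RootNode:
    def __init__(self, children):
        self.children = children
        self.log = []  # system working log

    def poll(self):
        # Gather alarms, log them, and return them for upstream reporting.
        alarms = [a for child in self.children for a in child.check()]
        self.log.extend(alarms)
        return alarms

    def query_all(self):
        return [child.answer_query() for child in self.children]


root = RootNode([
    ChildNode("compute", {"voltage_v": 12.0, "temperature_c": 45.0}),
    ChildNode("output", {"voltage_v": 12.1, "temperature_c": 92.5}),
])
alarms = root.poll()
status = root.query_all()
```

Here only the `output` module trips a limit, so `alarms` holds a single entry that is also written to the root node's log, while `status` carries the full readings of both modules.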
The intelligent monitor operates automatically and independently of the functional components, which saves processing resources and improves the system's task processing capacity. It addresses the insufficient health monitoring information and inaccurate fault location of complex processing systems, helps the system administrator quickly diagnose system faults and restore operation in as short a time as possible, and effectively improves the system's testability, maintainability, and supportability.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equally replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (8)

1. An intelligent health state monitor for a distributed processing system, characterized by comprising a monitoring management node, a health monitoring data network switch, and a health monitoring server; the number of monitoring management nodes corresponds to the number of monitored processors; each monitoring management node collects health state information from the functional modules inside its corresponding processor and transmits it over a data communication network, through the health monitoring data network switch, to the health monitoring server; and the health monitoring server analyzes the health monitoring data and makes decisions on it, diagnoses the cause of system faults, and restores operation in the shortest possible time.
2. The intelligent health state monitor for a distributed processing system of claim 1, wherein the health monitoring data network switch performs monitoring data exchange, the data network is FC, AFDX, or Ethernet, and the network communication rate is not lower than 1 Gbps.
3. The intelligent health state monitor for a distributed processing system of claim 1, wherein the processor comprises child nodes and root nodes; there are two root nodes, implemented as two independent circuits that are physically located in functional modules and back each other up; a communication connection is always maintained between the two root nodes, one being the active root node and the other the backup root node; and the active root node monitors the health state of all functional modules, including the modules that host the active and backup root nodes, detects power failure and removal of functional modules, reports events to the corresponding monitoring management node, receives control instructions from the monitoring management node, and takes appropriate action to schedule the tasks of the functional modules and prevent system faults.
4. The intelligent health state monitor for a distributed processing system of claim 3, wherein the data links of the two root nodes are physically carried by two I2C buses.
5. The intelligent health state monitor for a distributed processing system of claim 3, wherein each child node is an independent circuit unit located in a functional module, and the child node collects and uploads the sensor data, CPU state data, and self-test data of the functional module in which it resides, reads the module slot number and equipment number, and controls power-on, power-off, and reset of the functional module.
6. The intelligent health state monitor for a distributed processing system of claim 3, wherein the microcontroller in the child node runs module function software and is responsible for receiving commands from the external root node and uploading sensor data, CPU state data, and software state data; and the monitoring process and content of the child node comprise: checking the slot position before power-on, powering on the functional module normally if the check passes, and reporting the error if it fails, in which case the functional module is not powered normally; power-on and reset control of the functional module; monitoring voltage, key-circuit current, and temperature; monitoring the running state of core devices, the core devices comprising the CPU, the switch chip, and the memory; monitoring the running state of key applications; and detecting the online/offline state of the switch chip's ports.
7. The intelligent health state monitor for a distributed processing system of claim 3, wherein the child node monitors the voltage and temperature of the functional module and obtains the module's working state from the CPU through the universal serial bus; when the voltage, temperature, or working state of the functional module is abnormal, the child node sends an alarm to the root node, and it also responds to query commands from the root node by reporting the module's voltage, temperature, working state, and similar information; and the root node receives query requests from the monitoring management node over the network, issues queries for temperature, voltage, working state, and similar information to the child node of each functional module, automatically reports system alarm information to the monitoring management node, and records a system working log.
8. The intelligent health state monitor for a distributed processing system of claim 3, wherein the monitoring management node, the root nodes, and the child nodes are each powered by an independent power supply and are powered on before the functional circuits of the distributed processing system.
CN202110243326.7A 2021-03-04 2021-03-04 Intelligent health state monitor of distributed processing system Active CN112882901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110243326.7A CN112882901B (en) 2021-03-04 2021-03-04 Intelligent health state monitor of distributed processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110243326.7A CN112882901B (en) 2021-03-04 2021-03-04 Intelligent health state monitor of distributed processing system

Publications (2)

Publication Number Publication Date
CN112882901A true CN112882901A (en) 2021-06-01
CN112882901B CN112882901B (en) 2024-06-18

Family

ID=76055397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110243326.7A Active CN112882901B (en) 2021-03-04 2021-03-04 Intelligent health state monitor of distributed processing system

Country Status (1)

Country Link
CN (1) CN112882901B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2209304A1 (en) * 1996-06-27 1997-12-27 Bull S.A. Process for monitoring numerous types of objects on numerous nodes from a management node in a computer system
US20140223240A1 (en) * 2013-02-01 2014-08-07 International Business Machines Corporation Selective monitoring of archive and backup storage
US20140361978A1 (en) * 2013-06-07 2014-12-11 International Business Machines Corporation Portable computer monitoring
US20150154233A1 (en) * 2013-12-02 2015-06-04 Qbase, LLC Dependency manager for databases
CN106126407A (en) * 2016-06-22 2016-11-16 西安交通大学 A kind of performance monitoring Operation Optimization Systerm for distributed memory system and method
CN109144802A (en) * 2018-09-12 2019-01-04 杭州智享新电科技有限公司 Internet of Things module health control diagnostic method
US20200145533A1 (en) * 2018-11-05 2020-05-07 Nice Ltd. Method and system for creating a fragmented video recording of events on a screen using serverless computing
CN109698775A (en) * 2018-11-21 2019-04-30 中国航空工业集团公司洛阳电光设备研究所 A kind of dual-machine redundancy backup system based on real-time status detection
CN110011829A (en) * 2019-02-28 2019-07-12 西南电子技术研究所(中国电子科技集团公司第十研究所) Comprehensive airborne task system health control subsystem
CN111880997A (en) * 2020-07-29 2020-11-03 曙光信息产业(北京)有限公司 Distributed monitoring system, monitoring method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
肖智明: "工业制造设备监控系统的研究与实现", 《中国优秀硕士学位论文全文数据库》, 15 June 2020 (2020-06-15), pages 029 - 167 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010379A (en) * 2021-03-09 2021-06-22 爱瑟福信息科技(上海)有限公司 Electronic equipment monitoring system
CN113010379B (en) * 2021-03-09 2024-03-15 爱瑟福信息科技(上海)有限公司 Electronic equipment monitoring system
CN113722012A (en) * 2021-09-07 2021-11-30 超越科技股份有限公司 Domestic system-level management system
CN114172829A (en) * 2022-02-10 2022-03-11 统信软件技术有限公司 Server health monitoring method and system and computing equipment

Also Published As

Publication number Publication date
CN112882901B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN112882901B (en) Intelligent health state monitor of distributed processing system
EP2093934B1 (en) System, device, equipment and method for monitoring management
CN100565470C (en) A kind of blog management method and device
CN111831488B (en) TCMS-MPU control unit with safety level design
CN104035831A (en) High-end fault-tolerant computer management system and method
CN107632907B (en) BMC chip hosting system and control method thereof
CN111124981B (en) Management system and method for server I2C equipment
EP3306422B1 (en) Arithmetic device and control apparatus
CN117992270B (en) Memory resource management system, method, device, equipment and storage medium
CN116126772A (en) UART serial port management system and method applied to ARM server
CN111880999B (en) High-availability monitoring management device for high-density blade server and redundancy switching method
CN107026759A (en) The firmware and its development approach of a kind of remote management BBU modules based on BMC
CN103176516B (en) The monitoring method of cabinet system and cabinet system
CN105471652A (en) Big data all-in-one machine and redundancy management unit thereof
CN116483613B (en) Processing method and device of fault memory bank, electronic equipment and storage medium
CN111628944B (en) Switch and switch system
CN206460446U (en) A kind of supervising device for ruggedized computer mainboard
CN106407081B (en) Case management system and server
CN210222525U (en) Flight parameter system main control module with health monitoring circuit and flight parameter system
CN116610430A (en) Method for realizing electrified operation and maintenance of processor and server system
CN111984471A (en) Cabinet power BMC redundancy management system and method
WO2023125702A1 (en) Cloud management method and system for battery swapping station, server, and storage medium
CN115168141A (en) Optical interface management system, method, device, programmable logic device and storage medium
CN101741654B (en) Monitoring device and method of operating system
CN108388488A (en) A kind of intelligent platform management system and fault handling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant