CN109631994A - Automatic detection and fault locating method for an operation indication control board - Google Patents

Automatic detection and fault locating method for an operation indication control board

Info

Publication number
CN109631994A
Authority
CN
China
Prior art keywords
bmc
control board
indication control
monitoring
automatic detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811531467.3A
Other languages
Chinese (zh)
Inventor
詹承华
张力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications
Priority to CN201811531467.3A
Publication of CN109631994A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01D: MEASURING NOT SPECIALLY ADAPTED FOR A SPECIFIC VARIABLE; ARRANGEMENTS FOR MEASURING TWO OR MORE VARIABLES NOT COVERED IN A SINGLE OTHER SUBCLASS; TARIFF METERING APPARATUS; MEASURING OR TESTING NOT OTHERWISE PROVIDED FOR
    • G01D21/00: Measuring or testing not otherwise provided for
    • G01D21/02: Measuring two or more variables by means not covered by a single other subclass

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to an automatic detection and fault locating method for an operation indication control board, in which the management architecture within the operation indication control board system comprises a system-level BMC and unit-level BMCs. In a first step, the monitoring data reported by all unit-level BMCs are classified according to the corresponding unit identification numbers. In a second step, the data reported by all unit-level BMCs are checked for conformity: each value is compared against its configured threshold range, and if it falls outside that range a photoelectric alarm is triggered and the device identification number and the corresponding fault value are recorded in a log; in addition, the link monitoring state is used to determine whether the current main communication network is available, and the device identification number and the corresponding fault state are recorded in the log. In a third step, all monitoring states are displayed in real time according to device identification number. The method enables automatic monitoring and fault location for complex special-purpose equipment such as an operation indication control board.

Description

Automatic detection and fault locating method for an operation indication control board
Technical field
The present invention relates to automated diagnostic and test technology, and in particular to an automatic detection and fault locating method for an operation indication control board.
Background
Modern special-purpose equipment is complex in composition. Each subsystem or device is typically designed and produced in its entirety by a different organization, and the health of all subsystems ultimately determines whether the equipment as a whole operates normally. As a result, more and more engineering and scientific research fields are paying attention to automated testing and maintenance at the system level.
For a long time, testing and maintaining large equipment has required a variety of test instruments (such as multimeters, oscilloscopes, and spectrum analyzers) together with manual intervention, which is time-consuming, inaccurate, and demands a high level of skill from test and maintenance personnel. Traditional detection methods are coarse, rely on primitive technical means, and are labor-intensive. Especially for equipment built around large-scale integrated circuits and high-speed communication, they can no longer meet the demand for rapid diagnosis, rapid fault location, and rapid repair, whereas automated diagnostic and test technology can effectively address all of these problems.
Automated diagnostic and test technology is based on sensing key signals and key positions. Using a networked approach, it uniformly serializes and classifies all detection information, and finally presents users or administrators with a visual display of all key values and system states. It provides automatic alarms for faulty systems, fault data reporting, and historical data recording and query, and provides a basis for subsequent intelligent analysis and diagnosis.
Summary of the invention
The purpose of the present invention is to provide an automatic detection and fault locating method for an operation indication control board, in order to solve the above problems of the prior art.
In the automatic detection and fault locating method for an operation indication control board of the present invention, the management architecture within the operation indication control board system comprises a system-level BMC and unit-level BMCs. In a first step, the monitoring data reported by all unit-level BMCs are classified according to the corresponding unit identification numbers. In a second step, the data reported by all unit-level BMCs are checked for conformity: each value is compared against its configured threshold range, and if it falls outside that range a photoelectric alarm is triggered and the device identification number and the corresponding fault value are recorded in a log; in addition, the link monitoring state is used to determine whether the current main communication network is available, and the device identification number and the corresponding fault state are recorded in the log. In a third step, all monitoring states are displayed in real time according to device identification number.
According to an embodiment of the automatic detection and fault locating method of the present invention, the system-level BMC, as the main body of system management, formulates the monitoring policy, monitoring cycle, empirical thresholds, fault judgment method, log write-back policy, and visual display mode; each unit-level BMC, as the main body that executes monitoring, is interconnected with the system-level BMC in a star topology.
According to an embodiment of the automatic detection and fault locating method of the present invention, the monitoring data reported by all unit-level BMCs are classified according to the corresponding unit identification numbers, and specifically include: the unit's main processor temperature value, module-level temperature value, whole-machine current value, whether the heartbeat is normal, the remaining capacity of the electronic disk, and the link state value.
According to an embodiment of the automatic detection and fault locating method of the present invention, checking the data reported by all unit-level BMCs for conformity against the configured threshold ranges comprises: determining whether temperature values, voltage values, and current values are within their threshold ranges.
According to an embodiment of the automatic detection and fault locating method of the present invention, if a value is not within its threshold range, the photoelectric alarm is triggered.
According to an embodiment of the automatic detection and fault locating method of the present invention, the method further comprises determining, according to the link monitoring state, whether the mouse, keyboard, and main display are present and normal.
According to an embodiment of the automatic detection and fault locating method of the present invention, the operation indication control board comprises: a main processor module, a power module, a display, a status control module, and a memory module; the main processor module serves as the carrier of the system-level BMC, and the other modules serve as carriers of the unit-level BMCs.
According to an embodiment of the automatic detection and fault locating method of the present invention, the system-level BMC sets, through user configuration, the parameters and states that each unit-level BMC needs to monitor, and synchronizes them to each unit-level BMC over the IPMB bus according to identification number. After a unit-level BMC receives the commands and parameters, it monitors its own state as required: the power module monitors its own supply voltage, current, and temperature; the display monitors its own signal input state and link state; the status control module monitors its own voltage and serial communication state; the memory module monitors its own input voltage and remaining storage capacity.
According to an embodiment of the automatic detection and fault locating method of the present invention, the system-level BMC classifies and summarizes the data reported by each unit-level BMC according to identification number and compares them against the configured threshold ranges. When a monitored value or state is within the threshold range, it is displayed in real time; when it is outside the threshold range, an audible and visual alarm is triggered, the device number and fault location of the faulty unit are displayed, and a fault log is recorded.
According to an embodiment of the automatic detection and fault locating method of the present invention, the unit-level BMC acquires the module's current and voltage parameters through the LTC2991 sensor, acquires the module's temperature parameter through the LM75CIMM sensor, determines the network communication link state from the actual value of the network card's LinkStatus status register, determines the display signal state from the level of the display's HotPlug signal, and determines the remaining capacity of the current electronic disk from the actual value of the Flash MAR register. The unit-level BMC packages all collected parameter values and status information according to the IPMI protocol and reports them to the system-level BMC cyclically according to the monitoring period.
Using the operation indication control board as a prototype, the present invention provides an automated detection method for complex special-purpose equipment. It uses a standard intelligent management bus protocol to detect the state of functions in every subsystem of the equipment, such as the processor, memory, network, electronic disk, display interface, and interactive interface, and to monitor key parameters of critical components in real time, such as operating temperature, operating current, and operating voltage. The data converge on the system management unit over the intelligent management bus, where classification statistics, logical calculation, and visual display are performed, and are reported over the network to an external server or other storage device to form historical data.
Brief description of the drawings
Fig. 1 shows the architecture and flow chart of automated monitoring and fault location for the operation indication control board.
Detailed description
To make the purpose, content, and advantages of the present invention clearer, specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing and examples.
Fig. 1 shows the architecture and flow chart of automated monitoring and fault location for the operation indication control board. As shown in Fig. 1, the present invention provides a management architecture within the operation indication control board system, comprising a system-level BMC, unit-level BMCs, and a processing method for all monitoring data.
As shown in Fig. 1, the system-level BMC, as the main body of system management, formulates the monitoring policy, monitoring cycle, empirical thresholds, fault judgment method, log write-back policy, and visual display mode. Each unit-level BMC, as the main body that executes monitoring, is interconnected with the system-level BMC in a star topology. It includes environmental sensors for temperature, current, voltage, and the like, as well as the basis for judging the state of the monitored objects (such as the heartbeat signal of the main processor, the read/write state and capacity state of the electronic disk, the link state of the network signal, and the monitoring state of the display signal). According to the monitoring type, monitoring cycle, monitoring precision, and other parameters synchronized from the system-level BMC, it performs the specific parameter acquisition and state monitoring tasks, and reports the monitoring data in real time according to the monitoring cycle.
As shown in Fig. 1, the processing method for monitoring data comprises three steps. In the first step, the monitoring data reported by all unit-level BMCs are classified according to the corresponding unit identification numbers; these specifically include the main processor temperature value T1 and module-level temperature value T2 of units 1 through n, and the whole-machine current value A1, heartbeat status H1, remaining electronic disk capacity R1, and link state value L1 of unit 1, among others. In the second step, the data reported by all unit-level BMCs are checked for conformity: temperature value T1, voltage value V1, current value A1, and so on are compared against their configured threshold ranges, and if a value is outside its range the photoelectric alarm is triggered and the device identification number and corresponding fault value are recorded in the log. The link monitoring state is also used to determine whether the current main communication network is available and whether the mouse, keyboard, and main display are present and normal; if any of them is abnormal, an audible alarm is triggered and the device identification number and corresponding fault state are recorded in the log. In the third step, all monitoring states are displayed in real time according to device identification number.
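A minimal Python sketch of the three processing steps follows. It assumes readings arrive as simple records; the names `Reading`, `THRESHOLDS`, `classify`, `judge`, and `render`, as well as the threshold values, are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    unit_id: int   # identification number of the reporting unit
    name: str      # parameter label, e.g. "T1", "V1", "A1"
    value: float


# Hypothetical (low, high) ranges; in the described system the user sets them.
THRESHOLDS = {"T1": (0.0, 85.0), "V1": (11.4, 12.6), "A1": (0.0, 5.0)}


def classify(readings):
    """Step 1: group reported readings by unit identification number."""
    by_unit = defaultdict(list)
    for r in readings:
        by_unit[r.unit_id].append(r)
    return dict(by_unit)


def judge(by_unit, thresholds=THRESHOLDS):
    """Step 2: return (unit_id, name, value) for every out-of-range value."""
    faults = []
    for unit_id in sorted(by_unit):
        for r in by_unit[unit_id]:
            limits = thresholds.get(r.name)
            if limits is None:
                continue  # no threshold configured for this parameter
            low, high = limits
            if not (low <= r.value <= high):
                faults.append((unit_id, r.name, r.value))
    return faults


def render(by_unit, faults):
    """Step 3: one status line per unit, ordered by identification number."""
    faulty = {(unit_id, name) for unit_id, name, _ in faults}
    lines = []
    for unit_id in sorted(by_unit):
        cells = []
        for r in by_unit[unit_id]:
            mark = "!" if (unit_id, r.name) in faulty else ""
            cells.append(f"{r.name}={r.value}{mark}")
        lines.append(f"unit {unit_id}: " + " ".join(cells))
    return lines
```

In this sketch the list returned by `judge` plays the role of the log entries (device identification number plus fault value); triggering the alarm hardware is left out.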
As shown in Fig. 1, the operation indication control board specifically comprises a main processor module, a power module, a display, a status control module, a memory module, and so on. The main processor module serves as the carrier of the system-level BMC, and the other modules serve as carriers of the unit-level BMCs.
First, the system-level BMC sets, through user configuration, the parameters and states that each unit-level BMC needs to monitor, and synchronizes them to each unit-level BMC over the IPMB bus according to identification number. After a unit-level BMC receives the commands and parameters, it monitors its own state as required. For example, the power module monitors its own supply voltage, current, and temperature; the display monitors its own signal input state and link state; the status control module monitors its own voltage and serial communication state; the memory module monitors its own input voltage, remaining storage capacity, and so on. All of the above monitoring is executed cyclically according to the configured monitoring cycle.
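The configuration step can be illustrated with a short Python sketch in which each unit accepts only the configuration addressed to its own identification number. The monitored items per module follow the description above; the identification numbers 1 to 4, the periods, and the names `MonitorConfig`, `UnitBMC`, and `synchronize` are assumptions made for illustration, and the shared bus is modeled as a plain loop rather than real IPMB traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorConfig:
    period_s: float   # monitoring cycle in seconds (hypothetical values below)
    items: tuple      # parameters and states the unit must monitor


# Unit IDs and periods are made up; the item lists mirror the description.
CONFIG_BY_UNIT = {
    1: MonitorConfig(1.0, ("supply_voltage", "current", "temperature")),  # power module
    2: MonitorConfig(2.0, ("signal_input_state", "link_state")),          # display
    3: MonitorConfig(1.0, ("voltage", "serial_comm_state")),              # status control module
    4: MonitorConfig(5.0, ("input_voltage", "remaining_capacity")),       # memory module
}


class UnitBMC:
    """Stand-in for a unit-level BMC listening on a shared management bus."""

    def __init__(self, unit_id):
        self.unit_id = unit_id
        self.config = None

    def receive(self, target_id, config):
        """Accept the configuration only if it is addressed to this unit."""
        if target_id != self.unit_id:
            return False
        self.config = config
        return True


def synchronize(config_by_unit, units):
    """Offer each configuration to every unit; return the IDs that accepted one."""
    accepted = []
    for target_id, config in sorted(config_by_unit.items()):
        for unit in units:
            if unit.receive(target_id, config):
                accepted.append(unit.unit_id)
    return accepted
```

A unit whose identification number has no entry in the table simply keeps `config` set to `None`, which a caller could treat as "not monitored".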
As shown in Fig. 1, next, the unit-level BMC acquires the module's current and voltage parameters through the LTC2991 sensor, acquires the module's temperature parameter through the LM75CIMM sensor, determines the network communication link state from the actual value of the network card's LinkStatus status register, determines the display signal state from the level of the display's HotPlug signal, and determines the remaining capacity of the current electronic disk from the actual value of the Flash MAR (memory register) register. The unit-level BMC packages all collected parameter values and status information according to the IPMI protocol and reports them to the system-level BMC cyclically according to the monitoring period.
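Two pieces of this acquisition-and-report path can be sketched in Python under stated assumptions. The temperature conversion assumes the common LM75 register layout (a 9-bit two's-complement value in the upper bits of a 16-bit register, 0.5 °C per step); the framing follows the standard IPMB request layout with its two 2's-complement checksums. The function names are hypothetical, and the patent does not specify which IPMI command or payload layout carries the monitoring data.

```python
def lm75_raw_to_celsius(raw):
    """Convert a 16-bit LM75-style temperature register value to degrees Celsius.

    Assumes the 9-bit two's-complement reading sits in bits 15..7.
    """
    value = (raw >> 7) & 0x1FF
    if value & 0x100:          # sign bit of the 9-bit field
        value -= 0x200
    return value * 0.5


def ipmb_checksum(data):
    """2's-complement checksum: the covered bytes plus the checksum sum to 0 mod 256."""
    return (-sum(data)) & 0xFF


def build_ipmb_request(rs_addr, net_fn, cmd, rq_addr, rq_seq,
                       payload=b"", rs_lun=0, rq_lun=0):
    """Assemble an IPMB request frame.

    Layout: rsSA, netFn/rsLUN, checksum1, rqSA, rqSeq/rqLUN, cmd, data, checksum2.
    """
    header = bytes([rs_addr, ((net_fn & 0x3F) << 2) | (rs_lun & 0x03)])
    body = bytes([rq_addr, ((rq_seq & 0x3F) << 2) | (rq_lun & 0x03), cmd]) + bytes(payload)
    return header + bytes([ipmb_checksum(header)]) + body + bytes([ipmb_checksum(body)])
```

For example, `build_ipmb_request(0x20, 0x06, 0x01, 0x81, 0)` builds a Get Device ID request addressed to 0x20; a unit-level BMC reporting sensor values would use whatever command and payload layout the system designer chooses.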
As shown in Fig. 1, finally, the system-level BMC classifies and summarizes the data reported by each unit-level BMC according to identification number and compares them against the configured threshold ranges. When a monitored value or state is within the threshold range, it is displayed in real time; when it is outside the threshold range, an audible and visual alarm is triggered, the device number and fault location of the faulty unit are displayed, and a fault log is recorded.
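The alarm-and-log behavior can be sketched as follows. This Python illustration makes one design assumption that the patent does not state: a fault is logged only when it first appears, so a value that stays out of range over many monitoring cycles does not flood the log. The names `AlarmPanel` and `make_fault_record` and the record fields are hypothetical.

```python
import json
from datetime import datetime, timezone


class AlarmPanel:
    """Tracks which (unit, parameter) pairs are currently in alarm."""

    def __init__(self):
        self.active = set()

    def update(self, unit_id, name, in_range):
        """Record the latest judgment; return True only when a new alarm starts."""
        key = (unit_id, name)
        if in_range:
            self.active.discard(key)   # value recovered, clear the alarm
            return False
        is_new = key not in self.active
        self.active.add(key)
        return is_new


def make_fault_record(unit_id, location, name, value, limits, when):
    """Serialize one fault log entry as a JSON line."""
    low, high = limits
    return json.dumps({
        "time": when.isoformat(),
        "unit_id": unit_id,        # device number of the faulty unit
        "location": location,      # fault location shown to the operator
        "parameter": name,
        "value": value,
        "low": low,
        "high": high,
    }, sort_keys=True)
```

A caller would invoke `update` once per judged value in each cycle and write `make_fault_record(...)` to the log whenever it returns `True`.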
The automatic detection and fault locating method for an operation indication control board of the present invention interconnects the overall system and its subsystems by designing an in-system management bus, a system-level BMC (baseboard management controller), and unit-level BMCs; it issues commands and reports data by designing synchronization, reporting, and query protocols; it locates, alarms on, and records faults by setting empirical thresholds and writing back monitoring logs; and finally it presents a visual display of all states through the display device.
The advantage of the automatic detection and fault locating method of the present invention is that it enables automatic monitoring and fault location for complex special-purpose equipment such as an operation indication control board. Because all subsystems and modules follow the standard IPMB bus and implement the standard IPMI protocol, unified health monitoring and fault location for large, complex equipment is easier to achieve. The method is also not tied to the implementation inside each subsystem: as long as a subsystem's monitoring interface complies with the standard, it can be monitored globally, which achieves a degree of loosely coupled design. The method can also improve test and repair efficiency: all parameters and states are automatically monitored, reported, displayed, located, and alarmed on, so maintenance personnel only need to check the fault alarm location to carry out repair work.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and variations without departing from the technical principles of the present invention, and such improvements and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. An automatic detection and fault locating method for an operation indication control board, characterized in that the management architecture within the operation indication control board system comprises a system-level BMC and unit-level BMCs;
in a first step, the monitoring data reported by all unit-level BMCs are classified according to the corresponding unit identification numbers;
in a second step, the data reported by all unit-level BMCs are checked for conformity: each value is compared against its configured threshold range, and if it is not within that range a photoelectric alarm is triggered and the device identification number and corresponding fault value are recorded in a log; according to the link monitoring state, it is determined whether the current main communication network is available, and the device identification number and corresponding fault state are recorded in the log;
in a third step, all monitoring states are displayed in real time according to device identification number.
2. The automatic detection and fault locating method for an operation indication control board according to claim 1, characterized in that the system-level BMC, as the main body of system management, formulates the monitoring policy, monitoring cycle, empirical thresholds, fault judgment method, log write-back policy, and visual display mode;
each unit-level BMC, as the main body that executes monitoring, is interconnected with the system-level BMC in a star topology.
3. The automatic detection and fault locating method for an operation indication control board according to claim 1, characterized in that the monitoring data reported by all unit-level BMCs are classified according to the corresponding unit identification numbers, and specifically include: the unit's main processor temperature value, module-level temperature value, whole-machine current value, whether the heartbeat is normal, the remaining capacity of the electronic disk, and the link state value.
4. The automatic detection and fault locating method for an operation indication control board according to claim 1, characterized in that checking the data reported by all unit-level BMCs for conformity against the configured threshold ranges comprises: determining whether temperature values, voltage values, and current values are within their threshold ranges.
5. The automatic detection and fault locating method for an operation indication control board according to claim 1, characterized in that if a value is not within its threshold range, the photoelectric alarm is triggered.
6. The automatic detection and fault locating method for an operation indication control board according to claim 1, characterized in that the method further comprises determining, according to the link monitoring state, whether the mouse, keyboard, and main display are present and normal.
7. The automatic detection and fault locating method for an operation indication control board according to claim 1, characterized in that the operation indication control board comprises: a main processor module, a power module, a display, a status control module, and a memory module; the main processor module serves as the carrier of the system-level BMC, and the other modules serve as carriers of the unit-level BMCs.
8. The automatic detection and fault locating method for an operation indication control board according to claim 7, characterized in that the system-level BMC sets, through user configuration, the parameters and states that each unit-level BMC needs to monitor, and synchronizes them to each unit-level BMC over the IPMB bus according to identification number; after a unit-level BMC receives the commands and parameters, it monitors its own state as required: the power module monitors its own supply voltage, current, and temperature; the display monitors its own signal input state and link state; the status control module monitors its own voltage and serial communication state; the memory module monitors its own input voltage and remaining storage capacity.
9. The automatic detection and fault locating method for an operation indication control board according to claim 1, characterized in that the system-level BMC classifies and summarizes the data reported by each unit-level BMC according to identification number and compares them against the configured threshold ranges; when a monitored value or state is within the threshold range, it is displayed in real time; when it is outside the threshold range, an audible and visual alarm is triggered, the device number and fault location of the faulty unit are displayed, and a fault log is recorded.
10. The automatic detection and fault locating method for an operation indication control board according to claim 1, characterized in that the unit-level BMC acquires the module's current and voltage parameters through the LTC2991 sensor, acquires the module's temperature parameter through the LM75CIMM sensor, determines the network communication link state from the actual value of the network card's LinkStatus status register, determines the display signal state from the level of the display's HotPlug signal, and determines the remaining capacity of the current electronic disk from the actual value of the Flash MAR register; the unit-level BMC packages all collected parameter values and status information according to the IPMI protocol and reports them to the system-level BMC cyclically according to the monitoring period.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811531467.3A CN109631994A (en) 2018-12-14 2018-12-14 Automatic detection and fault locating method for an operation indication control board

Publications (1)

Publication Number Publication Date
CN109631994A (en) 2019-04-16

Family

ID=66073884

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN201811531467.3A Pending CN109631994A (en) 2018-12-14 2018-12-14 Automatic detection and fault locating method for an operation indication control board

Country Status (1)

Country Link
CN (1) CN109631994A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101001169A (en) * 2006-01-10 2007-07-18 英业达股份有限公司 Data transmitting system used in electronic equipment with multiple service unit
US20140195669A1 (en) * 2013-01-08 2014-07-10 American Megatrends, Inc. Emulated communication between master management instance and assisting management instances on baseboard management controller
CN104104543A (en) * 2014-07-17 2014-10-15 浪潮集团有限公司 Server managing system and method based on SNMP and IPMI protocol
CN104394232A (en) * 2014-12-11 2015-03-04 山东超越数控电子有限公司 Independent management and concentrated management method of cloud equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
阮荣友: "基于IPMI协议的服务器主板控制器的设计与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111629043A (en) * 2020-05-21 2020-09-04 北京计算机技术及应用研究所 Cross-platform health management system based on cloud mode
CN111629043B (en) * 2020-05-21 2023-05-19 北京计算机技术及应用研究所 Cross-platform health management system based on cloud mode
CN111884830A (en) * 2020-06-24 2020-11-03 苏州浪潮智能科技有限公司 Method and device for reserving fault site based on BMC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190416)