CN107145428A - Server and server monitoring method - Google Patents

Server and server monitoring method

Info

Publication number
CN107145428A
Authority
CN
China
Prior art keywords
bmc
control element
logic control
signal
server
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710381927.8A
Other languages
Chinese (zh)
Inventor
程万前
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-05-26
Filing date
2017-05-26
Publication date
2017-09-08
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710381927.8A
Publication of CN107145428A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/30 Monitoring
              • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
              • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
              • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
              • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
                • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a server and a server monitoring method.

When the BMC is working normally, it sends a heartbeat signal to the logic control unit. While the logic control unit detects the heartbeat signal, it forwards the monitoring signals sent by the monitored units directly to the BMC.

When the BMC is not working normally, i.e., when the logic control unit cannot detect the heartbeat signal, the logic control unit checks the state of the monitored units itself. When the level of a monitoring signal differs from its normal level, the logic control unit records the number of that monitoring signal in RAM. After it detects the BMC's heartbeat signal again, it sends the recorded monitoring-signal numbers to the BMC through a log-record signal.

The invention makes it possible to capture changes in the monitoring signals of faulty or reset devices even while the BMC has failed or is being reset. This improves the stability and reliability of the server.

Description

Server and server monitoring method
Technical field
The present invention relates to servers and server monitoring.
Background
In server design, an out-of-band management system is needed to monitor indicators such as the server's power consumption, voltages, fans and power-on/power-off state.

The management system uses dedicated management controllers, which are generally divided by function into two kinds:
- the BMC (baseboard management controller), which monitors the server mainboard;
- the SMC (system management controller), which monitors the whole server system.

To ensure server reliability, the SMC usually has a redundant design: two SMCs are configured, so that if either one fails, the other keeps the management system working normally.
The BMC, however, typically has no redundant design. Fig. 1 shows the common connection in existing designs: each mainboard or node integrates only one BMC, which monitors that mainboard or node.

The monitored units send their monitoring signals directly to the BMC. Monitored units are typically devices such as the power module, the CPU and the PCH. Monitoring signals can be, for example, a signal indicating power status or a CPU over-temperature signal.

In this way, when the system's power state, temperature or similar parameters exceed their limits, the BMC can detect and record the event and then take the next action.
This creates a server reliability problem. While the BMC is failing or being reset, nothing monitors the working health state of the mainboard or node (power consumption, voltages, whether there is error information, etc.).
CPLD (Complex Programmable Logic Device).
FPGA (Field-Programmable Gate Array).
Summary of the invention
The present invention solves the following technical problem: when the BMC fails, the working health state of the mainboard or node (power consumption, voltages, whether there is error information, etc.) cannot be monitored.

To this end, the present invention provides a server and a server monitoring method. Even while the BMC has failed or is being reset, they can still capture changes in the monitoring signals of the faulty or reset devices. This improves the stability and reliability of the server.
To achieve these goals, the present invention adopts the following technical solution.
A server, comprising:
a BMC, connected to the logic control unit, which sends a heartbeat signal to the logic control unit and receives the monitoring signals and log-record signals sent by the logic control unit;
a logic control unit, connected to the monitored units, which receives the monitoring signals sent by the monitored units; and
monitored units, which send monitoring signals to the logic control unit.
Preferably, the logic control unit is either a CPLD or an FPGA.
Preferably, the monitored units are one or more of a power module, a CPU, a PCH, a network chip and the system power supply.
What is monitored for each unit:
- power module: whether there is voltage output and whether its amplitude is normal;
- CPU: whether it reports an error;
- PCH: whether it reports an error;
- network chip: whether the network signal is connected;
- system power supply: whether there is an error and whether its output is normal.

The power module converts 220 V mains power into DC power. The system power supply converts that DC power into the various supply rails the board needs.
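The level check that the method relies on can be pictured with a small signal table. The sketch below is a minimal Python illustration under stated assumptions:
- the signal numbers, names and normal levels are hypothetical, since the patent assigns none;
- each monitoring signal is treated as one digital level, 1 for high and 0 for low.

```python
from typing import Dict, List, NamedTuple


class MonitoredSignal(NamedTuple):
    number: int        # hypothetical signal number, the value recorded in RAM
    source: str        # monitored unit that drives the signal
    normal_level: int  # level expected when the unit is healthy (assumed polarity)


# Hypothetical signal map; a real board defines its own numbering and polarities.
SIGNAL_MAP: List[MonitoredSignal] = [
    MonitoredSignal(1, "power module: output good", 1),
    MonitoredSignal(2, "CPU: error", 0),
    MonitoredSignal(3, "PCH: error", 0),
    MonitoredSignal(4, "network chip: link up", 1),
    MonitoredSignal(5, "system power: fault", 0),
]


def abnormal_signals(levels: Dict[int, int]) -> List[int]:
    """Return the numbers of signals whose sampled level differs from normal.

    `levels` maps signal number -> sampled level. Signals that were not
    sampled are skipped rather than treated as faults.
    """
    return [
        sig.number
        for sig in SIGNAL_MAP
        if sig.number in levels and levels[sig.number] != sig.normal_level
    ]


# Power-good dropped and CPU error asserted: signals 1 and 2 are abnormal.
print(abnormal_signals({1: 0, 2: 1, 3: 0, 4: 1, 5: 0}))  # -> [1, 2]
```

Storing a normal level per signal lets active-high and active-low signals share one comparison. That comparison is the "level differs from its normal level" test used by the method.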
The server monitoring method comprises the following steps:
When the BMC works normally, the BMC sends a heartbeat signal to the logic control unit. While the logic control unit detects the heartbeat signal, it forwards the monitoring signals sent by the monitored units directly to the BMC.
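The patent does not say what form the heartbeat takes. This sketch assumes a common approach, a watchdog:
- the BMC toggles a GPIO line;
- the logic control unit counts the BMC as alive while an edge has been seen within a timeout window.

The class name `HeartbeatWatchdog` and the three-tick timeout are hypothetical, chosen only for illustration.

```python
class HeartbeatWatchdog:
    """Edge-based heartbeat detector of the kind a CPLD could implement.

    Assumptions (not from the patent): the BMC toggles a single heartbeat
    line, the line is sampled once per tick, and the BMC counts as alive
    while an edge has been seen within the last `timeout_ticks` ticks.
    """

    def __init__(self, timeout_ticks: int = 3) -> None:
        self.timeout_ticks = timeout_ticks
        self.last_level = None  # previous sample; None until the first tick
        # Start saturated, i.e. "heartbeat not detected" until an edge arrives.
        self.ticks_since_edge = timeout_ticks

    def sample(self, level: int) -> bool:
        """Feed one sample of the heartbeat line; return True if the BMC is alive."""
        if self.last_level is not None and level != self.last_level:
            self.ticks_since_edge = 0  # edge seen: heartbeat detected
        else:
            # No edge this tick; saturate like a small hardware counter.
            self.ticks_since_edge = min(self.ticks_since_edge + 1, self.timeout_ticks)
        self.last_level = level
        return self.ticks_since_edge < self.timeout_ticks


wd = HeartbeatWatchdog(timeout_ticks=3)
print([wd.sample(x) for x in [0, 1, 0, 1, 1, 1, 1]])
# -> [False, True, True, True, True, True, False]
```

Detecting edges rather than a fixed level means a BMC that hangs with the line stuck high or stuck low is still seen as "heartbeat lost".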
When the BMC is not working normally, i.e., when the logic control unit cannot detect the heartbeat signal, the logic control unit checks the state of the monitored units itself. When the level of a monitoring signal differs from its normal level, the logic control unit records the number of that monitoring signal in RAM. After it detects the BMC's heartbeat signal again, it sends the recorded monitoring-signal numbers to the BMC through the log-record signal.
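The takeover logic can be sketched as a routine run once per sampling period. This is a minimal Python model, not the patent's CPLD design. Its assumptions:
- `heartbeat_ok` stands for the output of whatever heartbeat detector is used;
- signals are digital levels compared against hypothetical normal levels;
- RAM is a list of signal numbers, each recorded once per outage (an assumed de-duplication);
- the direct pass-through and the log-record signal are modeled as return values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LogicControlUnit:
    normal_levels: Dict[int, int]  # signal number -> normal level (hypothetical)
    ram: List[int] = field(default_factory=list)  # numbers recorded while BMC is down

    def tick(self, heartbeat_ok: bool, levels: Dict[int, int]
             ) -> Tuple[Dict[int, int], List[int]]:
        """Process one sampling period.

        Returns (forwarded, log_record):
          forwarded  - monitoring signals passed straight through to the BMC
          log_record - signal numbers sent via the log-record signal
        """
        if heartbeat_ok:
            # BMC is working: hand over anything recorded during the outage,
            # clear RAM, and pass the live monitoring signals through.
            log_record, self.ram = self.ram, []
            return dict(levels), log_record

        # Heartbeat lost: check the monitored units directly and record the
        # number of every signal at an abnormal level. Signals with no
        # configured normal level compare equal to themselves and are ignored.
        for number, level in levels.items():
            if level != self.normal_levels.get(number, level) and number not in self.ram:
                self.ram.append(number)
        return {}, []


lcu = LogicControlUnit(normal_levels={1: 1, 2: 0, 3: 0})
print(lcu.tick(True, {1: 1, 2: 0, 3: 0}))   # ({1: 1, 2: 0, 3: 0}, [])
print(lcu.tick(False, {1: 1, 2: 1, 3: 0}))  # ({}, [])  signal 2 recorded
print(lcu.tick(False, {1: 0, 2: 1, 3: 0}))  # ({}, [])  signal 1 recorded
print(lcu.tick(True, {1: 1, 2: 0, 3: 0}))   # ({1: 1, 2: 0, 3: 0}, [2, 1])
```

Recording only numbers, rather than timestamps or raw levels, keeps the RAM requirement small. That seems consistent with the patent's choice of a CPLD or FPGA as the logic control unit.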
Beneficial effects of the present invention: the invention addresses a problem in existing designs. While the BMC has failed or is being reset, the working health state of the mainboard or node (power consumption, voltages, whether there is error information, etc.) cannot be monitored.

In the invention, the CPLD determines the working state of the BMC and decides whether to take over recording of the monitoring information. With this technique, changes in the monitoring signals of faulty or reset devices can still be captured while the BMC has failed or is being reset. This improves the stability and reliability of the server.
Brief description of the drawings
Fig. 1 is a circuit connection diagram of a prior-art server.
Fig. 2 is a circuit connection diagram of the server of this embodiment.
Embodiment
The invention is further described below with reference to the accompanying drawings and an embodiment.
As shown in Fig. 2, a server comprises:
- a BMC, connected to the CPLD, which sends a heartbeat signal to the CPLD and receives the monitoring signals and log-record signals sent by the CPLD;
- a CPLD, connected to the monitored units, which receives the monitoring signals sent by the monitored units;
- monitored units, which send monitoring signals to the CPLD.

In this embodiment, the monitored units include the power module, the CPU and the PCH.
The server monitoring method comprises the following steps:
When the BMC works normally, the BMC sends a heartbeat signal to the CPLD. While the CPLD detects the heartbeat signal, it forwards the monitoring signals sent by the monitored units directly to the BMC.
When the BMC is not working normally, i.e., when the CPLD cannot detect the heartbeat signal, the CPLD checks the state of the monitored units. When the level of a monitoring signal differs from its normal level, the CPLD records the number of that monitoring signal in RAM. After it detects the BMC's heartbeat signal again, it sends the recorded monitoring-signal numbers to the BMC through the log-record signal.
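The patent only states that the recorded numbers reach the BMC "through the log-record signal"; it does not define an encoding. Suppose that signal were a byte-oriented link, for example a UART or a register the BMC reads over I2C (an assumption). The numbers might then be framed as below. The start byte 0xA5, the one-byte count and the XOR checksum are all hypothetical choices made for illustration.

```python
from functools import reduce
from operator import xor
from typing import List

START = 0xA5  # hypothetical start-of-frame marker


def _checksum(data: bytes) -> int:
    """XOR of all bytes; a deliberately simple, assumed integrity check."""
    return reduce(xor, data, 0)


def encode_log_record(numbers: List[int]) -> bytes:
    """Pack recorded signal numbers into a hypothetical log-record frame.

    Assumed layout: START, count, number_1 .. number_n, checksum over all
    preceding bytes.
    """
    if len(numbers) > 255 or any(not 0 <= n <= 255 for n in numbers):
        raise ValueError("frame holds at most 255 one-byte signal numbers")
    body = bytes([START, len(numbers), *numbers])
    return body + bytes([_checksum(body)])


def decode_log_record(frame: bytes) -> List[int]:
    """BMC-side parse of the frame above; raises ValueError if it is malformed."""
    if len(frame) < 3 or frame[0] != START:
        raise ValueError("bad start of frame")
    if len(frame) != frame[1] + 3:
        raise ValueError("length does not match count")
    if _checksum(frame[:-1]) != frame[-1]:
        raise ValueError("checksum mismatch")
    return list(frame[2:-1])


frame = encode_log_record([2, 1])
print(frame.hex())               # -> a5020201a4
print(decode_log_record(frame))  # -> [2, 1]
```

On the BMC side, the decoded numbers could then be written into the BMC's own event log, for example as IPMI System Event Log entries. The patent leaves this step unspecified.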
The specific embodiment of the present invention has been described above with reference to the accompanying drawings, but this does not limit the scope of protection of the invention. Those skilled in the art should understand that modifications or variations made on the basis of the technical solution of the present invention without creative effort still fall within the scope of protection of the present invention.

Claims (4)

1. A server, characterized by comprising:
a BMC, connected to a logic control unit, which sends a heartbeat signal to the logic control unit and receives the monitoring signals and log-record signals sent by the logic control unit;
the logic control unit, connected to monitored units, which receives the monitoring signals sent by the monitored units; and
the monitored units, which send monitoring signals to the logic control unit.
2. The server according to claim 1, characterized in that the logic control unit is either a CPLD or an FPGA.
3. The server according to claim 1, characterized in that the monitored units are one or more of a power module, a CPU, a PCH, a network chip and the system power supply.
4. A server monitoring method according to claims 1-4, comprising the following steps:
when the BMC works normally, the BMC sends a heartbeat signal to the logic control unit; when the logic control unit detects the heartbeat signal, it forwards the monitoring signals sent by the monitored units directly to the BMC;
when the BMC is not working normally, i.e., when the logic control unit cannot detect the heartbeat signal, the logic control unit checks the state of the monitored units; when the level of a monitoring signal differs from its normal level, the logic control unit records the number of that monitoring signal in RAM, and after it detects the BMC's heartbeat signal again, it sends the recorded monitoring-signal numbers to the BMC through the log-record signal.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710381927.8A CN107145428A (en) 2017-05-26 2017-05-26 Server and server monitoring method

Publications (1)

Publication Number Publication Date
CN107145428A 2017-09-08

Family

ID=59780272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710381927.8A Pending CN107145428A (en) 2017-05-26 2017-05-26 Server and server monitoring method

Country Status (1)

Country Link
CN (1) CN107145428A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080313312A1 (en) * 2006-12-06 2008-12-18 David Flynn Apparatus, system, and method for a reconfigurable baseboard management controller
CN103835972A (en) * 2012-11-20 2014-06-04 英业达科技有限公司 Fan rotating speed control system and method for control rotating speed of fan
CN104063300A (en) * 2014-01-18 2014-09-24 浪潮电子信息产业股份有限公司 Acquisition device based on FPGA (Field Programmable Gate Array) for monitoring information of high-end multi-channel server
CN105117317A (en) * 2015-08-17 2015-12-02 浪潮(北京)电子信息产业有限公司 Method and device for monitoring server performance
CN105808398A (en) * 2016-03-08 2016-07-27 浪潮电子信息产业股份有限公司 Method for rapidly analyzing and positioning hardware abnormity

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107608925A (en) * 2017-10-09 2018-01-19 郑州云海信息技术有限公司 A kind of Server Extension card information acquisition methods and device
CN107783788A (en) * 2017-10-26 2018-03-09 英业达科技有限公司 The method started shooting after detection means and detection before start
CN107797880A (en) * 2017-11-29 2018-03-13 济南浪潮高新科技投资发展有限公司 A kind of method for improving server master board BMC reliabilities
CN108170546A (en) * 2017-12-15 2018-06-15 山东超越数控电子股份有限公司 A kind of repositioning method based on EC
CN108038019A (en) * 2017-12-25 2018-05-15 曙光信息产业(北京)有限公司 A kind of automatically restoring fault method and system of baseboard management controller
CN108038019B (en) * 2017-12-25 2021-06-11 曙光信息产业(北京)有限公司 Automatic fault recovery method and system for substrate management controller
TWI684859B (en) * 2018-01-12 2020-02-11 廣達電腦股份有限公司 Method for remote system recovery
US10846160B2 (en) 2018-01-12 2020-11-24 Quanta Computer Inc. System and method for remote system recovery
CN108255646A (en) * 2018-01-17 2018-07-06 重庆大学 A kind of self-healing method of industrial control program failure based on heartbeat detection
CN108255646B (en) * 2018-01-17 2022-02-01 重庆大学 Industrial control application program fault self-recovery method based on heartbeat detection
CN108762142A (en) * 2018-05-24 2018-11-06 新华三技术有限公司 A kind of communication equipment and its processing method
CN108919935A (en) * 2018-07-12 2018-11-30 浪潮电子信息产业股份有限公司 Monitoring method, device and equipment for power supply on server mainboard
CN109723666A (en) * 2018-11-26 2019-05-07 曙光信息产业股份有限公司 Fan control device and method
CN109826822B (en) * 2019-04-11 2021-06-29 苏州浪潮智能科技有限公司 Fan control method and related device
CN109826822A (en) * 2019-04-11 2019-05-31 苏州浪潮智能科技有限公司 A kind of control method for fan and relevant apparatus
CN109882440A (en) * 2019-04-16 2019-06-14 苏州浪潮智能科技有限公司 A kind of fan rotation speed control apparatus and control method
CN110262341A (en) * 2019-06-21 2019-09-20 深圳市三旺通信股份有限公司 The CAN server of SCM Based online acquisition CAN interface signal
CN110262342A (en) * 2019-06-21 2019-09-20 深圳市三旺通信股份有限公司 CAN server based on programmable logic device online acquisition CAN signal
CN110245106A (en) * 2019-06-21 2019-09-17 深圳市三旺通信股份有限公司 The serial server of SCM Based online acquisition serial interface signal
CN110244630A (en) * 2019-06-21 2019-09-17 深圳市三旺通信股份有限公司 Serial server based on programmable logic device online acquisition serial interface signal
CN110502377A (en) * 2019-08-08 2019-11-26 苏州浪潮智能科技有限公司 It is a kind of that test method is restarted based on CPLD
CN110502377B (en) * 2019-08-08 2021-04-27 苏州浪潮智能科技有限公司 Restarting test method based on CPLD
CN110597745A (en) * 2019-09-20 2019-12-20 苏州浪潮智能科技有限公司 Method and device for realizing multi-master multi-slave I2C communication of switch system
CN111639005A (en) * 2020-05-19 2020-09-08 成都市爱科科技实业有限公司 Independent monitoring system and method for server state
CN113064664A (en) * 2021-03-02 2021-07-02 凌华科技(中国)有限公司 Control method and device, complex programmable logic device and server
CN113064479A (en) * 2021-03-03 2021-07-02 山东英信计算机技术有限公司 Power supply redundancy control system, method and medium of GPU server
WO2022183877A1 (en) * 2021-03-03 2022-09-09 山东英信计算机技术有限公司 Power redundancy control system and method for gpu server, and medium
CN114691408A (en) * 2022-04-18 2022-07-01 苏州浪潮智能科技有限公司 Fault detection device for substrate management controller
CN114691408B (en) * 2022-04-18 2024-07-02 苏州浪潮智能科技有限公司 Fault detection device of substrate management controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20170908