CN104794029A - Active-active type blade server management system achieving automatic monitoring and diagnosing - Google Patents


Info

Publication number
CN104794029A
Authority
CN
China
Prior art keywords
monitoring
module
management
diagnosis
controller
Prior art date
Legal status
Pending
Application number
CN201510201467.7A
Other languages
Chinese (zh)
Inventor
王磊
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
2015-04-23
Filing date
2015-04-23
Publication date
2015-07-22
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201510201467.7A
Publication of CN104794029A

Landscapes

  • Debugging And Monitoring (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides an active-active type blade server management system achieving automatic monitoring and diagnosing and relates to server management framework technologies. The active-active type blade server management system achieving automatic monitoring and diagnosing is mainly composed of redundant system management controllers, modular monitor agents and a plurality of monitoring and diagnosing networks. The system management controllers are in charge of monitoring, management and diagnosing testing of the whole system. The modular monitor agents are integrated with various modules and are in charge of modular fault monitoring, diagnosing, isolating and recovering. The monitoring and diagnosing networks mainly comprise the I2C monitoring network, the JTAG boundary scanning network and the serial port debugging network. By the adoption of the technology of redundant design of the double management controllers, the double controllers and the monitor agents in the function modules form two redundant monitoring control networks. Fault isolating is achieved completely, the utilization rate of system management resources is increased, and the parallel working efficiency of the system is improved.

Description

An active-active blade server management system with automatic monitoring and diagnosis
Technical Field
The present invention relates to server management architecture technology, and in particular to an active-active blade server management system with automatic monitoring and diagnosis.
Background
With the explosive growth of data volumes, a single server can no longer meet the demands of massive data processing. Large-scale parallel computer architectures, which offer strong scalability, high computing power, and support for unified management, increasingly suit the demands that the big data era places on server products. As a result, server systems are growing in physical size, their module composition is becoming more complex, their integration density is increasing, and the redundancy required of them keeps rising. The monitoring and management system is an important part of a computer system and the core guarantee of its reliability and maintainability. Current management techniques in the industry include I2C hardware status monitoring, JTAG boundary scan, and RS232 serial port debugging. How to combine multiple monitoring channels across several levels to achieve comprehensive diagnosis is a key technology in the monitoring and diagnosis subsystem.
The monitoring and diagnosis subsystem plays a central role in ensuring the overall reliability and maintainability of a computer system. However, most current server management architectures use a single management channel and therefore cannot provide effective, comprehensive fault monitoring, diagnosis, and debugging. In addition, computer systems in the industry use only one management controller at any given time, which leaves system management resources idle and wasted.
Summary of the Invention
To solve these problems, an active-active blade server management system architecture with automatic monitoring and diagnosis is proposed.
The active-active blade server management system with automatic monitoring and diagnosis consists mainly of redundant system management controllers, module-level monitoring and diagnosis agents, and multiple monitoring and diagnosis networks. The system management controllers are responsible for system-wide monitoring, management, and diagnostic testing. The monitoring and diagnosis agents are integrated into the individual modules and are responsible for module-level fault monitoring, diagnosis, isolation, and recovery. The monitoring and diagnosis networks consist mainly of an I2C monitoring network, a JTAG boundary-scan network, and a serial port debugging network. A redundant dual management controller design is adopted, so that the two controllers and the monitoring agents in each functional module form two redundant monitoring and control networks.
In the new management architecture, the embedded system management controller (SMC) uses the I2C channel to monitor in real time the voltage, temperature, and logic state of key devices in each module of the system, and takes emergency measures in response to emergencies such as overvoltage and overtemperature to avoid damage to the system. Through the boundary-scan channel, it runs diagnostic tests on the processing modules, switch modules, and I/O modules in the system, monitors hardware logic error information in real time, and interacts with the operating system to achieve real-time online diagnosis of each module. Through the serial channel, it implements serial port redirection for each processing module, providing a direct console interface for system debugging and for users. The new management architecture not only improves the diagnostic efficiency of the system but also significantly increases the redundancy of system management and reduces the number of times the computer system must be shut down for maintenance. The system management controller has an event recording function.
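The overvoltage and overtemperature handling described above can be sketched as a simple threshold check. This is a minimal illustration only: the `Limits` type, the numeric thresholds, and the `power_off_module` action are assumptions, since the patent specifies neither limit values nor the concrete emergency measures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Limits:
    """Hypothetical per-device limits; the patent gives no numeric thresholds."""
    max_voltage: float
    max_temp_c: float


def check_reading(voltage: float, temp_c: float, limits: Limits) -> List[str]:
    """Return the emergency conditions triggered by one sample read over I2C."""
    alarms = []
    if voltage > limits.max_voltage:
        alarms.append("overvoltage")
    if temp_c > limits.max_temp_c:
        alarms.append("overtemperature")
    return alarms


def emergency_action(alarms: List[str]) -> str:
    """Map alarms to an illustrative action (assumed, not taken from the patent)."""
    return "power_off_module" if alarms else "none"
```

For example, with `Limits(max_voltage=3.6, max_temp_c=85.0)`, a sample of 3.7 V at 90 °C would raise both alarms, and the sketch would select the assumed `power_off_module` action.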
The module-level monitoring and diagnosis agents are placed in the individual subsystem modules and use the I2C bus to carry out local monitoring tasks. Their main function is to monitor the components within the module, including voltage, temperature, fan speed, and error status. They can also read information about the chips inside the node and apply protective measures in an emergency.
The module-level monitoring and diagnosis agent is connected to the conventional I/O controller in the processing module to form the debugging network within the module, and it provides an external RS232 serial interface that can be used for debugging as well as for a single-node terminal console. The monitoring and diagnosis subsystem is connected to the system I/O controller through the I/O bus and delivers fault detection and diagnosis information to the system by asynchronous interrupt. Working together with the operating system's fault handling mechanism, it implements fault monitoring, diagnosis, isolation, and recovery that combine software and hardware.
The system management controller is the core unit of the whole system and is responsible for implementing the management functions within the whole module. The system management controller is connected to the system backplane through two groups of SMC_SEL signals. When a management module in the system needs to access the converged switch module, the system management module notifies the system management controller through SMC_SEL, and the system management controller allocates the access permission.
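The SMC_SEL access allocation can be pictured as a small arbiter that opens the I2C path for one requester at a time. The sketch below is an assumption-laden illustration: the class name `AccessArbiter`, the first-come policy, and the explicit `release` step are hypothetical, since the patent only states that the controller allocates the access permission.

```python
class AccessArbiter:
    """Grants the switch module's I2C path to one management module at a time.

    Hypothetical model: the first-come policy is an assumption.
    """

    def __init__(self, smc_ids=("SMC0", "SMC1")):
        self.smc_ids = tuple(smc_ids)
        self.owner = None  # management module whose I2C path is currently open

    def request(self, smc_id: str) -> bool:
        """Model an SMC_SEL request; return True if the path is granted."""
        if smc_id not in self.smc_ids:
            raise ValueError(f"unknown management module: {smc_id}")
        if self.owner in (None, smc_id):
            self.owner = smc_id
            return True
        return False

    def release(self, smc_id: str) -> None:
        """Close the path if the caller currently owns it."""
        if self.owner == smc_id:
            self.owner = None
```

In this model, a request from SMC1 while SMC0 holds the path is refused until SMC0 releases it, which keeps the two management modules from driving the same I2C link at once.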
1. The new architecture adopts a redundant dual management controller design. The two controllers and the monitoring agents in each functional module form two redundant monitoring and control networks, which gives the new management architecture a higher degree of redundancy.
2. For the first time, the new management architecture has both controllers working at the same time. This achieves complete fault isolation, increases the utilization of system management resources, and improves the parallel working efficiency of the system.
Beneficial effects of the present invention:
A redundant dual management controller design is adopted. The two controllers and the monitoring agents in each functional module form two redundant monitoring and control networks, which gives the new management architecture a higher degree of redundancy. In addition, for the first time, the new management architecture has both controllers working at the same time, which achieves complete fault isolation, increases the utilization of system management resources, and improves the parallel working efficiency of the system.
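One way to picture the active-active arrangement is that both controllers share the monitored modules while healthy, and the survivor takes over everything if its peer fails. The sketch below assumes a round-robin split and the controller names `SMC0` and `SMC1`; the patent does not describe how work is divided between the two controllers, so this policy is purely illustrative.

```python
def assign_modules(modules, healthy):
    """Split modules across the healthy controllers.

    `healthy` maps a controller name to True/False. The round-robin policy
    is an assumption; the patent does not specify a work-sharing scheme.
    """
    active = [c for c in ("SMC0", "SMC1") if healthy.get(c, False)]
    if not active:
        raise RuntimeError("no healthy management controller")
    return {m: active[i % len(active)] for i, m in enumerate(modules)}
```

With both controllers healthy, the modules alternate between them; with one marked unhealthy, every module maps to the remaining controller.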
Brief Description of the Drawings
Fig. 1 is a block diagram of the management system architecture of the present invention.
Fig. 2 is a structural diagram of the system management controller (SMC).
Fig. 3 is a design block diagram of the module-level monitoring agent (MMA).
Fig. 4 is a schematic diagram of module-level system management.
Detailed Description
The content of the present invention is described in more detail below.
Fig. 1 is a block diagram of the active-active blade server management system architecture, which supports dual redundancy and provides multi-channel, comprehensive, automatic monitoring and diagnosis. The monitoring and diagnosis subsystem consists of redundant system management controllers (System Management Controller, SMC), module-level monitoring and diagnosis agents (Modular Monitor Agent, MMA), and multiple monitoring and diagnosis networks. The system management controllers are responsible for system-wide monitoring, management, and diagnostic testing. The monitoring and diagnosis agents are integrated into the individual modules and are responsible for module-level fault monitoring, diagnosis, isolation, and recovery. The monitoring and diagnosis networks consist of an I2C monitoring network, a JTAG boundary-scan network, and a serial port debugging network.
Fig. 2 is a structural diagram of the system management controller (SMC). The SMC consists of an embedded processor, an I2C bus controller, an RS232 serial communication interface, a JTAG boundary-scan controller, Ethernet, panel control logic, and other components, and it runs a real-time embedded operating system that supports multitasking. The SMC collects sensor data such as voltage, current, and temperature over the I2C bus to monitor the hardware environment. It captures the state of each chip in a hardware module through the boundary-scan chain to locate faults, and it sends control codes to each chip through the boundary-scan chain to perform functions such as connectivity testing, hardware configuration, module reset, and fault isolation. Over the serial network it collects echoed running status and sends commands, which supports functional debugging of the hardware logic and system software; it also provides users with a serial terminal console service and a web-based Ethernet interface for remote maintenance and diagnosis. In addition, the SMC has an event recording function.
Fig. 3 is a design block diagram of the module-level monitoring agent. The module-level monitoring and diagnosis agents are placed in the individual subsystem modules and use the I2C bus to carry out local monitoring tasks. Their main function is to monitor the components within the module, including voltage, temperature, fan speed, and error status. They can also read information about the chips inside the node and apply protective measures in an emergency. The monitoring and diagnosis agent is connected to the conventional I/O controller in the processing module to form the debugging network within the module, and it provides an external RS232 serial interface that can be used for debugging as well as for a single-node terminal console. The monitoring and diagnosis subsystem is connected to the system I/O controller through the I/O bus and delivers fault detection and diagnosis information to the system by asynchronous interrupt. Working together with the operating system's fault handling mechanism, it implements fault monitoring, diagnosis, isolation, and recovery that combine software and hardware.
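The asynchronous reporting path from an MMA to the system can be sketched with a queue standing in for the interrupt line. This is a hedged software analogy rather than the hardware mechanism: the `ModuleMonitorAgent` class, the sensor names, and the upper-limit-only comparison are assumptions introduced for illustration.

```python
import queue


class ModuleMonitorAgent:
    """Hypothetical MMA model that posts fault events for the system to handle."""

    def __init__(self, module_id, event_queue, upper_limits):
        self.module_id = module_id
        self.event_queue = event_queue    # stands in for the asynchronous interrupt
        self.upper_limits = upper_limits  # sensor name -> assumed maximum value

    def poll(self, readings):
        """Check one set of local readings; return the number of faults posted."""
        faults = 0
        for name, value in readings.items():
            limit = self.upper_limits.get(name)
            # Sensors without a configured upper limit are not checked here.
            if limit is not None and value > limit:
                self.event_queue.put((self.module_id, name, value))
                faults += 1
        return faults
```

A consumer on the system side would drain the queue and hand each event to the operating system's fault handling, which mirrors the cooperative software and hardware handling described above.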
Fig. 4 is a schematic diagram of module-level system management. In the figure, the management control module is the core unit of the whole system and is responsible for implementing the management functions within the whole module. The management module is connected to the system backplane through two groups of SMC_SEL signals. When a management module in the system needs to access the converged switch module, the system management module notifies the management control module through SMC_SEL, and the management control module allocates the access permission. For example, if the management control module currently selects SMC0 for access, it notifies signal switching module 1 through the SMC_SEL signal. Signal switching module 1 is mainly responsible for gating the I2C links. After it receives the gating signal from the management control module, it opens the SMC0_I2C path, and the system management module can then use the I2C path to obtain basic information about the converged switch module, such as its presence status, core temperature, voltage readings, MAC address, and serial number.
The voltage conversion module is mainly responsible for supplying voltage to each hardware device on the converged switch module. The system backplane supplies 12 V to the converged switch module, whereas the hardware devices in the module usually need only 3.3 V and 1.5 V, so the power conversion module must convert the 12 V supply to the voltage each device requires, ensuring that every electronic device on the converged switch module runs stably. The voltage management module is responsible for managing the voltages on the converged switch module. After the management control module detects that the fan on the converged switch module is present and in a normal state, it notifies the voltage conversion module to turn on and supply voltage to each device. If it detects an abnormality, it notifies the voltage conversion module to turn off, so that the hardware devices are not damaged.
The platform information storage module stores the basic information of the converged switch module, including the serial number, manufacturer information, shipment date, and MAC address. The information processing module is mainly responsible for processing the data switched by the converged switch module. Because both the system management module and the converged switch module access the platform information storage module, access rights to the platform information storage module are likewise allocated by the management control module to avoid resource access conflicts. Under normal conditions, the system management module accesses the platform information storage module first. When the converged switch module wants to access the platform information storage module, it sends a request to the management control module. Once the management control module confirms that the system management module is not accessing the platform information storage module, it sends the CPU_EN signal to signal switching module 2, which then opens, and the information processing module can access the platform information storage module over the I2C bus.
The system management module can directly access the information processing module on the converged switch module through the SGMII_SMC signal. It can send instructions to the information processing module and can also directly obtain information about the whole converged switch module. The management module accesses the non-volatile storage module directly over the SPI bus. The non-volatile storage module stores basic configuration information, which is used for the startup configuration of the module. The management module accesses the temperature monitoring module over the I2C_BUS link to obtain temperature information for the whole converged switch module. Based on this temperature information, the management control module notifies the fan control module through the FAN_ALERT_N signal to adjust the fans on the module, ensuring heat dissipation for the whole module.
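The power gating and fan adjustment described for Fig. 4 can be sketched as two small functions. The gating rule follows the text (the voltage conversion module is enabled only when the fan is present and normal). The linear duty-cycle ramp and its 40 °C and 80 °C breakpoints are assumptions, because the patent states only that the fans are adjusted according to temperature via FAN_ALERT_N.

```python
def power_enable(fan_present: bool, fan_ok: bool) -> bool:
    """Enable the voltage conversion module only if the fan is present and normal."""
    return fan_present and fan_ok


def fan_duty(temp_c: float, low: float = 40.0, high: float = 80.0,
             min_duty: int = 30, max_duty: int = 100) -> int:
    """Return a fan duty cycle in percent using an assumed linear ramp.

    Below `low` the fan runs at `min_duty`; above `high` it runs at `max_duty`.
    All four parameters are hypothetical values chosen for illustration.
    """
    if temp_c <= low:
        return min_duty
    if temp_c >= high:
        return max_duty
    frac = (temp_c - low) / (high - low)
    return round(min_duty + frac * (max_duty - min_duty))
```

Keeping the gate separate from the ramp mirrors the text, where fan status decides whether power is applied at all and temperature only tunes the fan afterwards.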
Apart from the technical features described in the specification, everything else is known to those skilled in the art.

Claims (6)

1. An active-active blade server management system with automatic monitoring and diagnosis, characterized in that it consists mainly of redundant system management controllers, module-level monitoring and diagnosis agents, and multiple monitoring and diagnosis networks; the system management controllers are responsible for system-wide monitoring, management, and diagnostic testing; the monitoring and diagnosis agents are integrated into the individual modules and are responsible for module-level fault monitoring, diagnosis, isolation, and recovery; the monitoring and diagnosis networks consist mainly of an I2C monitoring network, a JTAG boundary-scan network, and a serial port debugging network; a redundant dual management controller design is adopted, and the two controllers and the monitoring agents in each functional module form two redundant monitoring and control networks.
2. The management system according to claim 1, characterized in that the embedded system management controller uses the I2C channel to monitor in real time the voltage, temperature, and logic state of key devices in each module of the system, and takes emergency measures in response to emergencies such as overvoltage and overtemperature to avoid damage to the system; through the boundary-scan channel, it runs diagnostic tests on the processing modules, switch modules, and I/O modules in the system, monitors hardware logic error information in real time, and interacts with the operating system to achieve real-time online diagnosis of each module; through the serial channel, it implements serial port redirection for each processing module, providing a direct console interface for system debugging and for users.
3. The management system according to claim 2, characterized in that the system management controller has an event recording function.
4. The management system according to claim 2, characterized in that the module-level monitoring and diagnosis agents are placed in the individual subsystem modules and use the I2C bus to carry out local monitoring tasks; their main function is to monitor the components within the module, including voltage, temperature, fan speed, and error status; they can also read information about the chips inside the node and apply protective measures in an emergency.
5. The management system according to claim 4, characterized in that the module-level monitoring and diagnosis agent is connected to the conventional I/O controller in the processing module to form the debugging network within the module, and provides an external RS232 serial interface that can be used for debugging as well as for a single-node terminal console; the monitoring and diagnosis subsystem is connected to the system I/O controller through the I/O bus, delivers fault detection and diagnosis information to the system by asynchronous interrupt, and, working together with the operating system's fault handling mechanism, implements fault monitoring, diagnosis, isolation, and recovery that combine software and hardware.
6. The management system according to claim 2, characterized in that the system management controller is the core unit of the whole system and is responsible for implementing the management functions within the whole module; the system management controller is connected to the system backplane through two groups of SMC_SEL signals; when a management module in the system needs to access the converged switch module, the system management module notifies the system management controller through SMC_SEL, and the system management controller allocates the access permission.
CN201510201467.7A 2015-04-23 2015-04-23 Active-active type blade server management system achieving automatic monitoring and diagnosing Pending CN104794029A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510201467.7A CN104794029A (en) 2015-04-23 2015-04-23 Active-active type blade server management system achieving automatic monitoring and diagnosing

Publications (1)

Publication Number Publication Date
CN104794029A true CN104794029A (en) 2015-07-22

Family

ID=53558839

Country Status (1)

Country Link
CN (1) CN104794029A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101212345A (en) * 2006-12-31 2008-07-02 联想(北京)有限公司 Blade server management system
CN101594235A (en) * 2009-06-02 2009-12-02 浪潮电子信息产业股份有限公司 A kind of method that manages based on SMBUS bus blade server
CN201947289U (en) * 2011-01-11 2011-08-24 东莞市博晟电子科技有限公司 Server managing and monitoring system
CN104035831A (en) * 2014-07-01 2014-09-10 浪潮(北京)电子信息产业有限公司 High-end fault-tolerant computer management system and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105589830A (en) * 2015-12-28 2016-05-18 浪潮(北京)电子信息产业有限公司 Blade server architecture
CN105589830B (en) * 2015-12-28 2018-12-25 浪潮(北京)电子信息产业有限公司 A kind of blade server framework
CN115562219A (en) * 2022-08-18 2023-01-03 南京康尼电子科技有限公司 Platform door insertion sheet type intelligent diagnosis dynamic communication monitoring server and monitoring method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150722