CN104363120A - Method and system for monitoring and protecting operating environment of server - Google Patents

Method and system for monitoring and protecting operating environment of server

Publication number
CN104363120A
Authority
CN
China
Prior art keywords
server
region
infrastructure device
breaking down
state information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410645268.0A
Other languages
Chinese (zh)
Inventor
宋维维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-11-12
Filing date
2014-11-12
Publication date
2015-02-18
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410645268.0A
Publication of CN104363120A

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the invention provide a method and system for monitoring and protecting the operating environment of a server. The monitoring method comprises: collecting operating state information of infrastructure devices; determining a faulty area according to the operating state information; and sending an alarm message containing the faulty area to a server monitoring system. With this method and system, the operating environment of the servers can be monitored and the servers in the detected faulty area can be underclocked, which protects them. Services keep running without interruption when an infrastructure device fails, and engineering cost is reduced.

Description

Method and system for monitoring and protecting the operating environment of a server
Technical field
The present invention relates to the technical field of computer networks, and in particular to a method and system for monitoring and protecting the operating environment of a server.
Background
Large data centers are currently built to a very high fault-tolerance level, which places correspondingly high demands on the operating environment of the servers: the external environment in which the servers run must be guaranteed in both the power supply and the cooling links. Most existing safeguards achieve this by adding infrastructure devices, that is, some devices are designed and deployed solely to cover failure scenarios.
As large data centers grow in scale, this configuration increases the number of redundant devices accordingly, which raises engineering cost significantly. In addition, every added infrastructure device is also an added point of failure: if an infrastructure device fails, servers can easily fail as well, interrupting the services running on them.
Summary of the invention
An object of the embodiments of the present invention is to provide a method and system for monitoring and protecting the operating environment of a server, which monitor the server's operating environment and control the operation of servers in a detected faulty area so as to protect them, while keeping services running without interruption when an infrastructure device fails and reducing engineering cost.
To achieve the above object, an embodiment of the present invention provides a method for monitoring the operating environment of a server, comprising: collecting operating state information of infrastructure devices; determining a faulty area according to the operating state information; and sending an alarm message containing the faulty area to a server monitoring system.
An embodiment of the present invention further provides a method for protecting a server, comprising: receiving an alarm message containing a faulty area; determining the servers that need protection according to the faulty area in the alarm message; and sending a control command to the servers in the faulty area.
An embodiment of the present invention further provides a system for monitoring the operating environment of a server, comprising: a collecting device, configured to collect operating state information of infrastructure devices; a faulty-area determining device, configured to determine a faulty area according to the operating state information; and a switch, configured to send an alarm message containing the faulty area to a server monitoring system.
An embodiment of the present invention further provides a protective device for a server, comprising: a receiving module, configured to receive an alarm message containing a faulty area; a determining module, configured to determine the servers that need protection according to the faulty area in the alarm message; and a sending module, configured to send a control command to the servers in the faulty area.
In the method and system provided by the embodiments of the present invention, the faulty area is determined from the collected operating state information of the infrastructure devices, and an alarm message containing the faulty area is sent to the server monitoring system. The servers that need protection are determined according to the faulty area in the alarm message, and a control command is sent to the servers in the faulty area so that they are protected by frequency reduction. Services are not interrupted during either monitoring or protection, and no infrastructure device is added as redundancy, so engineering cost is reduced significantly.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the server monitoring system and the power and environment monitoring system according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the method for monitoring the operating environment of a server according to Embodiment one of the present invention;
Fig. 3 is a schematic flowchart of the method for protecting a server according to Embodiment two of the present invention;
Fig. 4 is a schematic structural diagram of the system for monitoring the operating environment of a server according to Embodiment three of the present invention;
Fig. 5 is a schematic structural diagram of the protective device for a server according to Embodiment four of the present invention.
Detailed description
The method and system for monitoring and protecting the operating environment of a server according to the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The basic idea of the present invention is that the provided method and system can both monitor the operating environment of servers and apply controls such as frequency reduction to the servers in a detected faulty area, thereby protecting them. Services keep running without interruption during monitoring and protection, and engineering cost is reduced.
Fig. 1 is a system architecture diagram of the server monitoring system and the power and environment monitoring system according to an embodiment of the present invention. As shown in Fig. 1, the server monitoring system may use in-band monitoring, that is, a few shell scripts are deployed on the servers; this consumes few resources on the enterprise production network while allowing all servers to be monitored. The power and environment monitoring system is usually networked independently in a dual-machine hot-standby configuration, and its core switch must be connected to the enterprise production network where the server monitoring system resides so that the two systems can communicate. The two monitoring systems cooperate to monitor and protect the operating environment of the servers, on the following principle:
The power and environment monitoring system uses multiple embedded collecting devices to collect the operating state information of various infrastructure devices (for example, uninterruptible power supplies, temperature and humidity equipment, and air conditioners), and can thus monitor their operating status. When it detects that an infrastructure device has failed, it immediately analyzes which servers are affected by the failed device and sends an alarm message from the core switch of the power and environment system to the intranet core switch, notifying the server monitoring system of which servers are at risk. After receiving the alarm message, the server monitoring system determines the servers that need protection according to the alarm message and sends them a control command, for example to reduce their frequency. This protects the servers without interrupting the services running on them.
Embodiment one
Fig. 2 is a schematic flowchart of the method for monitoring the operating environment of a server according to Embodiment one of the present invention. The method shown in Fig. 2 may be performed, for example, by the power and environment monitoring system, and comprises the following steps:
Step 11: collect the operating state information of the infrastructure devices.
In practice, the infrastructure devices installed in a large data center (for example, a core machine room) may include, but are not limited to, uninterruptible power supplies, temperature and humidity equipment, and air conditioners. To safeguard the operating environment of the servers in the power supply and cooling links as far as possible, the operation of these infrastructure devices needs to be monitored.
Specifically, the operating state information of these infrastructure devices may be collected by embedded collecting devices. The operating state information includes information indicating a normal or a faulty operating state, or operating data of the infrastructure device. For example, the fact that an air conditioner is in a faulty operating state, or the indoor temperature and humidity data measured by the temperature and humidity equipment, may be collected. In practice, infrastructure devices may suffer events such as a power outage or an air-conditioner cooling fault. A power outage interrupts the services running on the servers, while a cooling fault makes the machine room too hot, and servers running at high temperature fail very easily. The operating state information collected in this step therefore provides the data on which the faulty area is determined when a device fails.
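The kind of record collected in Step 11 can be illustrated with a minimal Python sketch. The field names, the device-type values, and the `collect` helper are assumptions made for illustration; the patent does not specify a data format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeviceType(Enum):
    # Device categories named in the description; the string values are assumed.
    UPS = "ups"
    TEMP_HUMIDITY = "temp_humidity"
    AIR_CONDITIONER = "air_conditioner"


class RunState(Enum):
    NORMAL = "normal"
    FAULT = "fault"


@dataclass(frozen=True)
class StateSample:
    """One operating-state record for one infrastructure device (hypothetical layout)."""
    device_id: str
    device_type: DeviceType
    state: RunState
    temperature_c: Optional[float] = None  # only set by temperature/humidity equipment
    humidity_pct: Optional[float] = None


def collect(raw_readings):
    """Convert raw readings (plain dicts from a collecting device) into typed samples."""
    samples = []
    for r in raw_readings:
        samples.append(StateSample(
            device_id=r["id"],
            device_type=DeviceType(r["type"]),
            state=RunState(r["state"]),
            temperature_c=r.get("temperature_c"),
            humidity_pct=r.get("humidity_pct"),
        ))
    return samples
```

Keeping the normal/fault flag separate from the measured values mirrors the two kinds of operating state information the step describes.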
Step 12: determine the faulty area according to the operating state information.
According to an exemplary embodiment of the present invention, determining the faulty area in this step may comprise: determining the faulty infrastructure device according to the operating state information, and obtaining the faulty area corresponding to that infrastructure device.
For example, when an air conditioner stops cooling, Step 11 collects the information that the air conditioner is in a faulty operating state. From this information it can be determined which air conditioner has failed, and the faulty area corresponding to that air conditioner is then obtained.
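The two sub-steps of Step 12 (find the faulty devices, then look up the areas they serve) can be sketched as follows. The device-to-area table and all identifiers are hypothetical; the patent only states that each infrastructure device corresponds to an area.

```python
# Hypothetical static mapping from infrastructure device to the area it serves.
DEVICE_AREA = {
    "ac-01": "room-A/row-1",
    "ac-02": "room-A/row-2",
    "ups-01": "room-A/row-1",
}


def find_faulty_areas(states, device_area=DEVICE_AREA):
    """Return the sorted list of areas served by devices reported as faulty.

    states maps a device id to "normal" or "fault".
    """
    # Sub-step 1: determine which infrastructure devices have failed.
    faulty_devices = [dev for dev, state in states.items() if state == "fault"]
    # Sub-step 2: obtain the area corresponding to each failed device.
    areas = {device_area[dev] for dev in faulty_devices if dev in device_area}
    return sorted(areas)
```

Using a set means that two failed devices serving the same area produce a single faulty area.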
Step 13: send the alarm message containing the faulty area to the server monitoring system. For example, the alarm message may be sent using any one of the following protocols: Modbus TCP/IP (MODBUS-TCP/IP), a protocol for industrial field use carried over Transmission Control Protocol/Internet Protocol; OLE for Process Control (Object Linking and Embedding for Process Control, OPC); or sockets (SOCKET).
With the monitoring method of this embodiment, the operating environment of the servers is monitored, and when an infrastructure device fails or the temperature or humidity is too high, an alarm message can be sent to the server monitoring system in time, which helps keep the services on the servers running without interruption.
Further, sending the alarm message containing the faulty area to the server monitoring system also comprises: sending the type of the faulty infrastructure device to the server monitoring system. In practice, the alarm message contains the faulty area and may also contain the type of the faulty infrastructure device, so that the server monitoring system can handle the faulty area in a way that suits the type of the faulty device.
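As an illustration of the socket option, the following Python sketch sends an alarm carrying the faulty area and, optionally, the device type. The JSON payload, the field names, and the 4-byte length prefix are assumptions chosen for the example; the patent does not define a wire format.

```python
import json
import socket
import struct


def encode_alarm(area, device_type=None):
    """Serialize an alarm as length-prefixed JSON (assumed format, not from the patent)."""
    payload = {"faulty_area": area}
    if device_type is not None:
        payload["device_type"] = device_type
    body = json.dumps(payload).encode("utf-8")
    # 4-byte big-endian length so the receiver knows where the message ends.
    return struct.pack("!I", len(body)) + body


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf += chunk
    return buf


def send_alarm(sock, area, device_type=None):
    sock.sendall(encode_alarm(area, device_type))


def recv_alarm(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))
```

The length prefix is there because TCP is a byte stream and does not preserve message boundaries on its own.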
Embodiment two
Fig. 3 is a schematic flowchart of the method for protecting a server according to Embodiment two of the present invention. The method shown in Fig. 3 may be performed, for example, by the server monitoring system, and comprises the following steps:
Step 21: receive an alarm message containing the faulty area.
Step 22: determine the servers that need protection according to the faulty area in the alarm message. Once the alarm message has been received in Step 21, the faulty area it contains identifies which servers are at risk and therefore need to be protected.
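Step 22 amounts to a lookup from area to servers. A minimal sketch, assuming the server monitoring system keeps an inventory of which area each server sits in (the inventory and host names below are hypothetical):

```python
# Hypothetical inventory: server host name -> area in which it is installed.
SERVER_AREA = {
    "srv-001": "room-A/row-1",
    "srv-002": "room-A/row-1",
    "srv-003": "room-A/row-2",
}


def servers_needing_protection(alarm, server_area=SERVER_AREA):
    """Return the servers located in the faulty area named by the alarm message."""
    area = alarm["faulty_area"]
    return sorted(host for host, where in server_area.items() if where == area)
```

Servers outside the faulty area are left untouched, so only the machines actually at risk are slowed down.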
Step 23: send a control command to the servers in the faulty area. For example, the control command may be a command instructing a server to reduce its operating frequency.
Specifically, the server monitoring system may operate on the servers in the faulty area, for example the servers in an area with a cooling fault or a power outage. The server monitoring system may send a command that makes the server's baseboard management controller (BMC) limit power consumption to a specified range (power capping), which lowers the server's operating frequency. A lower frequency means lower power consumption and therefore less heat, so the server is protected while the infrastructure device is faulty. The control applied to the server is not limited to reducing its operating frequency; other controls that limit server power consumption may also be used, for example placing the server in a sleep state.
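The power-capping idea can be sketched by computing a limit and packaging it as a command for the BMC. The `PowerCapCommand` type, the 60 percent ratio, and the 100 W floor are illustrative assumptions only; how the command actually reaches the BMC depends on the vendor's management interface and is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerCapCommand:
    """Hypothetical command object; not a real BMC or IPMI API."""
    bmc_address: str
    limit_watts: int


def build_power_cap(bmc_address, rated_watts, cap_percent=60, floor_watts=100):
    """Build a power cap at cap_percent of the rated power, never below floor_watts.

    The default percentage and floor are assumed values for illustration.
    """
    if not 0 < cap_percent <= 100:
        raise ValueError("cap_percent must be in (0, 100]")
    limit = rated_watts * cap_percent // 100
    # Keep the limit at or above the floor, but never above the rated power.
    limit = min(max(limit, floor_watts), rated_watts)
    return PowerCapCommand(bmc_address, limit)
```

A floor keeps the cap from dropping so low that the server can no longer serve requests, which matches the stated goal of protecting the server without interrupting service.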
With the protection method of this embodiment, the servers that need protection can be determined according to the faulty area in the received alarm message, and a frequency-reduction command is sent to those servers, which protects them. Even if an infrastructure device fails, the services on the servers are therefore not interrupted. In addition, no redundant device needs to be added, so engineering cost is reduced significantly.
Further, the alarm message also contains the type of the faulty infrastructure device, and sending a control command to the servers in the faulty area comprises: sending a specific control command to the servers in the faulty area according to the type of the faulty infrastructure device. Specifically, the difference from Step 23 is that the alarm message contains the type of the faulty infrastructure device in addition to the faulty area. The server monitoring system can then send different control commands for different types of faulty device, protecting the servers in the faulty area more effectively.
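Choosing a command by device type can be sketched as a lookup table. The patent does not say which command goes with which device type, so the pairing below is purely an assumption for illustration; only frequency reduction and the sleep state are named in the description as possible controls.

```python
# Assumed pairing of faulty device type to control command (not from the patent).
COMMAND_BY_DEVICE_TYPE = {
    "air_conditioner": "reduce_frequency",  # cooling fault: cut heat output
    "ups": "sleep",                         # power fault: cut power draw sharply
}


def choose_command(alarm, default="reduce_frequency"):
    """Pick a control command from the device type carried in the alarm message.

    Falls back to the default when the type is missing or unknown.
    """
    return COMMAND_BY_DEVICE_TYPE.get(alarm.get("device_type"), default)
```

Falling back to a default means an alarm that carries only the faulty area still results in a protective command.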
Embodiment three
This embodiment mainly concerns the system for monitoring the operating environment of a server. Fig. 4 is a schematic structural diagram of that system according to Embodiment three of the present invention. As shown in Fig. 4, the system comprises:
A collecting device 31, configured to collect the operating state information of the infrastructure devices.
A faulty-area determining device 32, configured to determine the faulty area according to the operating state information.
A switch 33, configured to send the alarm message containing the faulty area to the server monitoring system.
With this monitoring system, the operating environment of the servers can be monitored, and when an infrastructure device fails or the temperature or humidity is too high, an alarm message can be sent to the server monitoring system in time, which helps keep the services on the servers running without interruption.
Further, the infrastructure devices comprise uninterruptible power supplies, temperature and humidity equipment, and air conditioners.
Further, the operating state information comprises information indicating a normal or a faulty operating state, or operating data of the infrastructure device.
Further, the faulty-area determining device is specifically configured to: determine the faulty infrastructure device according to the operating state information; and obtain the faulty area corresponding to that infrastructure device.
Further, the switch is also configured to send the type of the faulty infrastructure device to the server monitoring system.
Further, collecting the operating state information of the infrastructure devices comprises: collecting the operating state information of the infrastructure devices through an embedded collecting device.
Further, the switch sends the alarm message based on any one of the MODBUS-TCP/IP, OPC, and SOCKET protocols.
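How the three components of this embodiment fit together can be sketched as one monitoring cycle. The class name, the callable interfaces, and the alarm layout are assumptions for illustration; in the patent the components are hardware devices rather than Python callables.

```python
from typing import Callable, Dict, List


class EnvironmentMonitor:
    """Wires a collector, a faulty-area locator, and a sender into one cycle (sketch)."""

    def __init__(self,
                 collect: Callable[[], Dict[str, str]],
                 locate: Callable[[Dict[str, str]], List[str]],
                 send: Callable[[dict], None]) -> None:
        self.collect = collect  # role of collecting device 31
        self.locate = locate    # role of faulty-area determining device 32
        self.send = send        # role of switch 33

    def run_once(self) -> List[str]:
        """Collect states, determine faulty areas, and send one alarm per area."""
        states = self.collect()
        areas = self.locate(states)
        for area in areas:
            self.send({"faulty_area": area})
        return areas
```

Passing the three roles in as callables keeps the cycle independent of how states are collected or which protocol carries the alarm.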
Embodiment four
This embodiment mainly concerns the protective device for a server. Fig. 5 is a schematic structural diagram of the protective device according to Embodiment four of the present invention. As shown in Fig. 5, the protective device comprises:
A receiving module 41, configured to receive an alarm message containing the faulty area.
A determining module 42, configured to determine the servers that need protection according to the faulty area in the alarm message.
A sending module 43, configured to send a control command to the servers in the faulty area.
With this protective device, the servers that need protection can be determined according to the faulty area in the received alarm message, and a frequency-reduction command is sent to those servers, which protects them. In addition, no redundant device needs to be added, so engineering cost is reduced significantly.
Further, the control command is a command instructing a server to reduce its operating frequency.
Further, the alarm message also contains the type of the faulty infrastructure device, and the sending module is also configured to send a specific control command to the servers in the faulty area according to the type of the faulty infrastructure device.
The above are only specific embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of the claims.

Claims (20)

1. A method for monitoring the operating environment of a server, characterized in that the method comprises:
collecting operating state information of infrastructure devices;
determining a faulty area according to the operating state information;
sending an alarm message containing the faulty area to a server monitoring system.
2. The method according to claim 1, characterized in that the infrastructure devices comprise uninterruptible power supplies, temperature and humidity equipment, and air conditioners.
3. The method according to claim 2, characterized in that the operating state information comprises information indicating a normal or a faulty operating state, or operating data of the infrastructure device.
4. The method according to claim 3, characterized in that determining the faulty area according to the operating state information comprises:
determining the faulty infrastructure device according to the operating state information;
obtaining the faulty area corresponding to the faulty infrastructure device.
5. The method according to claim 4, characterized in that sending the alarm message containing the faulty area to the server monitoring system further comprises:
sending the type of the faulty infrastructure device to the server monitoring system.
6. The method according to claim 5, characterized in that collecting the operating state information of the infrastructure devices comprises:
collecting the operating state information of the infrastructure devices through an embedded collecting device.
7. The method according to any one of claims 1 to 6, characterized in that the alarm message containing the faulty area is sent to the server monitoring system based on any one of the following protocols: MODBUS-TCP/IP, a protocol for industrial field use over Transmission Control Protocol/Internet Protocol; OLE for Process Control (OPC); and sockets (SOCKET).
8. A method for protecting a server, characterized in that the method comprises:
receiving an alarm message containing a faulty area;
determining the servers that need protection according to the faulty area in the alarm message;
sending a control command to the servers in the faulty area.
9. The method according to claim 8, characterized in that the control command is a command instructing a server to reduce its operating frequency.
10. The method according to claim 9, characterized in that the alarm message further contains the type of the faulty infrastructure device, and sending the control command to the servers in the faulty area comprises: sending a specific control command to the servers in the faulty area according to the type of the faulty infrastructure device.
11. A system for monitoring the operating environment of a server, characterized in that the system comprises:
a collecting device, configured to collect operating state information of infrastructure devices;
a faulty-area determining device, configured to determine a faulty area according to the operating state information;
a switch, configured to send an alarm message containing the faulty area to a server monitoring system.
12. The system according to claim 11, characterized in that the infrastructure devices comprise uninterruptible power supplies, temperature and humidity equipment, and air conditioners.
13. The system according to claim 12, characterized in that the operating state information comprises information indicating a normal or a faulty operating state, or operating data of the infrastructure device.
14. The system according to claim 13, characterized in that the faulty-area determining device is configured to:
determine the faulty infrastructure device according to the operating state information;
obtain the faulty area corresponding to the faulty infrastructure device.
15. The system according to claim 14, characterized in that the switch is further configured to send the type of the faulty infrastructure device to the server monitoring system.
16. The system according to claim 15, characterized in that collecting the operating state information of the infrastructure devices comprises:
collecting the operating state information of the infrastructure devices through an embedded collecting device.
17. The system according to any one of claims 11 to 16, characterized in that the switch sends the alarm message based on any one of the MODBUS-TCP/IP, OPC, and SOCKET protocols.
18. A protective device for a server, characterized in that the device comprises:
a receiving module, configured to receive an alarm message containing a faulty area;
a determining module, configured to determine the servers that need protection according to the faulty area in the alarm message;
a sending module, configured to send a control command to the servers in the faulty area.
19. The device according to claim 18, characterized in that the control command is a command instructing a server to reduce its operating frequency.
20. The device according to claim 19, characterized in that the alarm message further contains the type of the faulty infrastructure device, and the sending module is further configured to send a specific control command to the servers in the faulty area according to the type of the faulty infrastructure device.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410645268.0A CN104363120A (en) 2014-11-12 2014-11-12 Method and system for monitoring and protecting operating environment of server

Publications (1)

Publication Number Publication Date
CN104363120A (en) 2015-02-18

Family

ID=52530347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410645268.0A Pending CN104363120A (en) 2014-11-12 2014-11-12 Method and system for monitoring and protecting operating environment of server

Country Status (1)

Country Link
CN (1) CN104363120A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150218