CN102736970A

CN102736970A - Method for monitoring activity state of operating system

Info

Publication number: CN102736970A
Application number: CN201210220128XA
Authority: CN
Inventors: 任华进; 顾春波; 刘海滨
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2012-06-29
Filing date: 2012-06-29
Publication date: 2012-10-17

Abstract

The invention provides a method for monitoring the activity state of an operating system. After the operating system starts, a monitoring script is executed automatically under the operating system; it checks the system's network state and its own activity state in real time and regularly records the results in a monitoring text log, so that the time point at which a server fails and the approximate fault symptom can be judged accurately. The method has the following advantages: 1) the running state of the server is recorded accurately, so that mistaken subjective feedback does not affect fault judgment; and 2) from the record, a user can judge whether the symptom is a machine-down fault caused by a system hang, blue screen, power failure, or the like, or a network interruption caused by abnormal internal or external factors, so that a corresponding solution can be taken.

Description

Method for monitoring the activity state of an operating system

Technical field

The present invention relates specifically to a method for monitoring the activity state of an operating system.

Background technology

At present, many users host their own servers in IDC machine rooms, where the servers connect directly to a telecom carrier's core network to provide the required services. The operating systems used on these servers fall into two broad categories, Windows and Linux, and the servers may carry different applications. When a machine fails, the only information the user obtains is that the machine cannot be reached; the user cannot tell whether the machine is down or the network is interrupted, or exactly when the failure occurred. The user can only ask the machine room administrator to force a restart, then log in to the server and try to confirm and judge the cause from the system log. This way of judging has the following problems:

1) The Linux system log does not provide real-time monitoring of the activity state. In general, the time of a failure can only be estimated roughly from the log records. When services are stable, there may be no new log records for several days. Only when the system restarts is a new "syslogd 1.4.1:restart." record produced, together with the time point of that system startup. The information before this reference point consists only of whatever logs the running machine happened to generate, so an administrator who cannot monitor the server in real time cannot determine the exact time at which the system failed. The system log of the Windows system can record the actual crash time point;

2) For remotely hosted machines, whether the server runs Windows or Linux, when a crash, an unexpected power loss, or a network fault interrupts connectivity, the only failure information available to the remote user is that ping fails. The user cannot judge whether the fault is a network fault or a machine fault. Because different symptoms may imply different possible faults, the next step cannot be decided until the symptom is identified, which often makes fault localization and quick recovery of the server very difficult.

Summary of the invention

The purpose of the present invention is to provide a method for monitoring the activity state of an operating system.

The purpose of the invention is achieved as follows. After the system starts, a monitoring script is executed automatically under the operating system. It checks the system's network state and its own activity state in real time and regularly records the results in a monitoring text log, so that the time point at which a server fails and the approximate fault symptom can be judged accurately. The specific steps are as follows:

1) Depending on the type of operating system, the script is set to run automatically at startup: on Windows, the script jiankong.cmd is dragged into the Startup location; on Linux, the command that runs the script jiankong.sh is written into rc.local. This ensures that the script runs automatically when the machine boots;

2) Windows script: after startup, the script is executed automatically. It first records the system startup time and generates the startup record "system is on", and from then on records the current time. It pings a fixed, always-on machine IP in the network with the ping command and records the output of the ping command. These steps are executed in a loop to monitor the activity of the system and the network connectivity;

3) Linux script: the script is executed automatically after startup. It first records the startup information "system is on" and then records the time point. By pinging a fixed-IP machine in the network, it judges whether the network is reachable: if the ping succeeds, it records "network is up"; otherwise it records "network is down". These steps are executed in a loop;

4) Judgment method: after a machine fault occurs, the failure is judged from the collected information. Every machine startup writes a "system is on" record, so the last time point recorded before that record gives the time at which the fault occurred. If the ping before it executed normally, the fault was a crash, an unexpected power loss, or a restart. If the ping failed but the date was still being recorded, the machine had not crashed and only the network was interrupted.
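Step 1 on Linux can be sketched as a small helper that appends the monitoring script to an rc.local-style file only once. This is an illustrative sketch, not part of the patent text: the function name install_monitor is hypothetical, and the trailing "&" is an added assumption, made because the monitoring script loops forever and would otherwise keep rc.local from finishing.

```shell
#!/bin/sh
# install_monitor RC_FILE SCRIPT   (hypothetical helper name)
# Appends "SCRIPT &" to RC_FILE unless that exact line is already present.
install_monitor() {
    im_rc_file="$1"
    im_entry="$2 &"   # run in the background: the monitor never exits
    # Create the file if it is missing so grep and >> behave predictably.
    touch "$im_rc_file"
    # -x matches whole lines, -F treats the entry as a fixed string.
    if ! grep -qxF -- "$im_entry" "$im_rc_file"; then
        printf '%s\n' "$im_entry" >> "$im_rc_file"
    fi
}
```

A call such as install_monitor /etc/rc.local /root/jiankong.sh would register the script; both paths are assumptions, since the source names only rc.local and jiankong.sh. On systems whose rc.local ends with an explicit exit line, the entry would have to be placed before that line instead of appended.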

The beneficial effects of the invention are:

1) The running state of the server is recorded accurately, so that mistaken subjective feedback does not affect fault judgment;

2) From the record, it can be judged whether the symptom is a machine-down fault caused by a system hang, blue screen, power failure, or the like, or a network interruption caused by abnormal internal or external factors, which makes it easy to take a corresponding solution;

3) The implementation is simple: the monitoring script is set directly to run automatically with the system, so it is easy to operate and implement.

Description of drawings

Fig. 1 is a schematic diagram of the monitoring steps.

Embodiment

The method of the present invention is described in detail below with reference to the accompanying drawing.

The specific implementation procedure is as follows:

4) Judgment method: after a machine fault occurs, the failure is judged from the collected information. Every machine startup writes a "system is on" record, so the last time point recorded before that record gives the time at which the fault occurred. If the ping before it executed normally, the fault was a crash, an unexpected power loss, a restart, or the like. If the ping failed but the date was still being recorded, the machine had not crashed, but the network was interrupted;

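5) Script implementation (note: xxx.xxx.xxx.xxx is a stable IP address in the network, used as the reference point for network monitoring)

Windows: ping.cmd

echo on
rem Record the startup marker once per boot.
echo system is on >>c:\test.txt
:ag
rem Record the current date and time, then the full ping output.
echo %date% >>c:\test.txt
echo %time% >>c:\test.txt
rem The full path is used so that this file (ping.cmd) does not call itself.
c:\windows\system32\ping xxx.xxx.xxx.xxx -n 4 >>c:\test.txt
goto ag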

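Linux script: jiankong.sh (note: xxx.xxx.xxx.xxx is a stable IP address in the network, used as the reference point for network monitoring)

#!/bin/bash
# Record the startup marker once per boot.
echo "system is on" >> /test.txt
while :; do
    # Unquoted on purpose: echo collapses the double space date prints for single-digit days.
    echo $(date) >> /test.txt
    ping -c 2 xxx.xxx.xxx.xxx &>/dev/null
    # $? holds the exit status of ping: 0 means a reply was received.
    if [ $? -eq 0 ]; then
        echo "network is up" >> /test.txt
    else
        echo "network is down" >> /test.txt
    fi
    sleep 2
done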

6) Sample execution results

Windows system (contents of test.txt after the script runs):

system is on
2011/09/14 Wednesday
17:51:02.69

Pinging 10.7.255.254 with 32 bytes of data:
Reply from 10.7.255.254: bytes=32 time=3ms TTL=255
Reply from 10.7.255.254: bytes=32 time<1ms TTL=255

Ping statistics for 10.7.255.254:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 3ms, Average = 0ms

2011/09/14 Wednesday
17:51:05.93

Pinging 10.7.255.254 with 32 bytes of data:
Reply from 10.7.255.254: bytes=32 time<1ms TTL=255

Ping statistics for 10.7.255.254:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

……

Linux script execution result:

[root@localhost ~]# cat /test.txt
system is on
Wed Sep 7 12:31:54 CST 2011
network is up
Wed Sep 7 12:31:57 CST 2011
network is up
Wed Sep 7 12:32:00 CST 2011
network is up
Wed Sep 7 12:32:03 CST 2011
network is up
Wed Sep 7 12:32:06 CST 2011
network is up
Wed Sep 7 12:32:09 CST 2011
network is down
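To read a long Linux log quickly, the time points at which the network first went down can be extracted. The following is a minimal sketch, assuming the log layout shown above (a timestamp line followed by a "network is up" or "network is down" line, with no trailing whitespace); the function name list_outage_starts is hypothetical and not part of the patent.

```shell
#!/bin/sh
# list_outage_starts LOGFILE   (hypothetical helper name)
# Prints the timestamp of every record where the status changes to
# "network is down", i.e. where an interruption begins.
list_outage_starts() {
    awk '
        /^system is on$/    { prev = ""; next }      # new boot: forget old state
        /^network is up$/   { prev = "up"; next }
        /^network is down$/ {
            if (prev != "down") print ts             # first down after up or boot
            prev = "down"
            next
        }
        NF { ts = $0 }   # any other non-blank line is taken as a timestamp
    ' "$1"
}
```

Run against the sample log above, list_outage_starts /test.txt would print the single line "Wed Sep 7 12:32:09 CST 2011", the first record at which the reference IP stopped answering.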

7) Flow for judging faults from the information log file.
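The judgment rule described in step 4 can be sketched for the Linux log as follows. This is an illustrative sketch under stated assumptions, not the patent's own flow: it assumes the log layout produced by jiankong.sh, and the function name classify_last_failure and the verdict strings are hypothetical. It looks at the last record written before the most recent "system is on" marker that follows earlier records.

```shell
#!/bin/sh
# classify_last_failure LOGFILE   (hypothetical helper name)
# Prints "<verdict>|<last timestamp logged before the most recent restart>".
classify_last_failure() {
    awk '
        /^system is on$/ {
            # A boot marker closes the previous run; keep its final state.
            if (ts != "" || markers > 0) {
                found = 1; last_ts = ts; last_status = status
            }
            markers++; ts = ""; status = ""
            next
        }
        /^network is (up|down)$/ { status = $0; next }
        NF { ts = $0 }   # any other non-blank line is taken as a timestamp
        END {
            if (!found) { print "no restart recorded"; exit }
            if (last_status == "network is up")        verdict = "crash or power loss"
            else if (last_status == "network is down") verdict = "network interruption"
            else                                       verdict = "unknown"
            print verdict "|" last_ts
        }
    ' "$1"
}
```

If the last status before the restart is "network is up", the machine stopped logging while the network was still reachable, which matches the crash or power-loss case; if it is "network is down", the machine was still writing records while ping failed, which matches the network-interruption case. The printed timestamp approximates the fault time to within one loop interval.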

Apart from the technical features described in the specification, everything else is known art to those skilled in the art.

Claims

1. A method for monitoring the activity state of an operating system, characterized in that, after the system starts, a monitoring script is executed automatically under the operating system to check the system's network state and its own activity state in real time and to regularly record the results in a monitoring text log, so that the time point at which a server fails and the approximate fault symptom can be judged accurately; the specific steps are as follows: