CN110704281A - Method for monitoring system operation - Google Patents
- Publication number
- CN110704281A
- Authority
- CN
- China
- Prior art keywords
- monitoring
- pid
- daemon
- module
- daemon process
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/302—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1004—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a method for monitoring system operation and relates to the field of system safety. When a daemon process goes down, its pid synchronized in the monitoring configuration file becomes 0; the method uses this to monitor whether a daemon process has exited unexpectedly and forcibly restarts any daemon process that has. Reading the pid of each process from the configuration file and the daemon processes' own recording of their pids are asynchronous mechanisms, so an abnormality in any daemon process can be found promptly without mutual interference, which ensures normal operation of the system. The states of the modules of the system are also monitored, system alarming and log storage can be started in one step, and the user can see the abnormal state of each module directly.
Description
Technical Field
The invention discloses a method for monitoring system operation, and relates to the field of system safety.
Background
A software system is the part of a computer system that consists of software: system software, support software, and application software. With the development of informatization, software systems are increasingly used in equipment, interaction between software systems and hardware systems is increasingly common, and the stability and disaster tolerance of equipment systems during operation are increasingly important. The invention discloses a method for monitoring system operation: after a daemon process of the system is started, its pid is synchronized into a monitoring configuration file; the monitoring configuration file is read to check whether the pid of the daemon process is 0, and any daemon process whose pid is 0 is forcibly restarted. At the same time, the state data of each module of the system is monitored and acquired, so that both the state of each software daemon process and the operating state of each unit module can be monitored. Data communication with each module is carried out at regular intervals to obtain its operating state, and alarming and log storage are performed when the operating state is abnormal.
Disclosure of Invention
The invention provides a method for monitoring system operation that can monitor the state of each module of the system, communicate with each module at regular intervals to acquire its operating state, and perform alarming and log storage when the operating state is abnormal.
The specific scheme provided by the invention is as follows:
A method for monitoring system operation: after a daemon process of the system is started, the pid of the daemon process is synchronized into a monitoring configuration file; the monitoring configuration file is read to monitor whether the pid of the daemon process is 0, and any daemon process whose pid is 0 is forcibly restarted;
meanwhile, the state data of each module of the system is monitored and acquired.
In the method, the monitoring configuration file includes the key functions of all daemon processes to be monitored. When a daemon process exits unexpectedly, its pid is 0; the corresponding start command is then called to start the key function, forcibly restarting the daemon process.
In the method, the pid of each daemon process in the system monitoring configuration file is read at regular intervals.
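The pid check described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the JSON layout of the configuration file, the `daemons`, `pid`, and `start_command` keys, and all function names are assumptions made for the example, and the 10-second default is borrowed from the embodiment in the detailed description.

```python
import json
import subprocess
import time
from pathlib import Path


def find_dead_daemons(config: dict) -> list[str]:
    """Return the names of daemons whose recorded pid is 0."""
    return [name for name, entry in config["daemons"].items() if entry["pid"] == 0]


def restart_daemon(entry: dict) -> int:
    """Run the daemon's start command (a list of arguments) and return the new pid."""
    proc = subprocess.Popen(entry["start_command"])
    return proc.pid


def watch_once(config_path: Path, restart=restart_daemon) -> list[str]:
    """Read the configuration file once and restart every daemon whose pid is 0.

    Returns the names of the daemons that were restarted.
    """
    config = json.loads(config_path.read_text())
    dead = find_dead_daemons(config)
    for name in dead:
        entry = config["daemons"][name]
        entry["pid"] = restart(entry)  # synchronize the new pid back into the file
    if dead:
        config_path.write_text(json.dumps(config, indent=2))
    return dead


def watch_forever(config_path: Path, interval_s: float = 10.0) -> None:
    """Poll the configuration file at a fixed interval (assumed: 10 seconds)."""
    while True:
        watch_once(config_path)
        time.sleep(interval_s)
```

Passing the restart function as a parameter keeps the file-handling logic testable without launching real processes. The sketch has the watcher write the new pid back itself, which is a simplification of the synchronization step.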
In the method, the state data of each module of the system is requested at regular intervals to monitor whether the state of each module is normal.
In the method, a CRC (cyclic redundancy check) is performed before the state data of each module of the system is requested at regular intervals, and the state data of each module is acquired after the CRC check passes.
A tool for monitoring system operation comprises a monitoring module,
wherein, after the daemon process of the system is started, the monitoring module synchronizes the pid of the daemon process to the monitoring configuration file, monitors whether the pid of the daemon process is 0 or not by reading the monitoring configuration file, forcibly restarts the daemon process with the pid of 0,
and meanwhile, the monitoring module monitors and acquires the state data of each module of the system.
In the tool, the monitoring configuration file includes the key functions of all daemon processes to be monitored. When a daemon process exits unexpectedly, its pid is 0; the monitoring module then calls the corresponding start command to start the key function, forcibly restarting the daemon process.
In the tool, the monitoring module reads the pid of each daemon process in the system monitoring configuration file at regular intervals.
In the tool, the monitoring module requests the state data of each module of the system at regular intervals and monitors whether the state of each module is normal.
In the tool, the monitoring module performs a CRC (cyclic redundancy check) with the system before requesting the state data of each module at regular intervals, and acquires the state data of each module after the CRC check passes.
The invention has the advantages that:
The invention provides a method for monitoring system operation. It relies on the fact that, when a daemon process goes down, its pid synchronized in the monitoring configuration file becomes 0; this is used to monitor whether a daemon process has exited unexpectedly and to restart any daemon process that has. Reading the pid of each process from the configuration file and the daemon processes' own recording of their pids are asynchronous mechanisms, so an abnormality in any daemon process can be found promptly without mutual interference, which ensures normal operation of the system. The state of each module of the system is also monitored, system alarming and log storage can be started in one step, and the user can see the abnormal state of each module directly.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a schematic flow diagram of a monitoring daemon;
FIG. 3 is a flow diagram of modules in a monitoring system.
Detailed Description
The invention provides a method for monitoring system operation, which comprises the following steps: after the daemon process of the system is started, synchronizing the pid of the daemon process into the monitoring configuration file, monitoring whether the pid of the daemon process is 0 or not by reading the monitoring configuration file, forcibly restarting the daemon process with the pid of 0,
and meanwhile, monitoring and acquiring the state data of each module of the system.
Meanwhile, a tool for monitoring the operation of the system corresponding to the method is provided, which comprises a monitoring module,
wherein, after the daemon process of the system is started, the monitoring module synchronizes the pid of the daemon process to the monitoring configuration file, monitors whether the pid of the daemon process is 0 or not by reading the monitoring configuration file, forcibly restarts the daemon process with the pid of 0,
and meanwhile, the monitoring module monitors and acquires the state data of each module of the system.
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
Taking a network cipher machine system as an example, the method of the invention is used to monitor system operation as follows:
After the daemon processes of the network cipher machine system are started (these include the management processes of the modules of the network cipher machine, the web interface management process, and so on), the process number (pid) of each daemon process is written into the monitoring configuration file. The monitoring configuration file is read every 10 seconds to monitor whether the pid of any daemon process is 0, and any daemon process whose pid is 0 is forcibly restarted.
At the same time, the state data of each module of the system is requested at regular intervals to monitor whether the state of each module is normal. The modules of the system may be the CPU, the communication module, the password processing module, and so on; the acquired data may include CPU utilization, memory utilization, hard disk utilization, running time, communication efficiency, password access data, and so on. When the operating state data is abnormal or exceeds a certain threshold, an alarm thread is started to raise an alarm and store a log.
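As a rough illustration of the threshold check and alarm thread described above, the following sketch compares reported metrics with limits and starts a thread when any limit is exceeded. The metric names, the limit values, and the function names are hypothetical; the source only speaks of "a certain threshold" and does not specify how the alarm thread is implemented.

```python
import logging
import threading

logger = logging.getLogger("monitor")

# Hypothetical limits in percent; the source does not give concrete values.
THRESHOLDS = {"cpu_percent": 90.0, "memory_percent": 90.0, "disk_percent": 95.0}


def find_violations(status: dict, thresholds: dict = THRESHOLDS) -> dict:
    """Return the metrics in `status` whose value exceeds the configured limit."""
    return {
        metric: value
        for metric, value in status.items()
        if metric in thresholds and value > thresholds[metric]
    }


def log_alarm(module: str, violations: dict) -> None:
    """Stand-in for the alarm action: record the abnormal values in the log."""
    logger.warning("module %s abnormal: %s", module, violations)


def handle_status(module: str, status: dict, alarm=log_alarm):
    """Start an alarm thread if any metric is over its limit.

    Returns the started thread, or None when the status is normal.
    """
    violations = find_violations(status)
    if not violations:
        return None
    worker = threading.Thread(target=alarm, args=(module, violations), daemon=True)
    worker.start()
    return worker
```

Metrics without a configured limit, such as running time, are ignored by the check in this sketch. Running the alarm in its own thread keeps a slow alarm action from delaying the next polling cycle.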
In this process, the monitoring configuration file may include the key functions of all daemon processes to be monitored, together with the name, pid, start command, and so on of each daemon process. When the pid of a daemon process is 0 or the process does not respond, i.e. the daemon process has exited unexpectedly, the corresponding start command is called to start the key function and forcibly restart the daemon process.
A CRC check is also performed before the state data of each module of the system is requested at regular intervals. On transmission, the sender calculates the CRC value of the data packet and adds it to the packet; the receiver, namely the monitored module, calculates the CRC value after receiving the packet and compares it with the CRC value carried in the packet. If the CRC check passes, an operating state data packet is returned, which may contain state data for each module of the system, such as system running time, CPU utilization, memory utilization, and hard disk utilization.
When the state data acquired for a module is abnormal, an alarm thread is started, the buzzer and indicator lamp are activated to raise the alarm, and the abnormal information is written into the log database.
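The CRC exchange described above can be sketched as follows. The source does not name a CRC variant or a packet layout, so this example assumes CRC-32 (via Python's `zlib.crc32`) appended as a 4-byte big-endian trailer; the function names are hypothetical.

```python
import struct
import zlib
from typing import Optional


def pack_with_crc(payload: bytes) -> bytes:
    """Sender side: append the CRC-32 of the payload as a 4-byte trailer."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unpack_and_verify(packet: bytes) -> Optional[bytes]:
    """Receiver side: recompute the CRC and compare it with the one in the packet.

    Returns the payload if the check passes, otherwise None.
    """
    if len(packet) < 4:
        return None  # too short to carry a CRC trailer
    payload, trailer = packet[:-4], packet[-4:]
    (received_crc,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != received_crc:
        return None
    return payload
```

In this sketch the monitored module would reply with its operating state packet only when `unpack_and_verify` returns a payload; a `None` result means the request was corrupted in transit and should be discarded.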
Again taking the network cipher machine system as an example, the tool of the invention is used to monitor system operation as follows:
After the daemon processes of the network cipher machine system are started (these include the management processes of the modules of the network cipher machine, the web interface management process, and so on), the monitoring module of the tool writes the process number (pid) of each daemon process into the monitoring configuration file, reads the monitoring configuration file every 10 seconds to monitor whether the pid of any daemon process is 0, and forcibly restarts any daemon process whose pid is 0.
At the same time, the monitoring module requests the state data of each module of the system at regular intervals and monitors whether the state of each module is normal. The modules of the system may be the CPU, the communication module, the password processing module, and so on; the data acquired by the monitoring module may include CPU utilization, memory utilization, hard disk utilization, running time, communication efficiency, password access data, and so on. When the operating state data is abnormal or exceeds a certain threshold, the monitoring module starts an alarm thread to raise an alarm and store a log.
In this process, the monitoring configuration file may include the key functions of all daemon processes to be monitored, together with the name, pid, start command, and so on of each daemon process. When the pid of a daemon process is 0 or the process does not respond, i.e. the daemon process has exited unexpectedly, the monitoring module calls the corresponding start command to start the key function and forcibly restart the daemon process.
Before requesting the state data of each module of the system at regular intervals, the monitoring module also performs a CRC check with the system. On transmission, the monitoring module, as the sender, calculates the CRC value of the data packet and adds it to the packet; after receiving the packet, the receiver calculates the CRC value and compares it with the CRC value carried in the packet. If the CRC check passes, an operating state data packet is returned to the monitoring module; it may contain state data for each module of the system, such as system running time, CPU utilization, memory utilization, and hard disk utilization.
When the monitoring module finds that the state data acquired for a module is abnormal, it starts an alarm thread, activates the buzzer and indicator lamp to raise the alarm, and writes the abnormal information into the log database.
The method or the tool can monitor each system software process. After a process goes down unexpectedly, its pid in the configuration file becomes 0; the configuration file is read at regular intervals, and when the pid of a process is found to be 0, that daemon process is forcibly restarted to ensure normal operation of the system. In addition, the state of each module of the system is monitored: data communication is carried out with each module at regular intervals to obtain its operating state, and alarming and log storage are performed when the operating state is abnormal.
The above embodiments are merely preferred embodiments given to illustrate the present invention fully, and the scope of the present invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the invention all fall within the protection scope of the invention. The protection scope of the invention is defined by the claims.
Claims (10)
1. A method for monitoring system operation is characterized in that after a daemon process of the system is started, the pid of the daemon process is synchronized into a monitoring configuration file, whether the pid of the daemon process is 0 is monitored by reading the monitoring configuration file, the daemon process with the pid of 0 is forcibly restarted,
and meanwhile, monitoring and acquiring the state data of each module of the system.
2. The method as claimed in claim 1, wherein the monitoring configuration file includes the key functions of all daemon processes to be monitored, and when a daemon process exits unexpectedly, so that its pid is 0, the corresponding start command is called to start the key function, thereby forcibly restarting the daemon process.
3. A method as claimed in claim 1 or 2, wherein the pid of each daemon in the system monitoring configuration file is read at regular intervals.
4. A method as claimed in claim 3, in which status data is periodically requested from the modules of the system to monitor the status of the modules.
5. The method as claimed in claim 4, wherein the status data of the modules of the system is periodically requested and a CRC check is performed, and the status data of the modules of the system is acquired after the CRC check is passed.
6. A tool for monitoring the operation of a system is characterized by comprising a monitoring module,
wherein, after the daemon process of the system is started, the monitoring module synchronizes the pid of the daemon process to the monitoring configuration file, monitors whether the pid of the daemon process is 0 or not by reading the monitoring configuration file, forcibly restarts the daemon process with the pid of 0,
and meanwhile, the monitoring module monitors and acquires the state data of each module of the system.
7. The tool of claim 6, wherein the monitoring configuration file includes the key functions of all daemon processes to be monitored, and when a daemon process exits unexpectedly, so that its pid is 0, the monitoring module calls the corresponding start command to start the key function, thereby forcibly restarting the daemon process.
8. A tool as claimed in claim 6 or claim 7 wherein the monitor module reads the pid of each daemon in the system monitor configuration file at regular intervals.
9. The tool of claim 8, wherein the monitoring module periodically requests status data of the modules of the system to monitor whether the status of the modules is normal.
10. The tool of claim 9, wherein the monitoring module performs a CRC check with the system before periodically requesting status data from the modules of the system, and wherein the status data of the modules of the system is obtained after the CRC check passes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910971300.7A CN110704281A (en) | 2019-10-14 | 2019-10-14 | Method for monitoring system operation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910971300.7A CN110704281A (en) | 2019-10-14 | 2019-10-14 | Method for monitoring system operation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110704281A (en) | 2020-01-17 |
Family
ID=69198826
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910971300.7A Pending CN110704281A (en) | 2019-10-14 | 2019-10-14 | Method for monitoring system operation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110704281A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB9822129D0 (en) * | 1998-10-09 | 1998-12-02 | Sun Microsystems Inc | Process monitoring in a computer system |
KR101249486B1 (en) * | 2011-12-09 | 2013-04-01 | 주식회사 시큐아이 | Method and system of managing daemon |
CN103678084A (en) * | 2012-09-21 | 2014-03-26 | 成都勤智数码科技股份有限公司 | Flexible application process guarding method |
CN107391343A (en) * | 2017-07-25 | 2017-11-24 | 郑州云海信息技术有限公司 | A kind of performance monitoring system and method |
CN109426591A (en) * | 2017-09-04 | 2019-03-05 | 武汉斗鱼网络科技有限公司 | Guard the method and apparatus of multiple processes of the single program of windows |
Non-Patent Citations (2)
Title |
---|
Netizen: ""Daemon processes: adding a watchdog"", 《HTTPS://WWW.CNBLOGS.COM/SECONDTONONEWE/P/6098143.HTML》 * |
Netizen: ""How can a daemon process be monitored so that it restarts as soon as it exits abnormally?"", 《HTTPS://BBS.CSDN.NET/TOPICS/370002952》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111682977A (en) * | 2020-04-30 | 2020-09-18 | 普联技术有限公司 | Method and device for processing exception of network equipment, storage medium and network equipment |
CN113312246A (en) * | 2021-06-07 | 2021-08-27 | 海光信息技术股份有限公司 | Control method, device, platform, equipment and storage medium of verification environment |
CN114791835A (en) * | 2022-03-16 | 2022-07-26 | 青岛海尔科技有限公司 | Program restarting method and device, storage medium and electronic device |
CN114791835B (en) * | 2022-03-16 | 2023-11-28 | 青岛海尔科技有限公司 | Program restarting method and device, storage medium and electronic device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110704281A (en) | Method for monitoring system operation | |
US9189348B2 (en) | High availability database management system and database management method using same | |
WO2016183967A1 (en) | Failure alarm method and apparatus for key component, and big data management system | |
CN112506702B (en) | Disaster recovery method, device, equipment and storage medium for data center | |
WO2018214887A1 (en) | Data storage method, storage server, storage medium and system | |
US7478273B2 (en) | Computer system including active system and redundant system and state acquisition method | |
CN109245962B (en) | Server monitoring method, system, computer equipment and storage medium | |
CN102708150A (en) | Method, device and system for asynchronously copying data | |
WO2015007091A1 (en) | Data record generating method and device | |
CN107729213B (en) | Background task monitoring method and device | |
CN113608964A (en) | Cluster automation monitoring method and device, electronic equipment and storage medium | |
US11930292B2 (en) | Device state monitoring method and apparatus | |
CN109474470A (en) | One kind is from monitoring method and device | |
CN115543740A (en) | Method, system, equipment and storage medium for monitoring abnormal operation of service | |
CN108958965A (en) | A kind of BMC monitoring can restore the method, device and equipment of ECC error | |
US7877646B2 (en) | Method and system for monitoring a computing device | |
CN116010199A (en) | Application service self-adjustment method, device, computer equipment and storage medium | |
CN114490196A (en) | Database switching method, system, device and medium | |
CN112187537A (en) | Method, device and equipment for synchronizing assets to security component | |
CN114327969A (en) | Information acquisition method and device, computer equipment and computer storage medium | |
CN113312320A (en) | Method and system for acquiring user operation database behavior | |
CN113127435A (en) | Intelligent synchronization method and system for files of main and standby systems | |
CN109062718B (en) | Server and data processing method | |
CN112000442A (en) | Method and device for automatically acquiring cluster state based on kubernets platform | |
CN106250255B (en) | A kind of management method and device of system exception |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200117 |