CN110704281A - Method for monitoring system operation - Google Patents


Info

Publication number
CN110704281A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910971300.7A
Other languages
Chinese (zh)
Inventor
黄刚
张小亮
李若寒
李岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Chaoyue CNC Electronics Co Ltd
Original Assignee
Shandong Chaoyue CNC Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-14
Filing date
2019-10-14
Publication date
2020-01-17
Application filed by Shandong Chaoyue CNC Electronics Co Ltd
Priority to CN201910971300.7A
Publication of CN110704281A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for monitoring system operation and relates to the field of system safety. After a daemon process goes down, its synchronized pid in the monitoring configuration file becomes 0; the method uses this to detect whether a daemon process has exited unexpectedly and forcibly restarts any daemon process that has. Reading the pid of each process from the configuration file and the daemon processes' own recording of their pids are asynchronous mechanisms, so an abnormality in any daemon process can be found in time without mutual interference, which ensures normal operation of the system. The state of each module of the system is also monitored, system alarm and log storage can be started in one step, and the user can see the abnormal state of each module intuitively.

Description

Method for monitoring system operation
Technical Field
The invention discloses a method for monitoring system operation, and relates to the field of system safety.
Background
A software system is a computer software system composed of system software, support software, and application software; it is the part of a computer system that consists of software. With the development of informatization, software systems are increasingly used in equipment, interaction between software systems and hardware systems is increasingly common, and the stability and disaster tolerance of equipment systems during operation are increasingly important. The invention discloses a method for monitoring system operation: after the daemon processes of the system are started, the pid of each daemon process is synchronized into a monitoring configuration file; the monitoring configuration file is read to check whether the pid of any daemon process is 0, and a daemon process whose pid is 0 is forcibly restarted. At the same time, the state data of each module of the system is monitored and acquired, so that both the state of each software daemon process and the operating state of each unit module can be monitored. Data communication is carried out with each module at regular intervals to obtain its operating state, and when the operating state is abnormal, alarm and log storage functions are carried out.
Disclosure of Invention
The invention provides a method for monitoring system operation that can monitor the state of each module of the system, communicate with each module at regular intervals to acquire its operating state, and carry out alarm and log storage functions when the operating state is abnormal.
The specific scheme provided by the invention is as follows:
A method for monitoring system operation: after the daemon processes of the system are started, the pid of each daemon process is synchronized into a monitoring configuration file; the monitoring configuration file is read to check whether the pid of any daemon process is 0, and a daemon process whose pid is 0 is forcibly restarted;
at the same time, the state data of each module of the system is monitored and acquired.
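The pid synchronization and the pid-is-0 check can be sketched as follows. This is a minimal sketch under stated assumptions: the source does not specify a file format, so the JSON layout, the helper names `sync_pid` and `dead_daemons`, and the daemon name `web_admin` used below are all hypothetical.

```python
import json
import os
import tempfile


def sync_pid(config_path, name, pid):
    """Record `pid` for daemon `name` in the monitoring configuration file.

    Assumed (hypothetical) layout: {"<name>": {"pid": <int>, ...}, ...}
    """
    try:
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
    except FileNotFoundError:
        config = {}
    config.setdefault(name, {})["pid"] = pid
    # Write to a temporary file and rename it, so that a concurrent
    # reader never sees a half-written configuration file.
    directory = os.path.dirname(os.path.abspath(config_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(config, f)
    os.replace(tmp_path, config_path)


def dead_daemons(config_path):
    """Return the names of daemons whose recorded pid is 0."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return [name for name, entry in config.items() if entry.get("pid", 0) == 0]
```

A daemon would call something like `sync_pid(path, "web_admin", os.getpid())` at startup. How the recorded pid is reset to 0 when a daemon goes down is not detailed in the source, so that step is left out of the sketch.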
In the method, the monitoring configuration file contains the key functions of all daemon processes to be monitored. When the pid of a daemon process is 0, that is, when the daemon process has exited unexpectedly, the corresponding start command is called to start the key functions and forcibly restart the daemon process.
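The restart step can be illustrated with a short sketch. It assumes each configuration entry carries a `pid` and a `start_cmd` string (both field names are hypothetical) and that the monitor records the pid returned by the launcher; in the method as described, the restarted daemon may instead write its own pid back into the file.

```python
import shlex
import subprocess


def restart_dead(config, launch=None):
    """Restart every daemon whose recorded pid is 0.

    `config` maps a daemon name to {"pid": int, "start_cmd": str}
    (a hypothetical layout). `launch` takes an argument list and
    returns the new pid; by default it spawns the command with
    subprocess.Popen.
    """
    if launch is None:
        launch = lambda argv: subprocess.Popen(argv).pid
    restarted = []
    for name, entry in config.items():
        if entry["pid"] == 0:
            entry["pid"] = launch(shlex.split(entry["start_cmd"]))
            restarted.append(name)
    return restarted
```

Passing `launch` as a parameter keeps the decision logic separate from process creation, so it can be exercised without starting real processes. Note that a start command which forks into the background will end up running under a different pid than the one `Popen` reports.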
In the method, the pid of each daemon process in the system monitoring configuration file is read at regular intervals.
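Reading the configuration file at regular intervals amounts to a simple polling loop. The sketch below is illustrative only; the function name `poll` and its parameters are assumptions, and the interval is left to the caller (the embodiment described later uses 10 seconds).

```python
import time


def poll(check, interval_s, cycles=None, sleep=time.sleep):
    """Call `check()` every `interval_s` seconds.

    `cycles` limits the number of iterations (None means run forever);
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    done = 0
    while cycles is None or done < cycles:
        check()
        done += 1
        # Do not sleep after the final iteration of a bounded run.
        if cycles is None or done < cycles:
            sleep(interval_s)
```

In a deployment, `check` would read the monitoring configuration file and restart any daemon whose pid is 0.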
In the method, the state data of each module of the system is requested at regular intervals, and whether the state of each module is normal is monitored.
In the method, a CRC (cyclic redundancy check) is performed before the state data of each module of the system is requested at regular intervals, and the state data of each module is obtained after the CRC check passes.
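A CRC check on a request packet can be sketched as below. The source does not say which CRC polynomial or packet layout is used; this sketch assumes CRC-32 as provided by Python's `zlib` module, appended as four big-endian bytes, and the names `frame` and `unframe` are hypothetical.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append the CRC-32 of `payload` as 4 big-endian bytes."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(packet: bytes) -> bytes:
    """Return the payload if the trailing CRC-32 matches, else raise ValueError."""
    if len(packet) < 4:
        raise ValueError("packet too short to carry a CRC")
    payload = packet[:-4]
    received = struct.unpack(">I", packet[-4:])[0]
    if zlib.crc32(payload) != received:
        raise ValueError("CRC mismatch")
    return payload
```

The sender calls `frame` on the request; the receiver calls `unframe` and only answers with state data when no `ValueError` is raised.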
A tool for monitoring system operation comprises a monitoring module,
wherein, after the daemon processes of the system are started, the monitoring module synchronizes the pid of each daemon process into the monitoring configuration file, reads the monitoring configuration file to check whether the pid of any daemon process is 0, and forcibly restarts a daemon process whose pid is 0,
and at the same time, the monitoring module monitors and acquires the state data of each module of the system.
The monitoring configuration file in the tool contains the key functions of all daemon processes to be monitored. When the pid of a daemon process is 0, that is, when the daemon process has exited unexpectedly, the monitoring module calls the corresponding start command to start the key functions and forcibly restarts the daemon process.
The monitoring module in the tool reads the pid of each daemon process in the system monitoring configuration file at regular intervals.
The monitoring module in the tool requests the state data of each module of the system at regular intervals and monitors whether the state of each module is normal.
The monitoring module in the tool performs a CRC (cyclic redundancy check) with the system before requesting the state data of each module at regular intervals, and acquires the state data of each module after the CRC check passes.
The invention has the advantages that:
The invention provides a method for monitoring system operation. It relies on the fact that, after a daemon process goes down, its synchronized pid in the monitoring configuration file becomes 0; on that basis it monitors whether a daemon process has exited unexpectedly and restarts any daemon process that has. Reading the pid of each process from the configuration file and the daemon processes' own recording of their pids are asynchronous mechanisms, so an abnormality in any daemon process can be found in time without mutual interference, which ensures normal operation of the system. The method also monitors the state of each module of the system and can start system alarm and log storage in one step, so that the user can see the abnormal state of each module intuitively.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a schematic flow diagram of monitoring the daemon processes;
FIG. 3 is a schematic flow diagram of monitoring the modules of the system.
Detailed Description
The invention provides a method for monitoring system operation, which comprises the following: after the daemon processes of the system are started, the pid of each daemon process is synchronized into a monitoring configuration file; the monitoring configuration file is read to check whether the pid of any daemon process is 0, and a daemon process whose pid is 0 is forcibly restarted;
at the same time, the state data of each module of the system is monitored and acquired.
A tool for monitoring system operation corresponding to the method is also provided, which comprises a monitoring module,
wherein, after the daemon processes of the system are started, the monitoring module synchronizes the pid of each daemon process into the monitoring configuration file, reads the monitoring configuration file to check whether the pid of any daemon process is 0, and forcibly restarts a daemon process whose pid is 0,
and at the same time, the monitoring module monitors and acquires the state data of each module of the system.
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
Taking a network cipher machine system as an example, the method of the invention is used to monitor system operation, and the specific process is as follows:
After the daemon processes of the network cipher machine system are started (the daemon processes include the management processes of the modules of the network cipher machine, the web interface management process, and so on), the process number (pid) of each daemon process is written into the monitoring configuration file. The monitoring configuration file is read every 10 seconds to check whether the pid of any daemon process is 0, and a daemon process whose pid is 0 is forcibly restarted.
At the same time, the state data of each module of the system is requested at regular intervals to monitor whether the state of each module is normal. The modules of the system may be the CPU, a communication module, a password processing module, and so on, and the data obtained may include CPU utilization, memory utilization, hard disk utilization, running time, communication efficiency, password access data, and so on. When the operating state data is abnormal or exceeds a certain threshold, an alarm thread is started to raise an alarm and store a log.
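The threshold comparison described above can be sketched in a few lines. The metric names and limit values are illustrative assumptions; the source only says that data exceeding "a certain threshold" triggers an alarm.

```python
def find_abnormal(status, thresholds):
    """Return {metric: value} for every metric above its threshold.

    Metrics without a configured threshold (for example running time)
    are ignored.
    """
    return {
        metric: value
        for metric, value in status.items()
        if metric in thresholds and value > thresholds[metric]
    }
```

For example, with hypothetical limits of 90 for `cpu_percent` and `disk_percent`, a status report of `{"cpu_percent": 97.0, "disk_percent": 91.5, "mem_percent": 40.0}` would flag the CPU and disk metrics and leave memory alone. A non-empty result is the condition under which the alarm thread would be started.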
In this process, the monitoring configuration file may contain the key functions of all daemon processes to be monitored, together with the name, pid, start command, and so on of each daemon process. When the pid of a daemon process is 0 or the process does not respond, that is, when the daemon process has exited unexpectedly, the corresponding start command is called to start the key functions and forcibly restart the daemon process.
In addition, a CRC check is carried out before the state data of each module of the system is requested at regular intervals. When the request for system state data is sent, the sender calculates the CRC value of the data packet and adds it to the data packet; the receiver, namely the monitored module, calculates the CRC value after receiving the data packet and compares it with the CRC value carried in the data packet. After the CRC check passes, an operating state data packet is returned, which may contain the state data of each module of the system, such as system running time, CPU utilization, memory utilization, hard disk utilization, and so on.
When the state data acquired for a module is abnormal, an alarm thread is started, the buzzer and indicator lamp are activated to raise an alarm, and the abnormal information is written into a log database.
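The alarm thread and log storage can be sketched as follows. This is a hedged illustration: the source names a "log database" without specifying one, so SQLite, the table name `alarm_log`, and the `signal` callback standing in for the buzzer and indicator lamp are all assumptions.

```python
import sqlite3
import threading
import time


def raise_alarm(db_path, module, message, signal=lambda: None):
    """Start an alarm thread that signals and logs an abnormal state.

    `signal` stands in for driving the buzzer and indicator lamp
    (hardware access is device-specific and not shown here).
    Returns the thread so the caller can join it if needed.
    """
    def worker():
        signal()
        # Open the connection inside the thread: by default sqlite3
        # connections must not be shared across threads.
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS alarm_log "
                    "(ts REAL, module TEXT, message TEXT)"
                )
                conn.execute(
                    "INSERT INTO alarm_log VALUES (?, ?, ?)",
                    (time.time(), module, message),
                )
        finally:
            conn.close()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Running the alarm in its own thread keeps the monitoring loop from blocking while the alarm is signalled and the log row is written. Because the thread is marked as a daemon thread, a caller that is about to exit should `join()` it first so the log entry is not lost.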
Again taking the network cipher machine system as an example, the tool of the invention is used to monitor system operation, and the specific process is as follows:
After the daemon processes of the network cipher machine system are started (the daemon processes include the management processes of the modules of the network cipher machine, the web interface management process, and so on), the monitoring module of the tool writes the process number (pid) of each daemon process into the monitoring configuration file, reads the monitoring configuration file every 10 seconds to check whether the pid of any daemon process is 0, and forcibly restarts a daemon process whose pid is 0.
At the same time, the monitoring module requests the state data of each module of the system at regular intervals and monitors whether the state of each module is normal. The modules of the system may be the CPU, a communication module, a password processing module, and so on, and the data acquired by the monitoring module may include CPU utilization, memory utilization, hard disk utilization, running time, communication efficiency, password access data, and so on. When the operating state data is abnormal or exceeds a certain threshold, the monitoring module starts an alarm thread to raise an alarm and store a log.
In this process, the monitoring configuration file may contain the key functions of all daemon processes to be monitored, together with the name, pid, start command, and so on of each daemon process. When the pid of a daemon process is 0 or the process does not respond, that is, when the daemon process has exited unexpectedly, the monitoring module calls the corresponding start command to start the key functions and forcibly restarts the daemon process.
In addition, before the monitoring module requests the state data of each module of the system at regular intervals, a CRC check is carried out between the monitoring module and the system. When the request for system state data is sent, the monitoring module, as the sender, calculates the CRC value of the data packet and adds it to the data packet; after receiving the data packet, the receiver calculates the CRC value and compares it with the CRC value carried in the data packet. After the CRC check passes, an operating state data packet is returned to the monitoring module, which may contain the state data of each module of the system, such as system running time, CPU utilization, memory utilization, hard disk utilization, and so on.
When the monitoring module finds that the state data of a module is abnormal, it starts an alarm thread, activates the buzzer and indicator lamp to raise an alarm, and writes the abnormal information into a log database.
The method or the tool can monitor each software process of the system. After a process goes down unexpectedly, its pid in the configuration file becomes 0; the configuration file is read at regular intervals, and when the pid of a process is found to be 0, the daemon process is forcibly restarted to ensure normal operation of the system. In addition, the state of each module of the system is monitored: data communication is carried out with each module at regular intervals to obtain its operating state, and when the operating state is abnormal, an alarm is raised and a log is stored.
The above embodiments are merely preferred embodiments given to fully illustrate the present invention, and the scope of the present invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the present invention all fall within the protection scope of the present invention. The protection scope of the present invention is defined by the claims.

Claims (10)

1. A method for monitoring system operation, characterized in that after the daemon processes of the system are started, the pid of each daemon process is synchronized into a monitoring configuration file, the monitoring configuration file is read to monitor whether the pid of any daemon process is 0, and a daemon process whose pid is 0 is forcibly restarted,
and at the same time, the state data of each module of the system is monitored and acquired.
2. The method as claimed in claim 1, wherein the monitoring configuration file includes the key functions of all daemon processes to be monitored, and when the pid of a daemon process is 0, that is, when the daemon process has exited unexpectedly, the corresponding start command is called to start the key functions, thereby forcibly restarting the daemon process.
3. The method as claimed in claim 1 or 2, wherein the pid of each daemon process in the system monitoring configuration file is read at regular intervals.
4. The method as claimed in claim 3, wherein status data is periodically requested from the modules of the system to monitor whether the status of each module is normal.
5. The method as claimed in claim 4, wherein a CRC check is performed before the status data of the modules of the system is periodically requested, and the status data of the modules of the system is acquired after the CRC check passes.
6. A tool for monitoring system operation, characterized by comprising a monitoring module,
wherein, after the daemon processes of the system are started, the monitoring module synchronizes the pid of each daemon process into the monitoring configuration file, reads the monitoring configuration file to monitor whether the pid of any daemon process is 0, and forcibly restarts a daemon process whose pid is 0,
and at the same time, the monitoring module monitors and acquires the state data of each module of the system.
7. The tool of claim 6, wherein the monitoring configuration file includes the key functions of all daemon processes to be monitored, and when the pid of a daemon process is 0, that is, when the daemon process has exited unexpectedly, the monitoring module calls the corresponding start command to start the key functions, thereby forcibly restarting the daemon process.
8. The tool of claim 6 or 7, wherein the monitoring module reads the pid of each daemon process in the system monitoring configuration file at regular intervals.
9. The tool of claim 8, wherein the monitoring module periodically requests status data from the modules of the system to monitor whether the status of each module is normal.
10. The tool of claim 9, wherein the monitoring module performs a CRC check with the system before periodically requesting status data from the modules of the system, and the status data of the modules of the system is obtained after the CRC check passes.
CN201910971300.7A 2019-10-14 2019-10-14 Method for monitoring system operation Pending CN110704281A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910971300.7A CN110704281A (en) 2019-10-14 2019-10-14 Method for monitoring system operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910971300.7A CN110704281A (en) 2019-10-14 2019-10-14 Method for monitoring system operation

Publications (1)

Publication Number Publication Date
CN110704281A true CN110704281A (en) 2020-01-17

Family

ID=69198826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910971300.7A Pending CN110704281A (en) 2019-10-14 2019-10-14 Method for monitoring system operation

Country Status (1)

Country Link
CN (1) CN110704281A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9822129D0 (en) * 1998-10-09 1998-12-02 Sun Microsystems Inc Process monitoring in a computer system
KR101249486B1 (en) * 2011-12-09 2013-04-01 주식회사 시큐아이 Method and system of managing daemon
CN103678084A (en) * 2012-09-21 2014-03-26 成都勤智数码科技股份有限公司 Flexible application process guarding method
CN107391343A (en) * 2017-07-25 2017-11-24 郑州云海信息技术有限公司 A kind of performance monitoring system and method
CN109426591A (en) * 2017-09-04 2019-03-05 武汉斗鱼网络科技有限公司 Guard the method and apparatus of multiple processes of the single program of windows

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
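Netizen: "Daemon processes: adding a watchdog", 《HTTPS://WWW.CNBLOGS.COM/SECONDTONONEWE/P/6098143.HTML》 *
Netizen: "How can a daemon process monitor itself and restart as soon as it detects that it has exited abnormally?", 《HTTPS://BBS.CSDN.NET/TOPICS/370002952》 *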

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111682977A (en) * 2020-04-30 2020-09-18 普联技术有限公司 Method and device for processing exception of network equipment, storage medium and network equipment
CN113312246A (en) * 2021-06-07 2021-08-27 海光信息技术股份有限公司 Control method, device, platform, equipment and storage medium of verification environment
CN114791835A (en) * 2022-03-16 2022-07-26 青岛海尔科技有限公司 Program restarting method and device, storage medium and electronic device
CN114791835B (en) * 2022-03-16 2023-11-28 青岛海尔科技有限公司 Program restarting method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200117
