CN105677501A - Refined process monitoring method and system based on watchdog in Linux system - Google Patents

Info

Publication number
CN105677501A
Authority
CN
China
Prior art keywords
thread
monitoring
dog
feeding
static
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610007790.5A
Other languages
Chinese (zh)
Other versions
CN105677501B (en)
Inventor
彭鹏
吴军平
郑明�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fiberhome Telecommunication Technologies Co Ltd
Wuhan Fiberhome Technical Services Co Ltd
Original Assignee
Fiberhome Telecommunication Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fiberhome Telecommunication Technologies Co Ltd
Priority to CN201610007790.5A
Publication of CN105677501A
Application granted
Publication of CN105677501B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3017 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is implementing multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/88 Monitoring involving counting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a refined process monitoring method and system based on a watchdog in a Linux system, and relates to the technical field of process monitoring in Linux systems. The method comprises: determining, according to a created monitoring configuration file, whether the current process monitoring is static or dynamic; if static, monitoring each service process by periodically sending it a specified signal, and resetting the system by ceasing to feed the dog once a service process is found to no longer exist; if dynamic, starting to monitor a service process only after that process has registered, obtaining the latest state of each service process from the status frames it sends continuously during monitoring, and resetting the system by ceasing to feed the dog once a service process or the system is found to be abnormal. The method and system achieve refined process monitoring and meet the requirement for high-quality process monitoring; in addition, the monitoring mode is flexible, saves resources, and is efficient.

Description

Refined process monitoring method and system based on a watchdog in a Linux system
Technical field
The present invention relates to the technical field of process monitoring in Linux systems, and in particular to a refined process monitoring method and system based on a watchdog in a Linux system.
Background
In an embedded Linux system, the processor is often subject to interference from external electromagnetic fields, which can cause a program to run away and fall into an endless loop. Normal program operation is then interrupted and the system cannot continue working; the whole system may hang, or unpredictable consequences may follow. Therefore, for the sake of system stability, the various processes running in a Linux system usually need to be monitored in real time.
In a Linux system, process monitoring essentially uses the IPC (Inter-Process Communication) mechanisms provided by Linux to pass messages between a monitoring process and the service processes. Existing process monitoring methods generally use a static monitoring mode: the monitoring process reads an existing configuration file, which records all service processes running in the Linux system, and starts monitoring every service process listed in it. During monitoring, the monitoring process continually sends a signal to each service process to determine whether it still exists. If it exists, the service process has not suffered an abnormality such as program runaway; if it does not exist, the service process may have suffered such an abnormality.
Although the existing monitoring mode is simple, it has the following defects in use:
(1) The existing monitoring mode is coarse-grained. It can only determine the most basic state of each service process, such as whether program runaway has occurred; it cannot provide finer-grained monitoring of the usage state of the system or the latest running state of a service process, and so struggles to meet the requirement for high-quality process monitoring.
(2) In practice, some service processes run only briefly and are used infrequently, and often do not need long-term real-time monitoring. In the existing monitoring mode, however, the monitoring process can only monitor all service processes in the configuration file uniformly. The mode is inflexible and wastes a large amount of monitoring resources, so overall monitoring efficiency is low.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the background art described above by providing a refined process monitoring method and system based on a watchdog in a Linux system, which achieve refined process monitoring and meet the requirement for high-quality process monitoring, with a monitoring mode that is flexible, resource-saving, and efficient.
To achieve the above object, the present invention provides a refined process monitoring method based on a watchdog in a Linux system, comprising the following steps:
S1: Create a monitoring configuration file that includes the watchdog timeout value, the static service process names, and the type of monitoring mode, where the type is either dynamic monitoring or static monitoring; go to S2;
S2: According to the type of monitoring mode in the monitoring configuration file, determine whether the current process monitoring uses static or dynamic monitoring; if static, go to S3; if dynamic, go to S4;
S3: Create a static dog-feeding thread and a sending thread. While the system is normal, the static dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware. The sending thread periodically sends a specified signal to each service process named by the static service process names in the monitoring configuration file, and the static dog-feeding thread determines from the return value for each static service process whether that process still exists. If it does, the static dog-feeding thread continues to feed the dog and the system keeps running normally; otherwise, the static dog-feeding thread stops feeding the dog, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets, and the procedure ends;
S4: Create a dynamic dog-feeding thread and a receiving thread. While the system is normal, the dynamic dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware. After a service process registers with the receiving thread, the receiving thread starts receiving the status frames that the service process sends continuously, and obtains the latest state of the service process from them. The dynamic dog-feeding thread monitors the latest state of the service processes and the usage state of the system. While the service processes and the system are all normal, it continues to feed the dog and the system keeps running normally; when a service process or the system is found to be abnormal, it stops feeding the dog, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets, and the procedure ends.
On the basis of the above technical solution, S4 specifically comprises the following steps:
S401: Create and initialize the dynamic dog-feeding thread. During initialization, a communication channel and a process status information table are created, where the process status information table stores the status information of all monitored service processes; go to S402;
S402: The dynamic dog-feeding thread creates and starts the receiving thread. Once the receiving thread has started, the dynamic dog-feeding thread enters the "feed the dog" loop: while the system is normal, it periodically performs the "feed the dog" operation on the watchdog hardware. After a service process registers with the receiving thread, the receiving thread adds the status information of the newly registered service process to the process status information table. The service process continuously sends status frames to the receiving thread over the communication channel; from each status frame received, the receiving thread obtains the latest status information of the service process and updates the process status information table. The dynamic dog-feeding thread periodically checks the process status information table and obtains the usage status information of the system in real time. While the service processes and the system are all normal, it continues to feed the dog and the system keeps running normally; when a service process or the system is found to be abnormal, it stops feeding the dog, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets, and the procedure ends.
On the basis of the above technical solution, an entry of the process status information table has the same structure as the status frame; both include the following fields:
Process number: identifies the service process that sends the status frame;
Thread number: identifies the number or handle of the thread within the service process that sends the status frame;
Termination time: the maximum interval before the service process sends its next status frame; the value is >= 0;
Exception code: identifies the current abnormal condition of the service process; consistent with the system's errno codes;
Information: a warning message indicating that a particular thread has a problem.
On the basis of the above technical solution, in S402, each time the receiving thread updates the process status information table, it resets the termination time field of the corresponding entry; each time the dynamic dog-feeding thread periodically checks the process status information table, it decrements the termination time field by one. Once the termination time field of a service process reaches 0, that service process is deemed abnormal.
On the basis of the above technical solution, in S402, when the dynamic dog-feeding thread periodically checks the process status information table, it determines from the exception code field whether the abnormality can affect the operation of the system; if so, the service process is deemed abnormal; otherwise, the service process is deemed not abnormal.
On the basis of the above technical solution, the content of the monitoring configuration file in S1 further includes a lower threshold for available system memory. Obtaining the usage status information of the system in real time in S402 specifically comprises: obtaining the current Flash usage with the df command; obtaining the current system memory usage with the command cat /proc/meminfo; and obtaining the current system CPU usage with the command cat /proc/stat. The system being abnormal in S402 specifically covers the following cases: the current Flash usage exceeds 80%; the current system memory usage breaches the lower threshold for available system memory set in the monitoring configuration file; or the current system CPU usage exceeds 80%.
On the basis of the above technical solution, in S402, while the service process continuously sends status frames to the receiving thread over the communication channel, it checks the sending status of each status frame; if sending fails, the service process deregisters from the receiving thread.
On the basis of the above technical solution, in S402, the receiving thread locks the process status information table while updating it, and the dynamic dog-feeding thread locks the process status information table while periodically checking it.
On the basis of the above technical solution, the specified signal in S3 is the null signal, numbered 0.
The present invention also provides a refined process monitoring system based on a watchdog in a Linux system that implements the above method, comprising a monitoring configuration file creation module, a monitoring mode determination module, a static monitoring module, and a dynamic monitoring module;
The monitoring configuration file creation module is configured to: create a monitoring configuration file that includes the watchdog timeout value, the static service process names, and the type of monitoring mode, where the type is either dynamic monitoring or static monitoring; and send a determination signal to the monitoring mode determination module;
The monitoring mode determination module is configured to: after receiving the determination signal, determine according to the type of monitoring mode in the monitoring configuration file whether the current process monitoring uses static or dynamic monitoring; if static, send a static monitoring signal to the static monitoring module; if dynamic, send a dynamic monitoring signal to the dynamic monitoring module;
The static monitoring module is configured to: after receiving the static monitoring signal, create a static dog-feeding thread and a sending thread. While the system is normal, the static dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware. The sending thread periodically sends a specified signal to each service process named by the static service process names in the monitoring configuration file, and the static dog-feeding thread determines from the return value for each static service process whether that process still exists. If it does, the static dog-feeding thread continues to feed the dog and the system keeps running normally; otherwise, the static dog-feeding thread stops feeding the dog, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets;
The dynamic monitoring module is configured to: after receiving the dynamic monitoring signal, create a dynamic dog-feeding thread and a receiving thread. While the system is normal, the dynamic dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware. After a service process registers with the receiving thread, the receiving thread starts receiving the status frames that the service process sends continuously, and obtains the latest state of the service process from them. The dynamic dog-feeding thread monitors the latest state of the service processes and the usage state of the system. While the service processes and the system are all normal, it continues to feed the dog and the system keeps running normally; when a service process or the system is found to be abnormal, it stops feeding the dog, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets.
Compared with the prior art, the advantages of the present invention are as follows:
(1) Compared with the prior art, which uses only a static monitoring mode, the present invention adds a dynamic monitoring mode on top of the ordinary static one. In the dynamic monitoring mode, the status frames continuously sent by a service process are used to obtain that process's latest state; by monitoring both the latest state of the service processes and the usage state of the system, finer-grained process monitoring is achieved and the requirement for high-quality process monitoring can be met.
(2) In the present invention, the process monitoring mode can be selected according to the user's needs, making monitoring more flexible: the traditional static monitoring mode is retained, and a more flexible dynamic monitoring mode is added. Through its registration mechanism, the dynamic monitoring mode achieves passive monitoring: a service process is monitored only after it registers and thereby requests monitoring. Compared with the prior art, in which all service processes must be monitored uniformly, passive monitoring avoids actively monitoring service processes that run briefly and are used infrequently, which effectively reduces wasted monitoring resources and improves overall monitoring efficiency.
(3) Both the static and the dynamic monitoring modes of the present invention rely on watchdog technology to restart and reset the system. Once a service process or the system is found to be abnormal, the "feed the dog" operation on the watchdog hardware stops, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets. This guarantees the effectiveness and stability of process monitoring at the hardware level.
Brief description of the drawings
Fig. 1 is a flow chart of the refined process monitoring method based on a watchdog in a Linux system according to an embodiment of the present invention;
Fig. 2 is a sequence diagram of dynamic monitoring according to an embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, an embodiment of the present invention provides a refined process monitoring method based on a watchdog in a Linux system, comprising the following steps:
S1: Create a monitoring configuration file that includes the lower threshold for available system memory, the watchdog timeout value, the type of monitoring mode (dynamic monitoring or static monitoring), and the static service process names; go to S2.
In a specific implementation, each line of the monitoring configuration file in S1 has the form "configuration item=value", where the lower threshold for available system memory is given by the configuration item min-memory, the watchdog timeout value by watchdog-timeout, the type of monitoring mode by type, and a static service process name by task.
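A minimal sketch of reading such a file follows. The four item names (min-memory, watchdog-timeout, type, task) come from the text above; everything else is an assumption made for illustration: the value spellings "static" and "dynamic", integer values for the threshold and the timeout, one task line per static service process, '#' comment lines, and the function name parse_monitor_config.

```python
def parse_monitor_config(text):
    """Parse 'item=value' lines into a dict; repeated 'task' items become a list."""
    config = {"task": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: blank lines and '#' comments allowed
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "task":
            config["task"].append(value)       # assumption: one 'task' line per process
        elif key in ("min-memory", "watchdog-timeout"):
            config[key] = int(value)           # assumption: plain integers, units not specified
        elif key == "type":
            if value not in ("static", "dynamic"):  # assumption: these two spellings
                raise ValueError(f"unknown monitoring type: {value!r}")
            config[key] = value
        else:
            raise ValueError(f"unknown configuration item: {key!r}")
    return config


# Hypothetical sample file contents.
SAMPLE = """\
min-memory=2048
watchdog-timeout=30
type=static
task=svc_a
task=svc_b
"""

config = parse_monitor_config(SAMPLE)
```

With this sample, step S2 would branch on config["type"] and, for static monitoring, iterate over config["task"].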
S2: According to the type of monitoring mode in the monitoring configuration file, determine whether the current process monitoring uses static or dynamic monitoring; if static, go to S3; if dynamic, go to S4.
S3: Create a static dog-feeding thread and a sending thread. While the system is normal, the static dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware. The sending thread periodically sends a specified signal to each service process named by the static service process names in the monitoring configuration file, and the static dog-feeding thread determines from the return value for each static service process whether that process still exists. If it does, the static dog-feeding thread continues to feed the dog and the system keeps running normally; otherwise, the static dog-feeding thread stops feeding the dog, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets, and the procedure ends.
In a specific implementation, the specified signal in S3 is the null signal, numbered 0 (defined by the POSIX.1 standard).
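The existence check with signal 0 can be sketched as follows. Under POSIX, sending signal 0 performs only the error checking of kill() and delivers nothing, so the result tells the caller whether the target process exists. This sketch assumes a POSIX system such as Linux and assumes the PID is already known; how a configured process name is resolved to a PID is not described in the text, and process_exists is a hypothetical helper name.

```python
import os


def process_exists(pid):
    """Return True if a process with this PID exists, probing with the null signal (0)."""
    try:
        os.kill(pid, 0)            # signal 0: error checking only, no signal is delivered
    except ProcessLookupError:     # ESRCH: no such process
        return False
    except PermissionError:        # EPERM: the process exists but we may not signal it
        return True
    return True


alive = process_exists(os.getpid())  # the calling process certainly exists
```

In the static mode, a False result for any configured service process would be the cue for the dog-feeding thread to stop feeding.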
S4: Create a dynamic dog-feeding thread and a receiving thread. While the system is normal, the dynamic dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware. After a service process registers with the receiving thread, the receiving thread starts receiving the status frames that the service process sends continuously, and obtains the latest state of the service process from them. The dynamic dog-feeding thread monitors the latest state of the service processes and the usage state of the system. While the service processes and the system are all normal, it continues to feed the dog and the system keeps running normally; when a service process or the system is found to be abnormal, it stops feeding the dog, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets, and the procedure ends.
The watchdog hardware is commonly called a "watchdog". It may be built into the processor or external to it; this embodiment uses an external watchdog.
A watchdog allows a system to run continuously while unattended. Its working principle is as follows: the watchdog chip is connected to an I/O pin of the processor, and under program control the processor periodically sends a specific signal (a pulse or a high/low level) through this I/O pin to the watchdog's input pin. The code that does so is distributed among the other control programs. If interference causes program runaway and the processor becomes stuck in some program segment in an endless loop, the code that writes to the watchdog pin (commonly called "feeding the dog") can no longer execute. The watchdog chip, no longer receiving the signal from the processor, then issues a reset signal on its pin connected to the processor's reset pin. The processor resets, that is, the program starts executing again from the start address of program memory, and the processor thereby recovers automatically.
The core of a watchdog is a down-counter. After the watchdog's initial value is set, the counter starts counting down. If the dog is not fed within the watchdog timeout value, the counter reaches 0, which triggers a watchdog interrupt and resets the system.
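The down-counter behaviour can be illustrated with a small software model. This is only a simulation for explanation, not the hardware or driver interface of any real watchdog; the class name WatchdogCounter and the tick granularity are assumptions.

```python
class WatchdogCounter:
    """Software model of a watchdog down-counter (illustrative only)."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.reset_fired = False

    def feed(self):
        """'Feed the dog': reload the counter with its initial value."""
        if not self.reset_fired:
            self.remaining = self.timeout_ticks

    def tick(self):
        """Advance one time unit; fire the reset when the counter reaches 0."""
        if self.reset_fired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.reset_fired = True


wd = WatchdogCounter(timeout_ticks=3)

# Healthy system: the dog is fed every tick, so the counter never reaches 0.
for _ in range(10):
    wd.tick()
    wd.feed()
healthy_reset = wd.reset_fired

# Abnormal system: feeding stops, and after 3 unfed ticks the reset fires.
for _ in range(3):
    wd.tick()
```

This is why stopping the feed is sufficient to force a reset: the monitoring code does not need to trigger the reset itself.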
In practice, referring to Fig. 2, S4 specifically comprises the following steps:
S401: Create and initialize the dynamic dog-feeding thread. During initialization, a communication channel and a process status information table are created, where the process status information table stores the status information of all monitored service processes; go to S402.
The communication channel of the present invention uses a message queue or a pipe; that is, status frames are transmitted through a message queue or a pipe. A message queue can be regarded as a linked list of messages: each message is treated as a record with a specific format and a specific priority. A process with write permission on the message queue can add new messages to it according to certain rules; a process with read permission can read messages from it. The advantage of this transmission mode is that the sender can continue working without waiting for the receiver to check the messages it has received, and the receiver does not have to wait if no message has arrived.
S402: The dynamic dog-feeding thread creates and starts the receiving thread. Once the receiving thread has started, the dynamic dog-feeding thread enters the "feed the dog" loop: while the system is normal, it periodically performs the "feed the dog" operation on the watchdog hardware. After a service process registers with the receiving thread, the receiving thread adds the status information of the newly registered service process to the process status information table. The service process continuously sends status frames to the receiving thread over the communication channel; from each status frame received, the receiving thread obtains the latest status information of the service process and updates the process status information table. The dynamic dog-feeding thread periodically checks the process status information table and obtains the usage status information of the system in real time. While the service processes and the system are all normal, it continues to feed the dog and the system keeps running normally; when a service process or the system is found to be abnormal, it stops feeding the dog, and once the time since the last feed exceeds the watchdog timeout value, the system restarts and resets, and the procedure ends.
The process status information table may be stored as a linked list or an array. Table 1 shows an entry of the process status information table, and Table 2 shows the status frame structure:
Table 1: Process status information table entry
Field | Process number | Thread number | Termination time | Exception code | Information
Size | 4 bytes | 4 bytes | 4 bytes | 4 bytes | 1-16 bytes
Table 2: Status frame structure
Field | Process number | Thread number | Termination time | Exception code | Information
Size | 4 bytes | 4 bytes | 4 bytes | 4 bytes | 1-16 bytes
As Tables 1 and 2 show, an entry of the process status information table has the same structure as the status frame; both include the following fields:
Process number: identifies the service process that sends the status frame;
Thread number: identifies the number (or handle) of the thread within the service process that sends the status frame. This field makes clear exactly which thread sent the status frame; that is, once a service process becomes abnormal, the specific thread can be found through this field. This achieves thread-level monitoring of service processes, making monitoring finer and more accurate and making faults easier to locate and investigate;
Termination time: the maximum interval before the service process sends its next status frame; the value is >= 0;
Exception code: identifies the current abnormal condition of the service process; consistent with the system's errno codes;
Information: a warning message indicating that a particular thread has a problem.
The process number, thread number, termination time, and exception code all use big-endian byte order (network byte order). The information field uses ASCII characters (UTF-8 is compatible with the ASCII character set, so the information field's character set may be restricted to either UTF-8 or ASCII); it has no endianness issue and is sent byte by byte in order.
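The frame layout described above can be sketched with Python's struct module. The field order, the 4-byte sizes, the 1 to 16 byte information field, and the big-endian encoding come from the text; treating the four numeric fields as unsigned, relying on the channel to preserve message boundaries (so the variable-length information field needs no length prefix), and the function names are assumptions of this sketch.

```python
import struct

# Four 4-byte fields in big-endian (network) byte order.
# Assumption: the fields are unsigned; the text does not state signedness.
HEADER = struct.Struct(">IIII")
MAX_INFO = 16


def pack_status_frame(pid, tid, termination_time, exception_code, info):
    """Build one status frame: 16-byte header followed by 1-16 ASCII bytes."""
    payload = info.encode("ascii")
    if not 1 <= len(payload) <= MAX_INFO:
        raise ValueError("information field must be 1 to 16 bytes")
    return HEADER.pack(pid, tid, termination_time, exception_code) + payload


def unpack_status_frame(frame):
    """Split one whole frame back into its five fields."""
    pid, tid, termination_time, exception_code = HEADER.unpack_from(frame)
    info = frame[HEADER.size:].decode("ascii")
    return pid, tid, termination_time, exception_code, info


frame = pack_status_frame(1234, 1, 5, 0, "ok")
fields = unpack_status_frame(frame)
```

Because the byte order is fixed in the format string, the same frame decodes identically regardless of the processor's native endianness.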
More specifically, in S402, each time the receiving thread updates the process status information table, it resets the termination time field of the entry; each time the dynamic dog-feeding thread periodically checks the process status information table, it decrements (updates) the termination time field by one. Once the termination time field of a service process reaches 0, that service process is deemed abnormal (no status frame has been received within the allowed time, so the service process is considered to have "died" or run away).
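The reset-and-decrement rule amounts to a per-entry countdown, which can be sketched as below. The class and method names are hypothetical, entries are keyed by (process number, thread number) as an assumption, and locking is omitted here to keep the countdown logic visible.

```python
class ProcessStatusTable:
    """Countdown per monitored entry, keyed by (process number, thread number)."""

    def __init__(self):
        self.remaining = {}

    def on_status_frame(self, pid, tid, termination_time):
        """Receiving thread: a frame arrived, so reset the entry's countdown."""
        self.remaining[(pid, tid)] = termination_time

    def periodic_check(self):
        """Dog-feeding thread: decrement each countdown by one, return entries at 0."""
        expired = []
        for key in self.remaining:
            if self.remaining[key] > 0:
                self.remaining[key] -= 1
            if self.remaining[key] == 0:   # no frame arrived within the allowed interval
                expired.append(key)
        return expired


table = ProcessStatusTable()
table.on_status_frame(100, 1, 3)
first = table.periodic_check()           # countdown 3 -> 2, nothing expired

table.on_status_frame(100, 1, 3)         # a new frame resets the countdown to 3
later = [table.periodic_check() for _ in range(3)]  # 2, 1, then 0: expired
```

A non-empty result from periodic_check is the condition under which the dog-feeding thread would stop feeding.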
In addition, in S402, when the dynamic dog-feeding thread periodically checks the process status information table, it determines from the exception code field whether the abnormality can affect the operation of the system; if so, the service process is deemed abnormal; otherwise, the service process is deemed not abnormal.
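One way to express this decision is a lookup against a set of errno values treated as system-affecting. The text does not say which codes fall in that set, so the set below is purely a hypothetical policy chosen for illustration, as are the function names.

```python
import errno

# Hypothetical policy: errno values assumed to affect system operation.
# The source does not list them; adjust to the actual product requirements.
SYSTEM_AFFECTING = frozenset({errno.ENOMEM, errno.EIO, errno.ENOSPC})


def affects_system(exception_code):
    """Return True if the reported errno-style code is treated as system-affecting."""
    return exception_code in SYSTEM_AFFECTING


def code_name(exception_code):
    """Symbolic errno name for logging, or 'UNKNOWN' if the code is not recognised."""
    return errno.errorcode.get(exception_code, "UNKNOWN")


serious = affects_system(errno.ENOMEM)   # out of memory: treated as abnormal here
benign = affects_system(errno.EAGAIN)    # transient retry condition: ignored here
```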
Further, acquiring the usage status information of the system in real time in S402 specifically includes the following steps: obtaining the current Flash usage with the df command; obtaining the current system memory usage with the cat /proc/meminfo command; obtaining the current system CPU usage with the cat /proc/stat command. The system being abnormal in S402 specifically covers the following cases: the current Flash usage exceeds 80%, or the current system memory usage exceeds the system free-memory lower limit threshold in the monitoring configuration file, or the current system CPU usage exceeds 80%.
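The checks above can be sketched in Python by reading the same kernel files directly instead of spawning `cat`. This is an illustrative sketch only: the function names are hypothetical, `shutil.disk_usage` stands in for the df command, free memory is taken from the `MemAvailable` line (falling back to `MemFree`), and CPU usage is derived from two samples of the aggregate `cpu` line, a detail the text does not spell out.

```python
import shutil


def flash_usage_percent(path):
    """Used share of the filesystem at path (stand-in for the df command)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def memory_available_kb(meminfo_text):
    """Return MemAvailable (or MemFree) in kB from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields.get("MemAvailable", fields.get("MemFree", 0))


def cpu_times(stat_text):
    """Return (busy, total) jiffies from the aggregate cpu line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            values = [int(v) for v in line.split()[1:]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            total = sum(values[:8])  # guest time is already part of user time
            return total - idle, total
    raise ValueError("no aggregate cpu line found")


def cpu_usage_percent(before, after):
    """CPU usage between two (busy, total) samples."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return 100.0 * busy / total if total > 0 else 0.0


def system_abnormal(flash_pct, free_kb, cpu_pct, min_free_kb):
    """Apply the three conditions: Flash > 80%, free memory below the
    configured lower limit, or CPU > 80%."""
    return flash_pct > 80.0 or free_kb < min_free_kb or cpu_pct > 80.0
```

On a real system the text arguments would come from reading `/proc/meminfo` and `/proc/stat`, with the two CPU samples taken a short interval apart.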
To improve the effectiveness and reliability of the whole monitoring process, when the business process in S402 continuously sends status frames to the receiving thread over the communication channel, the business process checks the transmission status of each status frame; if transmission fails, the business process deregisters from the receiving thread.
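The send-and-deregister behaviour on the business process side could look like the sketch below. The communication channel is not specified in the text, so it is modelled here by two hypothetical callables, `send` and `deregister`, and a failed transmission is assumed to surface as an `OSError`.

```python
class StatusReporter:
    """Business-process side: report status, deregister on a failed send."""

    def __init__(self, send, deregister):
        self._send = send              # hypothetical callable: send(frame_bytes)
        self._deregister = deregister  # hypothetical callable: deregister()
        self.registered = True

    def report(self, frame):
        """Send one status frame; return True when it was transmitted."""
        if not self.registered:
            return False
        try:
            self._send(frame)
            return True
        except OSError:
            # Transmission failed: withdraw from the receiving thread.
            self._deregister()
            self.registered = False
            return False
```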
In practice, the receiving thread and the dynamic dog-feeding thread may race when accessing the process status information table. To avoid the race, both threads must use a mutex when reading or writing the table. Specifically, as shown in Figure 2, in S402 the receiving thread locks the process status information table when updating it, and the dynamic dog-feeding thread locks the process status information table when periodically checking it.
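A minimal sketch of the locking scheme, assuming a single mutex guards the whole table. The class `ProcessStatusTable` and its methods are hypothetical names; entries are stored as opaque values keyed by process number.

```python
import threading


class ProcessStatusTable:
    """Table shared by the receiving thread and the dog-feeding thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def update(self, pid, entry):
        """Receiving-thread side: write one entry while holding the lock."""
        with self._lock:
            self._entries[pid] = entry

    def snapshot(self):
        """Dog-feeding-thread side: copy the table while holding the lock,
        so the periodic check never sees a half-finished update."""
        with self._lock:
            return dict(self._entries)
```

Taking a copy under the lock keeps the critical section short: the dog-feeding thread can inspect the snapshot without blocking the receiving thread.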
An embodiment of the present invention also provides a watchdog-based refined process monitoring system in a Linux system that implements the above method, comprising a monitoring configuration file creation module, a monitoring mode judgment module, a static monitoring module and a dynamic monitoring module;
The monitoring configuration file creation module is used to: create a monitoring configuration file, which includes the watchdog timeout value, the static business process names and the type of monitoring mode, the type of monitoring mode being dynamic monitoring or static monitoring; and send a judgment signal to the monitoring mode judgment module;
The monitoring mode judgment module is used to: after receiving the judgment signal, judge from the type of monitoring mode in the monitoring configuration file whether this process monitoring uses static monitoring or dynamic monitoring; if static monitoring, send a static monitoring signal to the static monitoring module; if dynamic monitoring, send a dynamic monitoring signal to the dynamic monitoring module;
The static monitoring module is used to: after receiving the static monitoring signal, create a static dog-feeding thread and a sending thread. While the system is normal, the static dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware. The sending thread periodically sends a specified signal to each business process named in the static business process names of the monitoring configuration file; from the return value for each static business process, the static dog-feeding thread judges whether the corresponding business process still exists. If it does, the static dog-feeding thread continues to perform the "feed the dog" operation so that the system keeps running normally; otherwise, the static dog-feeding thread stops performing the "feed the dog" operation, and once the stop time exceeds the watchdog timeout value, the system restarts and resets;
The dynamic monitoring module is used to: after receiving the dynamic monitoring signal, create a dynamic dog-feeding thread and a receiving thread. While the system is normal, the dynamic dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware. After a business process registers with the receiving thread, the receiving thread starts receiving the status frames that the business process continuously sends and obtains the latest state of the business process from them. The dynamic dog-feeding thread monitors the latest state of the business processes and the usage state of the system: when the business processes and the system are all normal, it continues to perform the "feed the dog" operation so that the system keeps running normally; when it finds that a business process or the system is abnormal, it stops performing the "feed the dog" operation, and once the stop time exceeds the watchdog timeout value, the system restarts and resets.
Specifically, the dynamic monitoring module includes an initial setting submodule and a monitoring management submodule;
The initial setting submodule is used to: create and initialize the dynamic dog-feeding thread, creating the communication channel and the process status information table during initialization, where the process status information table stores the status information of all monitored business processes; and send a monitoring management signal to the monitoring management submodule;
The monitoring management submodule is used to: after receiving the monitoring management signal, have the dynamic dog-feeding thread create and start the receiving thread. After the receiving thread starts, the dynamic dog-feeding thread enters the "feed the dog" loop: while the system is normal, it periodically performs the "feed the dog" operation on the watchdog hardware. After a business process registers with the receiving thread, the receiving thread adds the status information of the newly registered business process to the process status information table; the business process continuously sends status frames to the receiving thread over the communication channel, and the receiving thread obtains the latest status information of the business process from the received status frames and updates the process status information table. The dynamic dog-feeding thread periodically checks the process status information table and acquires the usage status information of the system in real time: when the business processes and the system are all normal, it continues to perform the "feed the dog" operation so that the system keeps running normally; when it finds that a business process or the system is abnormal, it stops performing the "feed the dog" operation, and once the stop time exceeds the watchdog timeout value, the system restarts and resets.
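The static existence check and the resulting feed decision can be sketched as follows, using the null signal (signal number 0) named in claim 9. This is an illustrative sketch: resolving a configured process name to a process number is omitted, and `feed` is a hypothetical callable that would perform the actual "feed the dog" operation (on Linux this is commonly a write to the watchdog device file).

```python
import os


def process_exists(pid):
    """Probe a process with the null signal; no signal is actually delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def feed_if_healthy(pids, feed):
    """Call feed() only when every monitored process is still present."""
    if all(process_exists(pid) for pid in pids):
        feed()
        return True
    return False
```

When `feed_if_healthy` stops calling `feed`, nothing else needs to happen in software: the hardware watchdog expires after its timeout value and resets the system.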
The present invention is not limited to the above embodiments. Those of ordinary skill in the art may make improvements and refinements without departing from the principles of the invention, and these are also regarded as falling within its scope of protection. Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

Claims (10)

1. A watchdog-based refined process monitoring method in a Linux system, characterized in that it comprises the following steps:
S1: create a monitoring configuration file, which includes the watchdog timeout value, the static business process names and the type of monitoring mode, the type of monitoring mode being dynamic monitoring or static monitoring; go to S2;
S2: judge from the type of monitoring mode in the monitoring configuration file whether this process monitoring uses static monitoring or dynamic monitoring; if static monitoring, go to S3; if dynamic monitoring, go to S4;
S3: create a static dog-feeding thread and a sending thread; while the system is normal, the static dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware; the sending thread periodically sends a specified signal to each business process named in the static business process names of the monitoring configuration file, and from the return value for each static business process the static dog-feeding thread judges whether the corresponding business process still exists; if it does, the static dog-feeding thread continues to perform the "feed the dog" operation so that the system keeps running normally; otherwise, the static dog-feeding thread stops performing the "feed the dog" operation, and once the stop time exceeds the watchdog timeout value, the system restarts and resets; end;
S4: create a dynamic dog-feeding thread and a receiving thread; while the system is normal, the dynamic dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware; after a business process registers with the receiving thread, the receiving thread starts receiving the status frames that the business process continuously sends and obtains the latest state of the business process from them; the dynamic dog-feeding thread monitors the latest state of the business processes and the usage state of the system; when the business processes and the system are all normal, the dynamic dog-feeding thread continues to perform the "feed the dog" operation so that the system keeps running normally; when it finds that a business process or the system is abnormal, the dynamic dog-feeding thread stops performing the "feed the dog" operation, and once the stop time exceeds the watchdog timeout value, the system restarts and resets; end.
2. The watchdog-based refined process monitoring method in a Linux system as claimed in claim 1, characterized in that S4 specifically includes the following steps:
S401: create and initialize the dynamic dog-feeding thread, creating the communication channel and the process status information table during initialization, where the process status information table stores the status information of all monitored business processes; go to S402;
S402: the dynamic dog-feeding thread creates and starts the receiving thread; after the receiving thread starts, the dynamic dog-feeding thread enters the "feed the dog" loop: while the system is normal, it periodically performs the "feed the dog" operation on the watchdog hardware; after a business process registers with the receiving thread, the receiving thread adds the status information of the newly registered business process to the process status information table; the business process continuously sends status frames to the receiving thread over the communication channel, and the receiving thread obtains the latest status information of the business process from the received status frames and updates the process status information table; the dynamic dog-feeding thread periodically checks the process status information table and acquires the usage status information of the system in real time; when the business processes and the system are all normal, the dynamic dog-feeding thread continues to perform the "feed the dog" operation so that the system keeps running normally; when it finds that a business process or the system is abnormal, the dynamic dog-feeding thread stops performing the "feed the dog" operation, and once the stop time exceeds the watchdog timeout value, the system restarts and resets; end.
3. The watchdog-based refined process monitoring method in a Linux system as claimed in claim 2, characterized in that: an entry of the process status information table has the same structure as the status frame, and both include the following fields:
Process number: identifies the business process that sends the status frames;
Thread number: identifies the thread number or handle of the thread in the business process that sends the status frames;
Termination time: the maximum interval before the business process sends its next status frame; value >= 0;
Exception code: identifies the current exception flag of the business process, consistent with the system's errno codes;
Information: warning information about a thread in which a problem has occurred.
4. The watchdog-based refined process monitoring method in a Linux system as claimed in claim 3, characterized in that: in S402, each time the receiving thread updates the process status information table, it resets the termination time field of the process status information table; each time the dynamic dog-feeding thread periodically checks the process status information table, it decrements the termination time field of the process status information table by one; once the termination time field of a business process reaches 0, that business process is abnormal.
5. The watchdog-based refined process monitoring method in a Linux system as claimed in claim 3, characterized in that: in S402, when the dynamic dog-feeding thread periodically checks the process status information table, it uses the exception code field of the process status information table to judge whether the exception can affect the operation of the system; if so, the business process is abnormal; otherwise, the business process is not abnormal.
6. The watchdog-based refined process monitoring method in a Linux system as claimed in claim 2, characterized in that: the content of the monitoring configuration file in S1 also includes a system free-memory lower limit threshold;
Acquiring the usage status information of the system in real time in S402 specifically includes the following steps: obtaining the current Flash usage with the df command; obtaining the current system memory usage with the cat /proc/meminfo command; obtaining the current system CPU usage with the cat /proc/stat command;
The system being abnormal in S402 specifically covers the following cases: the current Flash usage exceeds 80%, or the current system memory usage exceeds the system free-memory lower limit threshold in the monitoring configuration file, or the current system CPU usage exceeds 80%.
7. The watchdog-based refined process monitoring method in a Linux system as claimed in claim 2, characterized in that: when the business process in S402 continuously sends status frames to the receiving thread over the communication channel, the business process checks the transmission status of each status frame; if transmission fails, the business process deregisters from the receiving thread.
8. The watchdog-based refined process monitoring method in a Linux system as claimed in claim 2, characterized in that: in S402, the receiving thread locks the process status information table when updating it; in S402, the dynamic dog-feeding thread locks the process status information table when periodically checking it.
9. The watchdog-based refined process monitoring method in a Linux system as claimed in claim 1, characterized in that: the specified signal in S3 is the null signal numbered 0.
10. A watchdog-based refined process monitoring system in a Linux system implementing the method of claim 1, characterized in that it includes a monitoring configuration file creation module, a monitoring mode judgment module, a static monitoring module and a dynamic monitoring module;
The monitoring configuration file creation module is used to: create a monitoring configuration file, which includes the watchdog timeout value, the static business process names and the type of monitoring mode, the type of monitoring mode being dynamic monitoring or static monitoring; and send a judgment signal to the monitoring mode judgment module;
The monitoring mode judgment module is used to: after receiving the judgment signal, judge from the type of monitoring mode in the monitoring configuration file whether this process monitoring uses static monitoring or dynamic monitoring; if static monitoring, send a static monitoring signal to the static monitoring module; if dynamic monitoring, send a dynamic monitoring signal to the dynamic monitoring module;
The static monitoring module is used to: after receiving the static monitoring signal, create a static dog-feeding thread and a sending thread; while the system is normal, the static dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware; the sending thread periodically sends a specified signal to each business process named in the static business process names of the monitoring configuration file, and from the return value for each static business process the static dog-feeding thread judges whether the corresponding business process still exists; if it does, the static dog-feeding thread continues to perform the "feed the dog" operation so that the system keeps running normally; otherwise, the static dog-feeding thread stops performing the "feed the dog" operation, and once the stop time exceeds the watchdog timeout value, the system restarts and resets;
The dynamic monitoring module is used to: after receiving the dynamic monitoring signal, create a dynamic dog-feeding thread and a receiving thread; while the system is normal, the dynamic dog-feeding thread periodically performs the "feed the dog" operation on the watchdog hardware; after a business process registers with the receiving thread, the receiving thread starts receiving the status frames that the business process continuously sends and obtains the latest state of the business process from them; the dynamic dog-feeding thread monitors the latest state of the business processes and the usage state of the system; when the business processes and the system are all normal, the dynamic dog-feeding thread continues to perform the "feed the dog" operation so that the system keeps running normally; when it finds that a business process or the system is abnormal, the dynamic dog-feeding thread stops performing the "feed the dog" operation, and once the stop time exceeds the watchdog timeout value, the system restarts and resets.
CN201610007790.5A 2016-01-07 2016-01-07 Refined process monitoring method and system based on watchdog in Linux system Active CN105677501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610007790.5A CN105677501B (en) 2016-01-07 2016-01-07 Refined process monitoring method and system based on watchdog in Linux system

Publications (2)

Publication Number Publication Date
CN105677501A true CN105677501A (en) 2016-06-15
CN105677501B CN105677501B (en) 2019-01-29

Family

ID=56299165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610007790.5A Active CN105677501B (en) 2016-01-07 2016-01-07 Refined process monitoring method and system based on watchdog in Linux system

Country Status (1)

Country Link
CN (1) CN105677501B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020091963A1 (en) * 1998-02-23 2002-07-11 Sun Albert C. Fault-tolerant architecture for in-circuit programming
CN101311910A (en) * 2008-06-27 2008-11-26 北京星网锐捷网络技术有限公司 Hardware reset control method and apparatus
CN101739305A (en) * 2010-02-09 2010-06-16 太仓市同维电子有限公司 Operating system kernel level real-time dongle monitoring device and monitoring method thereof
CN103885847A (en) * 2014-02-08 2014-06-25 京信通信系统(中国)有限公司 Dog feeding method and device based on embedded system

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018040999A1 (en) * 2016-08-29 2018-03-08 华为技术有限公司 Method and apparatus for processing process
US10983825B2 (en) 2016-08-29 2021-04-20 Huawei Technologies Co., Ltd. Processing for multiple containers are deployed on the physical machine
CN108694093A (en) * 2017-04-06 2018-10-23 迈普通信技术股份有限公司 Process exception monitoring method and device
CN107133167A (en) * 2017-04-24 2017-09-05 北京北信源软件股份有限公司 The abnormal method and device of real-time monitoring process under a kind of linux system
CN107515796A (en) * 2017-07-31 2017-12-26 北京奇安信科技有限公司 A kind of unit exception monitor processing method and device
CN107515796B (en) * 2017-07-31 2020-08-25 奇安信科技集团股份有限公司 Equipment abnormity monitoring processing method and device
CN107623829B (en) * 2017-08-30 2020-07-07 中国航空无线电电子研究所 File management method in video recording equipment
CN107623829A (en) * 2017-08-30 2018-01-23 中国航空无线电电子研究所 A kind of file management method in video recording apparatus
US11693701B2 (en) 2017-09-30 2023-07-04 Huawei Technologies Co., Ltd. System service timeout processing method, and apparatus
CN109891392A (en) * 2017-09-30 2019-06-14 华为技术有限公司 A kind of processing method and processing device of system service time-out
CN109697075A (en) * 2017-10-20 2019-04-30 北京京东尚科信息技术有限公司 File updating method, system and device
CN107844312A (en) * 2017-11-06 2018-03-27 深圳市新国都技术股份有限公司 A kind of software upgrading monitoring method and system
CN108304275A (en) * 2018-01-09 2018-07-20 福州瑞芯微电子股份有限公司 A kind of method and apparatus of detection Android system application layer exception
CN108415806A (en) * 2018-02-07 2018-08-17 深圳市亿联智能有限公司 A kind of high efficiency thread life monitoring mode
CN108762967A (en) * 2018-05-30 2018-11-06 宁波市标准化研究院 Software watchdog implementation method for monitoring Web service in linux system
CN111078441A (en) * 2018-10-19 2020-04-28 迈普通信技术股份有限公司 System running state monitoring method and device and electronic equipment
CN110032487A (en) * 2018-11-09 2019-07-19 阿里巴巴集团控股有限公司 Keep Alive supervision method, apparatus and electronic equipment
CN112346906A (en) * 2019-08-08 2021-02-09 丰鸟航空科技有限公司 Unmanned aerial vehicle daemon processing method, device, equipment and storage medium
CN112346906B (en) * 2019-08-08 2024-08-02 重庆丰鸟无人机研究院有限公司 Unmanned aerial vehicle daemon processing method, device, equipment and storage medium
CN110912785A (en) * 2019-12-26 2020-03-24 联陆智能交通科技(上海)有限公司 RSU health detection method and system
CN111856991A (en) * 2020-06-22 2020-10-30 北京遥测技术研究所 Signal processing system and method with five-level protection on single event upset
CN111856991B (en) * 2020-06-22 2021-11-16 北京遥测技术研究所 Signal processing system and method with five-level protection on single event upset
CN112749038A (en) * 2021-01-26 2021-05-04 北京中电兴发科技有限公司 Method and system for realizing software watchdog in software system
CN112749038B (en) * 2021-01-26 2023-03-10 北京中电兴发科技有限公司 Method and system for realizing software watchdog in software system
CN113778724A (en) * 2021-05-17 2021-12-10 北京科益虹源光电技术有限公司 Method and device for shielding watchdog
CN113778724B (en) * 2021-05-17 2024-03-22 北京科益虹源光电技术有限公司 Method and device for shielding watchdog

Also Published As

Publication number Publication date
CN105677501B (en) 2019-01-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20180322

Address after: 430074 Wuhan, Hongshan Province District Road, Department of mail, No. 88 hospital

Applicant after: Fenghuo Communication Science &. Technology Co., Ltd.

Applicant after: WUHAN FIBERHOME TECHNICAL SERVICES CO., LTD.

Address before: 430074 East Lake Development Zone, Hubei, Optics Valley Venture Street, No. 67, No.

Applicant before: Fenghuo Communication Science &. Technology Co., Ltd.

GR01 Patent grant