CN110795264A - Monitoring management method and system and intelligent management terminal - Google Patents


Info

Publication number
CN110795264A
Authority
CN
China
Prior art keywords
application program
service
configuration information
business
thread
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910973766.0A
Other languages
Chinese (zh)
Inventor
许辉
顾林飞
李卫华
卢胜
李双全
朱程鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Haixing Power Grid Technology Co Ltd
Hangzhou Hexing Electrical Co Ltd
Ningbo Henglida Technology Co Ltd
Original Assignee
Nanjing Haixing Power Grid Technology Co Ltd
Hangzhou Hexing Electrical Co Ltd
Ningbo Henglida Technology Co Ltd
Application filed by Nanjing Haixing Power Grid Technology Co Ltd, Hangzhou Hexing Electrical Co Ltd and Ningbo Henglida Technology Co Ltd
Priority to CN201910973766.0A
Publication of CN110795264A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a monitoring management method comprising the following steps: acquiring configuration information of the application programs to be monitored, of which there are at least two; monitoring the running state of each application program to be monitored according to the configuration information to obtain a monitoring result; and determining from the monitoring result whether each application program to be monitored is running abnormally, and restarting any that is. The method is intended for scenarios in which several application programs run simultaneously: every application program is monitored, and one that runs abnormally is detected promptly and restarted so that it returns to normal operation. Only the abnormal application program is restarted; the other, normally running application programs are unaffected and continue to work.

Description

Monitoring management method and system and intelligent management terminal
Technical Field
The invention relates to the technical field of management terminals, and in particular to a monitoring management method and system and an intelligent management terminal.
Background
Intelligent management terminals are now widely used in many everyday and work scenarios. However, an existing intelligent management terminal usually runs only a single application program. Taking a meter-reading intelligent management terminal as an example, that application program generally comprises a main thread, a database thread, a remote communication thread, a local meter reading thread, a log thread and so on. In practice, if any one of these threads becomes abnormal, the abnormal-operation information is recorded and the whole machine is restarted, so a fault in a single thread prevents the entire terminal from working normally.
In view of the above, the prior art needs further improvement.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a monitoring management method and system and an intelligent management terminal.
To solve the above technical problem, the invention adopts the following technical solution:
A monitoring management method comprises the following steps:
acquiring configuration information of the application programs to be monitored, wherein there are at least two application programs to be monitored;
monitoring the running state of the application programs to be monitored according to the configuration information to obtain a monitoring result; and determining from the monitoring result whether each application program to be monitored is running abnormally, and restarting any application program to be monitored that is running abnormally.
As an embodiment:
the application program to be monitored comprises a master control application program and a business application program, and the business application program is used for providing business service;
the configuration information comprises service application program configuration information and watchdog configuration information, wherein the service application program configuration information is used for identifying a corresponding service application program, and the watchdog configuration information comprises a zero clearing period and a counting threshold;
the master control application program periodically queries the running state of the corresponding business application program based on the business application program configuration information to obtain a query result; when the query result is a response timeout or abnormal operation, it determines that the corresponding business application program is running abnormally and restarts that business application program;
the master control application program clears the count of the hardware watchdog at every zero clearing period so that the hardware watchdog starts counting again; the count of the hardware watchdog is monitored at the same time, and when it exceeds the counting threshold, operation is judged to be abnormal and the master control application program is restarted.
In an implementable embodiment, the step in which the master control application program periodically queries the running state of the corresponding service application program based on the service application program configuration information to obtain a query result, determines that the service application program is running abnormally when the query result is a response timeout or abnormal operation, and restarts it, specifically comprises:
the master control application program queries the running state of the corresponding service application program, at a preset query period, over the MQTT protocol using the service application program configuration information;
when the running state is obtained and indicates abnormal operation, the master control application program generates and records abnormal-operation information and restarts the corresponding service application program;
when no running state is obtained and the query time exceeds a preset timeout threshold, the master control application program determines that the query has timed out, generates and records abnormal-operation information, and restarts the corresponding service application program.
In an implementable solution, querying the running state of the corresponding service application program over the MQTT protocol using the service application program configuration information specifically comprises:
the service application program comprises a main service thread, other service threads and an MQTT receiving thread;
thread information of the main service thread and the other service threads is acquired and recorded, the thread information comprising a thread ID, a thread name, an update cycle limit and the latest running time; the running state of each thread is judged from its update cycle limit and latest running time, and a thread is judged to be running abnormally when the difference between the current time and its latest running time exceeds its update cycle limit;
the master control application program sends a query instruction to the corresponding service application program over the MQTT protocol using the service application program configuration information; the MQTT receiving thread receives the query instruction, the running states of the main service thread and the other service threads are compiled into a statistical result in response to it, and the statistical result is returned through the MQTT receiving thread.
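The thread-level rule in the second step can be written as a single comparison. The symbols are introduced here only for illustration: $t_{\text{now}}$ is the current time, $t_{\text{last}}$ a thread's latest running time and $T_{\text{limit}}$ its update cycle limit.

```latex
\Delta t = t_{\text{now}} - t_{\text{last}}, \qquad
\text{state} =
\begin{cases}
\text{abnormal}, & \Delta t > T_{\text{limit}},\\
\text{normal},   & \Delta t \le T_{\text{limit}}.
\end{cases}
```

For instance, with $T_{\text{limit}} = 2$ min, a thread whose latest running time is 3 min old gives $\Delta t = 3\ \text{min} > 2\ \text{min}$ and is reported as abnormal.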
As an embodiment:
the number of business applications is at least one.
As an embodiment:
the business applications include database applications, remote communication applications, meter reading applications, logging applications and/or event alarm applications.
The invention also provides a monitoring management system, which comprises a main control subsystem and at least one service subsystem:
the business subsystem is used for providing business services;
the master control subsystem comprises:
the acquisition module is used for acquiring configuration information of the service subsystem and the main control subsystem;
the monitoring module is used for monitoring the running states of the service subsystem and the main control subsystem according to the configuration information to obtain a monitoring result;
the restarting module is used for determining from the monitoring result whether the service subsystem and the main control subsystem are running abnormally, and for restarting the abnormally running service subsystem and/or main control subsystem.
As an implementable embodiment:
the configuration information comprises service application program configuration information and watchdog configuration information, wherein the service application program configuration information is used for identifying a corresponding service subsystem, and the watchdog configuration information comprises a zero clearing period and a counting threshold;
the monitoring module comprises a service monitoring unit and a watchdog monitoring unit, and the restarting module comprises a service restarting unit and a main control restarting unit;
the service monitoring unit is used for periodically querying the running state of the corresponding service subsystem based on the service application program configuration information to obtain a query result;
the watchdog monitoring unit is used for clearing the count of the hardware watchdog at every zero clearing period so that it starts counting again, and for monitoring the count of the hardware watchdog;
the service restarting unit is used for determining that the corresponding service subsystem is running abnormally when the query result is a response timeout or abnormal operation, and for restarting that service subsystem;
the master control restarting unit is used for determining abnormal operation when the count of the hardware watchdog exceeds the counting threshold, and for restarting the master control subsystem.
The invention also provides an intelligent management terminal, which comprises a memory, a processor and an application program which is stored on the memory and can be operated on the processor;
the application programs comprise a master control application program and at least one business application program, and the master control application program, when executed by the processor, implements the steps of any one of the methods described above.
The invention also proposes a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of any of the methods described above.
By adopting the above technical solution, the invention achieves the following notable technical effects:
1. The method provided by the invention is intended for scenarios in which several application programs run simultaneously: every application program is monitored, and one that runs abnormally is detected promptly and restarted so that it returns to normal operation. Only the abnormal application program is restarted; the other, normally running application programs are unaffected and continue to work.
2. Using the configuration information, the invention monitors the running state of the master control application program through a hardware watchdog and restarts it when it fails, and queries the running state of each service application program through the service application program configuration information so that an abnormal service application program is restarted promptly, providing users with stable service.
3. The invention generates and records abnormal-operation information according to the query result so that staff can later trace the abnormal condition. In addition, by comparing the query time with the timeout threshold, the invention no longer waits for the corresponding service application program to return its running state once a query has timed out, but directly takes the query timeout as the query result, which improves query efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of the monitoring management method of the present invention;
FIG. 2 is a flowchart of thread registration in Embodiment 1;
FIG. 3 is a flowchart of updating the latest running time of a thread in Embodiment 1;
FIG. 4 is a schematic flow chart of compiling statistics on the running states of the threads in Embodiment 1;
FIG. 5 is a schematic operation flow diagram of the master control application program in Embodiment 1;
FIG. 6 is a schematic operation flow diagram of the main service thread in Embodiment 1;
FIG. 7 is a schematic operation flow diagram of the other service threads in Embodiment 1;
FIG. 8 is a schematic operation flow diagram of the MQTT receiving thread in the example of Embodiment 1;
FIG. 9 is a schematic diagram of the module connections of the master control subsystem in the monitoring management system of the present invention;
FIG. 10 is a schematic diagram of the signal connections of the programs to be monitored in the intelligent management terminal of Embodiment 3.
Detailed Description
The present invention will be described in further detail with reference to examples, which are illustrative of the present invention and are not to be construed as being limited thereto.
Embodiment 1: a monitoring management method, as shown in FIG. 1, comprises the following steps:
S100: acquiring configuration information of the application programs to be monitored, wherein there are at least two application programs to be monitored;
S200: monitoring the running state of the application programs to be monitored according to the configuration information to obtain a monitoring result; and determining from the monitoring result whether each application program to be monitored is running abnormally, and restarting any application program to be monitored that is running abnormally.
The method provided by this embodiment is intended for scenarios in which several application programs run simultaneously: every application program is monitored, and one that runs abnormally is detected promptly and restarted so that it returns to normal operation. Only the abnormal application program is restarted; the other, normally running application programs are unaffected and continue to work.
In step S100, there are at least two application programs to be monitored, namely a master control application program and at least one service application program. The master control application program executes steps S100 to S200 to monitor and manage all the application programs, while the service application programs provide business services and perform business processing. For example, when the application programs are used for meter reading management, the service application programs include a remote communication application program, a database application program, a meter reading application program, a log application program and an event alarm application program. Those skilled in the art can choose the service application programs according to the actual situation, so they are not described in detail in this embodiment.
In actual use, the master control application program monitors its own running condition through the hardware watchdog and is restarted when an abnormality occurs. Each service application program monitors its own running state, and the master control application program periodically queries the running state of each service application program and restarts any that is running abnormally, thereby monitoring and managing the running states of the service application programs.
In step S100, the configuration information comprises service application program configuration information and watchdog configuration information. The service application program configuration information identifies the corresponding service application program: based on it, the master control application program can start and restart that service application program and send it query instructions. In this embodiment, the service application program configuration information is therefore the address information of the corresponding service application program.
The watchdog configuration information comprises a zero clearing period and a counting threshold.
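To make the two kinds of configuration information concrete, here is a minimal Python sketch of how they might be held in memory. Python is an assumption (the patent names no language), and the class names, field names and the `app/...` address strings are hypothetical; the text only states that the service entry is address information and that the watchdog entry holds a zero clearing period and a counting threshold.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical in-memory form of the configuration information; the field
# names and the address strings are assumptions, not taken from the patent.


@dataclass(frozen=True)
class ServiceAppConfig:
    name: str      # human-readable label, e.g. "database"
    address: str   # address information identifying the service application program


@dataclass(frozen=True)
class WatchdogConfig:
    zero_clearing_period_s: float  # how often the master control app clears the count
    count_threshold: int           # count above which the master control app is restarted


@dataclass
class MonitorConfig:
    watchdog: WatchdogConfig
    services: List[ServiceAppConfig] = field(default_factory=list)

    def validate(self) -> None:
        """Check the constraints stated in the text; raise ValueError on violation."""
        if not self.services:
            raise ValueError("at least one service application program is required")
        addresses = [s.address for s in self.services]
        if len(addresses) != len(set(addresses)):
            # configuration entries map one-to-one onto service application programs
            raise ValueError("duplicate service application program address")
        if self.watchdog.zero_clearing_period_s <= 0 or self.watchdog.count_threshold <= 0:
            raise ValueError("zero clearing period and count threshold must be positive")


if __name__ == "__main__":
    cfg = MonitorConfig(
        WatchdogConfig(zero_clearing_period_s=10.0, count_threshold=30),
        [ServiceAppConfig("database", "app/database"), ServiceAppConfig("log", "app/log")],
    )
    cfg.validate()  # passes silently
```

The `validate` method encodes the constraints stated above: at least one service application program (so at least two monitored programs counting the master control application program), one configuration entry per service application program, and positive watchdog parameters.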
Note:
there is at least one service application program, and the service application program configuration information corresponds one-to-one with the service application programs.
When there are at least two service application programs, they may communicate with each other over the MQTT protocol as actually needed.
In step S200, the running state of the application programs to be monitored is monitored according to the configuration information to obtain a monitoring result, whether each application program is running abnormally is determined from the monitoring result, and any abnormally running application program is restarted. This is divided into the following two cases:
S210: the configuration information is the service application program configuration information.
The master control application program periodically queries the running state of the corresponding service application program based on the service application program configuration information to obtain a query result, which serves as the monitoring result. When the query result is a response timeout or abnormal operation, the corresponding service application program is determined to be running abnormally and is restarted.
The query period is the period at which the running state of the service application programs is checked. It can be set according to the actual situation and is not limited here; in this embodiment it is set equal to the zero clearing period.
The specific steps are as follows:
at the preset query period, the master control application program queries the running state of the corresponding service application program over the MQTT protocol using the service application program configuration information;
Note: MQTT stands for Message Queuing Telemetry Transport; the service application programs communicate with each other over the MQTT protocol.
when the running state is obtained and indicates abnormal operation, the master control application program generates and records abnormal-operation information and restarts the corresponding service application program;
when no running state is obtained and the query time exceeds a preset timeout threshold, the master control application program determines that the query has timed out, generates and records abnormal-operation information, and restarts the corresponding service application program.
Note: the preset timeout threshold can be set according to actual conditions and is not limited in this embodiment.
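One way to realise the timeout rule in software is a blocking wait with a deadline. The sketch below is written in Python (an assumption; the patent names no language) and hides the MQTT transport behind a hypothetical `send_query` callable; `QUERY_TIMEOUT` and `query_running_state` are likewise illustrative names, not part of any real MQTT library.

```python
import queue
import threading

QUERY_TIMEOUT = "query timeout"  # sentinel result when no running state arrives in time


def query_running_state(send_query, timeout_s):
    """Send one query and wait at most timeout_s seconds for the running state.

    send_query is a hypothetical stand-in for publishing the MQTT query
    instruction: it receives a callback that the service application
    program's MQTT receiving thread would invoke with its statistical result.
    """
    replies = queue.Queue()
    send_query(replies.put)
    try:
        # Wait until the running state is obtained or the timeout threshold elapses.
        return replies.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing obtained in time: report a query timeout instead of waiting longer.
        return QUERY_TIMEOUT


if __name__ == "__main__":
    def responsive_app(reply):
        # Replies from another thread with an empty statistical result (all threads normal).
        threading.Thread(target=reply, args=([],)).start()

    def silent_app(reply):
        pass  # e.g. its MQTT receiving thread has stopped

    print(query_running_state(responsive_app, 1.0))  # []
    print(query_running_state(silent_app, 0.1))      # query timeout
```

A late reply that arrives after the deadline simply lands in a queue nobody reads any more, which matches the text: once the query has timed out, the master control application program does not wait for the running state.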
In this embodiment, comparing the query time with the timeout threshold means that, once a query has timed out, the master control application program no longer waits for the corresponding service application program to return its running state but directly takes the query timeout as the query result, which improves query efficiency.
The query result is the monitoring result obtained in step S200 when a service application program is the application program being monitored. As described above, the query result is one of normal operation, abnormal operation or query timeout. For both abnormal operation and query timeout, corresponding abnormal-operation information is generated and written to the log, and the corresponding service application program is restarted.
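The three possible query results and the shared handling of the two failure cases could be expressed as follows. `QueryResult`, `classify_reply` and `needs_restart` are hypothetical names, and the sketch assumes (as described later in this embodiment) that a successful reply is a list of abnormal-operation entries that is empty when every thread is running normally.

```python
from enum import Enum


class QueryResult(Enum):
    NORMAL = "normal operation"
    ABNORMAL = "abnormal operation"
    TIMEOUT = "query timeout"


def classify_reply(statistics, timed_out):
    """Map a raw query outcome onto the three possible query results.

    statistics: list of abnormal-operation entries returned by the service app
    timed_out:  True if nothing was returned within the timeout threshold
    """
    if timed_out:
        return QueryResult.TIMEOUT
    return QueryResult.ABNORMAL if statistics else QueryResult.NORMAL


def needs_restart(result):
    # Abnormal operation and query timeout are handled identically:
    # log abnormal-operation information, then restart the service application program.
    return result is not QueryResult.NORMAL
```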
The specific steps by which the master control application program queries the running state of the corresponding service application program over the MQTT protocol using the service application program configuration information are as follows:
(1) Preprocessing:
the service application program comprises a main service thread, and other service threads and an MQTT receiving thread can be created in the main service thread according to actual business requirements;
the main service thread and each of the other service threads are registered in a linked list in advance. Because registration, running-state updating and the other steps are the same for the main service thread and the other service threads, they are collectively referred to as service threads in this description.
Note: when the MQTT receiving thread is abnormal, no data can be transmitted; that is, either the query instruction is not received, or the statistical result cannot be returned after it is received. Since this embodiment determines timeouts from the query duration, an abnormal MQTT receiving thread simply produces a query-timeout result, for which abnormal-operation information is generated and recorded and the corresponding service application program is restarted. The MQTT receiving thread therefore does not need to be registered or have its running state monitored.
As shown in FIG. 2, the service thread registration steps are as follows:
acquire the thread ID of the service thread, the thread ID being unique;
look up the registration state in the linked list by thread ID: if the same thread ID is found in the linked list, the service thread is already registered and the registration step ends; otherwise, acquire the registration information and register the service thread in the linked list;
the registration information comprises the thread ID, thread name, update cycle limit and latest running time of the service thread, the latest running time being set to the current time.
Note: in this embodiment the update cycle limit ranges from 1 to 5 min; those skilled in the art can set the update cycle limit of each thread as actually needed, and this embodiment does not limit it.
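A minimal sketch of the registration step, assuming Python threads. A lock-protected Python list stands in for the linked list, and `ThreadRecord`, `registry` and `register_thread` are illustrative names; expressing the update cycle limit in seconds is also an assumption.

```python
import threading
import time
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadRecord:
    thread_id: int
    name: str
    update_limit_s: float    # update cycle limit (60-300 s in this embodiment)
    latest_run_time: float   # latest running time, from time.monotonic()


# A plain list stands in for the linked list; the lock guards it because
# several service threads may register concurrently.
registry: List[ThreadRecord] = []
_registry_lock = threading.Lock()


def register_thread(thread_id, name, update_limit_s):
    """Register a service thread once; return False if its ID is already registered."""
    with _registry_lock:
        if any(r.thread_id == thread_id for r in registry):
            return False  # already registered: end the registration step
        # New entry: the latest running time starts at the current time.
        registry.append(ThreadRecord(thread_id, name, update_limit_s, time.monotonic()))
        return True
```

A service thread would typically register itself at start-up, for example with `register_thread(threading.get_ident(), "main service thread", 120.0)`. Python thread identifiers are only unique among live threads and may be reused after a thread exits, so a long-running terminal would also need to remove entries for threads that end.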
While running, each service thread performs its service processing in a loop: update the latest running time, perform the service processing, delay, and repeat. The delay length can be set as actually needed and is not limited in this embodiment.
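The loop can be sketched as below. `service_thread_loop` and its two callback parameters are hypothetical; the `iterations` argument exists only so the loop can be exercised in a finite example.

```python
import time


def service_thread_loop(update_latest_run_time, do_service_processing, delay_s, iterations=None):
    """Loop of one service thread: update latest running time, process, delay, repeat.

    The two callables are hypothetical hooks. A real service thread loops
    forever (iterations=None); a finite count is accepted for demonstration.
    """
    done = 0
    while iterations is None or done < iterations:
        update_latest_run_time()   # mark this thread as alive in the registry
        do_service_processing()    # the thread's actual business work
        time.sleep(delay_s)        # configurable delay before the next pass
        done += 1
```

Because the timestamp is refreshed once per pass, a healthy thread stays within its update cycle limit as long as one pass (processing plus delay) is shorter than that limit; a thread that blocks inside its processing stops refreshing and is reported once the limit is exceeded.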
The latest running time is updated by writing the current time into the thread's entry in the linked list. As shown in FIG. 3, the specific steps are as follows:
acquire the thread ID of the service thread and look up its registration state in the linked list by thread ID; if the thread ID is not found in the linked list, the service thread is judged not to be registered and the update step ends;
if the thread ID is found in the linked list, the service thread is judged to be registered, and its latest running time in the linked list is updated to the current time.
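A sketch of the update step, assuming a dictionary keyed by thread ID as a simplified stand-in for the linked list described in the text. `latest_run_time` and `update_latest_run_time` are hypothetical names; the optional `now` argument is only there to make the function easy to test.

```python
import threading
import time
from typing import Dict, Optional

# Hypothetical simplification: thread ID -> latest running time, standing in
# for the registered entries of the linked list.
latest_run_time: Dict[int, float] = {}
_lock = threading.Lock()


def update_latest_run_time(thread_id: int, now: Optional[float] = None) -> bool:
    """Overwrite the thread's latest running time with the current time.

    Returns False (and changes nothing) if the thread ID is not registered.
    """
    with _lock:
        if thread_id not in latest_run_time:
            return False  # not found: thread not registered, end the update step
        latest_run_time[thread_id] = time.monotonic() if now is None else now
        return True
```

The update deliberately never registers a missing thread, so a thread that skipped registration stays invisible to the statistics step rather than being silently added with unknown limits.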
(2) Querying the running state:
the master control application program sends a query instruction to the corresponding service application program over the MQTT protocol using the service application program configuration information; the MQTT receiving thread receives the query instruction, compiles statistics on the running states of the main service thread and the other service threads in response to it to obtain a statistical result, and returns the statistical result.
The MQTT receiving thread compiles the statistics on the running states of the main service thread and the other service threads as follows:
receive an instruction over the MQTT protocol and determine whether it is a query instruction.
Note: because the service application programs also communicate with each other over the MQTT protocol, it is necessary to determine whether a received instruction is a query instruction from the master control application program or a service instruction from another service application program.
When the instruction is a query instruction, compile statistics on the running state of each service thread and return the statistical result to the master control application program.
Note: when the instruction is not a query instruction, it is forwarded to the corresponding service thread for processing; this is prior art and is not described in detail in this specification.
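The dispatch performed by the MQTT receiving thread might look like the sketch below. The message format is an assumption: a JSON payload in which `{"type": "query"}` marks a query instruction. `handle_incoming` and its callbacks are hypothetical; with a client library such as paho-mqtt, this function could be called from the client's `on_message` callback with `message.payload`.

```python
import json


def handle_incoming(payload, collect_statistics, forward, reply):
    """Dispatch one message received by the MQTT receiving thread.

    Assumes a hypothetical JSON payload in which {"type": "query"} marks the
    master control application program's query instruction; any other message
    is treated as a service instruction and forwarded to a service thread.
    """
    try:
        message = json.loads(payload)
    except ValueError:               # includes json.JSONDecodeError
        forward(payload)             # not JSON: leave it to the service threads
        return
    if isinstance(message, dict) and message.get("type") == "query":
        reply(collect_statistics())  # statistical result goes back to the master control app
    else:
        forward(message)
```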
The running state of each service thread is compiled by judging, in turn, the running state of every service thread registered in the linked list and recording and compiling the abnormal-operation information. As shown in FIG. 4, the specific steps are as follows:
the running state of each service thread in the linked list is judged in sequence, as follows:
compute the non-running duration as the current time minus the latest running time of the service thread and compare it with the update cycle limit of that thread; if the non-running duration is greater than the update cycle limit, the thread is judged to be running abnormally, otherwise it is judged to be running normally.
When a thread is judged to be running abnormally, record its abnormal-operation information, such as the thread ID and the non-running duration, then judge the next service thread, until all service threads have been judged;
when a thread is judged to be running normally, judge the next service thread, until all service threads have been judged;
finally, compile all the abnormal-operation information into a statistical result and return it to the master control application program.
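The statistics step amounts to a single pass over the registered threads. In this sketch `ThreadRecord` and `collect_abnormal` are hypothetical names, and each piece of abnormal-operation information is assumed to be a small dictionary holding the thread ID and the non-running duration, the two example fields mentioned above.

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ThreadRecord:
    thread_id: int
    name: str
    update_limit_s: float    # update cycle limit
    latest_run_time: float   # latest running time (same clock as `now`)


def collect_abnormal(records: Iterable[ThreadRecord], now: Optional[float] = None) -> List[Dict]:
    """Judge every registered thread in turn and return the abnormal-operation info.

    A thread is abnormal when its non-running duration (now - latest running
    time) is greater than its update cycle limit. An empty list means all
    registered threads are running normally.
    """
    now = time.monotonic() if now is None else now
    abnormal = []
    for r in records:
        idle = now - r.latest_run_time          # non-running duration
        if idle > r.update_limit_s:
            abnormal.append({"thread_id": r.thread_id, "idle_s": idle})
    return abnormal
```

For example, with two threads whose limits are 120 s and 60 s and whose latest running times are 0 s and 50 s, calling `collect_abnormal(records, now=115.0)` flags only the second thread, with a non-running duration of 65 s.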
(3) Restarting the service application program based on the statistical result:
the master control application program receives the statistical result. When the corresponding service application program is running normally, the statistical result is empty, i.e. it contains no abnormal-operation information. Therefore, if the statistical result is not empty, the corresponding service application program is judged to be running abnormally; the master control application program then extracts and records the abnormal-operation information and restarts the service application program so that it can work normally again.
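On the master control side, the decision reduces to an emptiness test. `on_statistical_result` and the `restart` callable are hypothetical, and Python's standard `logging` module stands in for the terminal's log.

```python
import logging

log = logging.getLogger("master_control")


def on_statistical_result(app_name, statistics, restart):
    """Act on the statistical result returned by one service application program.

    An empty result means every service thread is running normally. Otherwise
    each abnormal-operation entry is written to the log and the service
    application program is restarted via the hypothetical `restart` callable.
    Returns True if a restart was requested.
    """
    if not statistics:
        return False
    for entry in statistics:
        log.warning("service application %s abnormal: %s", app_name, entry)
    restart(app_name)
    return True
```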
As can be seen from the above, each service application program monitors the working state of its service threads through thread registration and updating of the latest running time, so when the master control application program queries the running state of a service application program, it is in effect querying the running state of every service thread in that application program. The master control application program writes the abnormal-operation information it obtains into the log, so that staff can later trace abnormal operation from the log, which facilitates subsequent optimization work.
S220: the configuration information is watchdog configuration information;
the master control application program clears the hardware watchdog count according to the clearing period so that the hardware watchdog starts counting again, and at the same time monitors the hardware watchdog count; when the count exceeds the count threshold, the master control application program is judged to be running abnormally and is restarted.
When the master control application program starts running, it initializes the hardware watchdog, which begins counting. While the master control application program is running, it clears the hardware watchdog count once per clearing period, so that the watchdog starts counting again. If the master control application program runs abnormally, it can no longer clear the count on schedule, so the hardware watchdog keeps counting until the count exceeds the count threshold; at that point the master control application program is judged to be running abnormally and is restarted.
The hardware watchdog count serves as the monitoring result.
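The count, clear and threshold behaviour of S220 can be illustrated with a small software model. This is only a simulation under assumed names (`WatchdogModel`, `simulate`): a real hardware watchdog counts independently of the software and performs the reset itself, and here the clearing period and count threshold are measured in abstract ticks.

```python
from typing import Optional


class WatchdogModel:
    """Software stand-in for the hardware watchdog (illustration only)."""

    def __init__(self, count_threshold: int) -> None:
        self.count_threshold = count_threshold
        self.count = 0

    def tick(self) -> bool:
        """Advance the count by one; return True once the threshold is exceeded."""
        self.count += 1
        return self.count > self.count_threshold

    def clear(self) -> None:
        """Called by the master application once per clearing period."""
        self.count = 0


def simulate(total_ticks: int, clear_every: int, threshold: int,
             hang_after: Optional[int] = None) -> Optional[int]:
    """Return the tick at which the watchdog expires, or None if it never does.

    hang_after models the master application hanging: after that tick it
    stops clearing the watchdog.
    """
    wd = WatchdogModel(threshold)
    for t in range(1, total_ticks + 1):
        if wd.tick():
            return t  # count exceeded threshold: master judged abnormal, restart
        hung = hang_after is not None and t > hang_after
        if not hung and t % clear_every == 0:
            wd.clear()
    return None
```

With a clearing period of 3 ticks and a threshold of 5, a healthy master never lets the count exceed the threshold; if clearing stops after tick 6, the count passes 5 at tick 12. In this model the threshold must be at least the clearing period (in ticks), otherwise even a healthy master would trip the watchdog.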
Case (1):
As shown in Fig. 5, the master control application program works as follows:
1. Startup:
the master control application program starts, initializes the MQTT service and the hardware watchdog, and starts each business application program according to the business configuration file.
2. Monitoring the running states of the master control application program and each service application program;
in this case, clearing the hardware watchdog and querying the running state of each business application program are performed in sequence; according to actual needs, a person skilled in the art may instead make the two operations independent of each other and perform them separately.
The specific monitoring steps are as follows:
after clearing the hardware watchdog, the master control application program queries the running state of each business application program: using the MQTT service and according to the business application program configuration information, it sends a query instruction to each business application program and obtains a query result.
When the query result indicates abnormal operation, or the query times out, the master control application program generates and records abnormal-operation information and restarts the corresponding business application program.
When the query result indicates normal operation, or once an abnormal business application program has been restarted, the master control application program waits for a delay and then repeats the steps above; the delay equals the clearing period, which is also used as the query period.
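One monitoring cycle of Case (1), that is, clear the watchdog, query every business application program, restart on abnormal operation or timeout, then wait one period, might look like the sketch below. The MQTT round trip is deliberately abstracted into an injected `query` callable that returns `None` on timeout; `monitor_cycle`, `run_master` and all their parameters are hypothetical names, not an API from the source.

```python
import logging
import time
from typing import Callable, Dict, List, Optional

log = logging.getLogger("master")

# query(app, timeout_s) -> None on timeout, else that app's abnormal-thread records
QueryFn = Callable[[str, float], Optional[List[str]]]


def monitor_cycle(apps: List[str], clear_watchdog: Callable[[], None],
                  query: QueryFn, restart: Callable[[str], None],
                  timeout_s: float) -> Dict[str, str]:
    """Run one cycle and return a per-application outcome."""
    clear_watchdog()  # feed the hardware watchdog first
    outcome: Dict[str, str] = {}
    for app in apps:
        result = query(app, timeout_s)
        if result is None:
            outcome[app] = "timeout"
            log.warning("query to %s timed out", app)
            restart(app)
        elif result:
            outcome[app] = "abnormal"
            log.warning("%s abnormal: %s", app, result)
            restart(app)
        else:
            outcome[app] = "normal"
    return outcome


def run_master(apps: List[str], clear_watchdog: Callable[[], None],
               query: QueryFn, restart: Callable[[str], None],
               period_s: float, timeout_s: float, cycles: int) -> None:
    """Repeat the cycle; the delay equals the clearing/query period."""
    for _ in range(cycles):
        monitor_cycle(apps, clear_watchdog, query, restart, timeout_s)
        time.sleep(period_s)
```

Keeping `query` injectable lets the same loop be exercised without a broker; a real implementation would publish the query and wait for the reply up to `timeout_s`.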
After a business application program is started or restarted by the master control application program, its main business thread works as follows, as shown in Fig. 6:
the MQTT service and the devices are initialized, where device initialization refers to initializing the devices used by the business application program, such as network devices, serial devices and I/O.
Other business threads and an MQTT receiving thread are created according to actual needs;
the main business thread registers itself in the linked list, to support subsequent monitoring and management of the main business thread.
After registration, the thread updates its latest running time in every business-processing cycle, i.e. it overwrites its latest running time in the linked list with the current time.
As shown in Fig. 7, once the other business threads created by the main business thread start running, each of them likewise registers itself in the linked list to support subsequent monitoring and management, and after registration updates its latest running time in the linked list with the current time in every business-processing cycle.
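Registration and latest-running-time updates (Figs. 6 and 7) amount to a small shared, lock-protected registry. The sketch below is an assumption-laden illustration: `ThreadRegistry`, `business_thread` and their parameters are hypothetical, a dict keyed by `threading.get_ident()` replaces the linked list, and the clock is injectable so the behaviour can be tested without real delays.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    name: str
    update_limit_s: float  # update-cycle limit value
    last_run_s: float      # latest running time


class ThreadRegistry:
    """Shared registry written by business threads, read by the receiving thread."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[int, Entry] = {}

    def register(self, name: str, update_limit_s: float) -> int:
        """Register the calling thread; returns its thread ID."""
        tid = threading.get_ident()
        with self._lock:
            self._entries[tid] = Entry(name, update_limit_s, self._clock())
        return tid

    def heartbeat(self) -> None:
        """Overwrite the caller's latest running time (KeyError if not registered)."""
        tid = threading.get_ident()
        with self._lock:
            self._entries[tid].last_run_s = self._clock()

    def snapshot(self) -> Dict[int, Entry]:
        """Return copies so readers never see a half-updated entry."""
        with self._lock:
            return {tid: Entry(e.name, e.update_limit_s, e.last_run_s)
                    for tid, e in self._entries.items()}


def business_thread(registry: ThreadRegistry, name: str,
                    update_limit_s: float, cycles: int) -> None:
    """Hypothetical worker: register once, then update once per processing cycle."""
    registry.register(name, update_limit_s)
    for _ in range(cycles):
        # ... one cycle of business processing would run here ...
        registry.heartbeat()
```

Because every business thread writes its own entry while the MQTT receiving thread reads all of them, some form of synchronization is needed; the lock is the simplest choice.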
As shown in Fig. 8, once the MQTT receiving thread created by the main business thread starts running, it performs the following steps in a loop:
receiving instructions, which include query instructions sent by the master control application program and business instructions sent by other business application programs;
for example, if the business application programs include a database application program, a remote communication application program and a meter reading application program, the remote communication application program sends a real-time meter reading business instruction to the meter reading application program, and the meter reading application program sends a business instruction to store the meter reading data to the database application program.
Judging the instruction type: for a query instruction, the thread counts the abnormally running threads based on the information of each thread in the linked list and sends the statistical result to the master control application program; otherwise, it forwards the business instruction to the corresponding business thread.
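The instruction-type branch of the MQTT receiving thread reduces to a simple dispatcher. In this hypothetical sketch the `Message` fields, the `check_threads` and `reply` callbacks and the per-thread `queue.Queue` objects are illustrative assumptions; the source does not specify MQTT topics or payload formats.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    kind: str     # "query" from the master, anything else is a business instruction
    target: str   # name of the business thread a business instruction is for
    payload: str


def dispatch(msg: Message,
             check_threads: Callable[[], List[str]],
             reply: Callable[[List[str]], None],
             thread_queues: Dict[str, "queue.Queue[str]"]) -> None:
    """Answer queries with the thread statistics, forward everything else."""
    if msg.kind == "query":
        reply(check_threads())  # statistical result back to the master application
    else:
        thread_queues[msg.target].put(msg.payload)  # hand off to the business thread
```

Forwarding through a queue keeps the receiving thread short-running, so a slow business thread does not delay the master's status queries.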
The master control application program of this embodiment uses a hardware watchdog to monitor its own running state and is restarted when it runs abnormally. Each business application program monitors its business threads through thread registration and updating of the latest running time, so the master control application program can restart a business application program based on the abnormal-thread information obtained by querying it. The master control application program also judges whether a query has timed out according to a preset timeout threshold, and restarts any business application program whose query timed out. The master control application program of this embodiment can therefore monitor both itself and every business application program and automatically restart whichever application program runs abnormally, while the restart does not affect application programs that are running normally.
Embodiment 2: a monitoring management system, as shown in Fig. 9, includes a master control subsystem and at least one business subsystem:
the business subsystem is used for providing business services;
the master control subsystem comprises:
an obtaining module 100, configured to obtain configuration information of a service subsystem and a master control subsystem;
the monitoring module 200 is configured to monitor the operating states of the service subsystem and the master control subsystem according to the configuration information, and obtain a monitoring result;
and the restart module 300 is configured to determine whether the service subsystem and the master control subsystem operate abnormally according to the monitoring result, and restart the service subsystem and/or the master control subsystem that operate abnormally.
The configuration information comprises service application program configuration information and watchdog configuration information, wherein the service application program configuration information is used for identifying a corresponding service subsystem, and the watchdog configuration information comprises a zero clearing period and a counting threshold;
the monitoring module 200 includes a service monitoring unit 210 and a watchdog monitoring unit 220, and the restart module 300 includes a service restart unit 310 and a master restart unit 320;
the service monitoring unit 210 is configured to periodically query the operating state of the corresponding service subsystem based on the service application configuration information, and obtain a query result;
the watchdog monitoring unit 220 is configured to clear the hardware watchdog count according to the clearing period so that the hardware watchdog counts again, and to monitor the hardware watchdog count;
the service restarting unit 310 is configured to determine that a corresponding service subsystem operates abnormally when the query result is response timeout or abnormal operation, and restart the service subsystem;
the master control restarting unit 320 is configured to determine that the master control subsystem is running abnormally when the hardware watchdog count exceeds the count threshold, and to restart the master control subsystem.
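The configuration information handled by the obtaining module (business application program configuration plus watchdog configuration with a clearing period and count threshold) could be represented as below. The JSON layout, the field names and the `query_topic` field are hypothetical; the source does not define a configuration file format.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WatchdogConfig:
    clear_period_s: float  # clearing period
    count_threshold: int   # count threshold


@dataclass(frozen=True)
class BusinessAppConfig:
    name: str         # identifies the business subsystem
    query_topic: str  # hypothetical MQTT topic used for state queries


@dataclass(frozen=True)
class MonitorConfig:
    watchdog: WatchdogConfig
    apps: List[BusinessAppConfig]


def load_config(text: str) -> MonitorConfig:
    """Parse and minimally validate a JSON configuration document."""
    raw = json.loads(text)
    wd = WatchdogConfig(float(raw["watchdog"]["clear_period_s"]),
                        int(raw["watchdog"]["count_threshold"]))
    if wd.clear_period_s <= 0 or wd.count_threshold <= 0:
        raise ValueError("clearing period and count threshold must be positive")
    apps = [BusinessAppConfig(a["name"], a["query_topic"]) for a in raw["apps"]]
    if not apps:
        raise ValueError("at least one business application is required")
    return MonitorConfig(wd, apps)
```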
Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiment.
Embodiment 3: an intelligent management terminal, including a memory, a processor, and application programs stored in the memory and executable on the processor;
the application programs include a master control application program and at least one business application program, and the master control application program, when executed by the processor, implements the steps of the method of Embodiment 1.
Case (2):
the intelligent management terminal is used for meter reading management, and its business application programs, as shown in Fig. 10, include a database application program (database APP in Fig. 10), a remote communication application program (remote communication APP in Fig. 10), a meter reading application program (meter reading APP in Fig. 10), a log application program and an event alarm application program.
The database, remote communication, meter reading, log and event alarm application programs have the same functions as the database thread, remote communication thread, local meter reading thread, log thread and event alarm thread of the management application program in prior-art intelligent management terminals. Since these functions are prior art, and those skilled in the art can load business application programs according to actual needs, they are not described in detail in this specification.
The master control application program monitors the running state of every application program in the intelligent management terminal and automatically restarts any application program that runs abnormally, without manual intervention, thereby ensuring that all application programs in the terminal keep running normally.
As shown in Fig. 10, the master control application program (master APP in Fig. 10) communicates with the business application programs over MQTT to query their running states, and the business application programs also communicate with one another over MQTT (not shown in the figure) to pass business instructions between them.
Embodiment 4: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of Embodiment 1.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
In addition, it should be noted that the specific embodiments described in the present specification may differ in the shape of the components, the names of the components, and the like. All equivalent or simple changes of the structure, the characteristics and the principle of the invention which are described in the patent conception of the invention are included in the protection scope of the patent of the invention. Various modifications, additions and substitutions for the specific embodiments described may be made by those skilled in the art without departing from the scope of the invention as defined in the accompanying claims.

Claims (10)

1. A monitoring management method is characterized by comprising the following steps:
acquiring configuration information of the application programs to be monitored, there being at least two application programs to be monitored;
monitoring the running state of the application program to be monitored according to the configuration information to obtain a monitoring result; and judging whether the application program to be monitored is abnormal in operation according to the monitoring result, and restarting the application program to be monitored which is abnormal in operation.
2. The monitoring management method according to claim 1, characterized in that:
the application program to be monitored comprises a master control application program and a business application program, and the business application program is used for providing business service;
the configuration information comprises service application program configuration information and watchdog configuration information, wherein the service application program configuration information is used for identifying a corresponding service application program, and the watchdog configuration information comprises a zero clearing period and a counting threshold;
the master control application program periodically queries the running state of the corresponding business application program based on the business application program configuration information to obtain a query result, judges that the corresponding business application program is running abnormally when the query result is a response timeout or abnormal operation, and restarts the business application program;
the master control application program clears the hardware watchdog count according to the clearing period so that the hardware watchdog counts again, and at the same time monitors the hardware watchdog count, judging that operation is abnormal when the count exceeds the count threshold and restarting the master control application program.
3. The monitoring management method according to claim 2, wherein the specific steps by which the master control application program periodically queries the running state of the corresponding business application program based on the business application program configuration information to obtain a query result, judges the corresponding business application program to be running abnormally when the query result is a response timeout or abnormal operation, and restarts that business application program are:
the master control application program queries the running state of the corresponding business application program, at a preset query period, based on the MQTT protocol and the business application program configuration information;
when the running state is obtained and indicates abnormal operation, the master control application program generates and records abnormal-operation information and restarts the corresponding business application program;
and when the running state has not been obtained and the query time exceeds a preset timeout threshold, the master control application program judges that the query has timed out, generates and records abnormal-operation information, and restarts the corresponding business application program.
4. The monitoring management method according to claim 3, wherein the specific steps of querying the running state of the corresponding business application program based on the MQTT protocol and the business application program configuration information are:
the business application program comprises a main business thread, other business threads and an MQTT receiving thread;
thread information of the main business thread and the other business threads is acquired and recorded, the thread information comprising a thread ID, a thread name, an update-cycle limit value and the latest running time; the running state of each thread is judged from its update-cycle limit value and latest running time, and the thread is judged to be running abnormally when the difference between the current time and its latest running time exceeds the update-cycle limit value;
the master control application program sends a query instruction to the corresponding business application program based on the MQTT protocol and the business application program configuration information; the MQTT receiving thread receives the query instruction, the running states of the main business thread and the other business threads are counted in response to the query instruction to obtain a statistical result, and the statistical result is returned through the MQTT receiving thread.
5. The monitoring management method according to any one of claims 1 to 4, characterized in that:
the number of business applications is at least one.
6. The monitoring management method according to any one of claims 1 to 4, characterized in that:
the business applications include database applications, remote communication applications, meter reading applications, log applications and/or event alarm applications.
7. A monitoring management system is characterized by comprising a main control subsystem and at least one service subsystem:
the business subsystem is used for providing business services;
the master control subsystem comprises:
the acquisition module is used for acquiring configuration information of the service subsystem and the main control subsystem;
the monitoring module is used for monitoring the running states of the service subsystem and the main control subsystem according to the configuration information to obtain a monitoring result;
and the restarting module is used for judging whether the service subsystem and the main control subsystem operate abnormally according to the monitoring result and restarting the abnormally operated service subsystem and/or the abnormally operated main control subsystem.
8. The monitoring management system according to claim 7, wherein:
the configuration information comprises service application program configuration information and watchdog configuration information, wherein the service application program configuration information is used for identifying a corresponding service subsystem, and the watchdog configuration information comprises a zero clearing period and a counting threshold;
the monitoring module comprises a service monitoring unit and a watchdog monitoring unit, and the restarting module comprises a service restarting unit and a main control restarting unit;
the service monitoring unit is used for periodically inquiring the running state of the corresponding service subsystem based on the configuration information of the service application program to obtain an inquiry result;
the watchdog monitoring unit is used for clearing the hardware watchdog count according to the clearing period so that the hardware watchdog counts again, and for monitoring the hardware watchdog count;
the service restarting unit is used for judging that the corresponding service subsystem is running abnormally when the query result is a response timeout or abnormal operation, and restarting the service subsystem;
and the master control restarting unit is used for judging abnormal operation when the count of the hardware watchdog exceeds the count threshold value and restarting the master control subsystem.
9. An intelligent management terminal is characterized by comprising a memory, a processor and an application program which is stored on the memory and can run on the processor;
the applications comprise a master application and at least one business application, the master application when executed by a processor implementing the steps of the method of any one of claims 1 to 6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN201910973766.0A 2019-10-14 2019-10-14 Monitoring management method and system and intelligent management terminal Pending CN110795264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910973766.0A CN110795264A (en) 2019-10-14 2019-10-14 Monitoring management method and system and intelligent management terminal

Publications (1)

Publication Number Publication Date
CN110795264A true CN110795264A (en) 2020-02-14

Family

ID=69439058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910973766.0A Pending CN110795264A (en) 2019-10-14 2019-10-14 Monitoring management method and system and intelligent management terminal

Country Status (1)

Country Link
CN (1) CN110795264A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996106A (en) * 2010-12-17 2011-03-30 南京中兴力维软件有限公司 Method for monitoring software running state
CN106776093A (en) * 2016-12-12 2017-05-31 Tcl集团股份有限公司 A kind of application exception log processing method and system
CN108694093A (en) * 2017-04-06 2018-10-23 迈普通信技术股份有限公司 Process exception monitoring method and device
CN109582486A (en) * 2018-11-20 2019-04-05 厦门科灿信息技术有限公司 A kind of house dog monitoring method, system and equipment and storage medium
CN109672583A (en) * 2018-09-25 2019-04-23 平安科技(深圳)有限公司 Method for monitoring network, equipment, storage medium and device
US20190163599A1 (en) * 2017-11-30 2019-05-30 International Business Machines Corporation Tape library integrated failure indication based on cognitive sound and vibration analysis
CN110032487A (en) * 2018-11-09 2019-07-19 阿里巴巴集团控股有限公司 Keep Alive supervision method, apparatus and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611106A (en) * 2020-05-21 2020-09-01 浙江中旻智能科技有限公司 System recovery method of face recognition system
CN111930596A (en) * 2020-08-13 2020-11-13 中国工商银行股份有限公司 Application program port monitoring method and device
CN112100034A (en) * 2020-09-29 2020-12-18 泰康保险集团股份有限公司 Service monitoring method and device
CN113747171A (en) * 2021-08-06 2021-12-03 天津津航计算技术研究所 Self-recovery video decoding method
CN113747171B (en) * 2021-08-06 2024-04-19 天津津航计算技术研究所 Self-recovery video decoding method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination