CN111459770A - Server operation state warning method and device, server and storage medium - Google Patents

Server operation state warning method and device, server and storage medium

Info

Publication number
CN111459770A
Authority
CN
China
Prior art keywords
service
monitoring
server
recovery
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010250178.7A
Other languages
Chinese (zh)
Inventor
赵学良
黄泽伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yolanda Technology Co ltd
Original Assignee
Shenzhen Yolanda Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yolanda Technology Co ltd
Priority to CN202010250178.7A
Publication of CN111459770A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiment of the invention discloses a method and a device for alarming on the running state of a server, a server and a storage medium. The method comprises the following steps: acquiring a first monitoring index of at least one current service of a server; identifying abnormal indexes among the first monitoring indexes according to a monitoring threshold; sending an alarm instruction to a preset alarm system according to the abnormal index; receiving task information input by an administrator based on the alarm instruction; and sending a corresponding service recovery instruction to the server according to the task information, wherein the service recovery instruction is used for recovering or restarting the current service corresponding to the abnormal index. The invention monitors the running state of the server in real time and matches a recovery instruction to the task information in order to recover the corresponding service. This solves the technical problem in the prior art that an operator must manually log in to the server or a background in order to execute the recovery operation. The task information is judged automatically and the recovery operation is executed intelligently, which reduces labor cost and improves the efficiency and timeliness of service recovery.

Description

Server operation state warning method and device, server and storage medium
Technical Field
The embodiment of the invention relates to an information monitoring technology, in particular to a method and a device for alarming the running state of a server, the server and a storage medium.
Background
With the continuous development of software technology and the continuous evolution of micro-service technology, the number of services is increasing day by day. These services need to be monitored and should be able to heal themselves, but the self-healing of services is still far from sufficient: at present most of the emphasis is placed on monitoring the services, and services in an abnormal state are recovered by ad hoc means. Once the accuracy and timeliness of the alarm are ensured, a way of quickly recovering the service to its normal state is also needed.
Existing service monitoring methods also have the technical problem that, after receiving the alarm message, operation and maintenance personnel need to manually log in to the server or the background to restore the service.
Disclosure of Invention
The invention provides a method and a device for alarming the running state of a server, the server and a storage medium, which are used for automatically judging task information, intelligently executing recovery operation, reducing labor cost and improving the efficiency and timeliness of service recovery.
In a first aspect, an embodiment of the present invention provides a method for alarming a server operating state, including:
acquiring a first monitoring index of at least one current service of a server;
confirming abnormal indexes in the first monitoring indexes according to a monitoring threshold value;
sending an alarm instruction to a preset alarm system according to the abnormal index;
receiving task information input by an administrator based on the alarm instruction;
and sending a corresponding service recovery instruction to the server according to the task information, wherein the service recovery instruction is used for recovering or restarting the current service corresponding to the abnormal index.
Further, before the acquiring of a first monitoring index of at least one current service of the server, the method includes:
generating an alarm configuration file according to the monitoring threshold and a preset monitoring framework code;
associating the alarm configuration file with a current service list of the server so as to monitor all services of the current service list;
and creating a recovery service instruction list according to a preset recovery service instruction, and storing the recovery service instruction list to a preset database, wherein the recovery service instruction comprises recovery service information.
Further, the sending the corresponding service recovery instruction to the server according to the task information includes:
confirming whether the task information matches the recovery service information;
and if the task information matches the recovery service information, sending a recovery service instruction corresponding to the recovery service information to the server.
Further, after sending the corresponding service restoration instruction to the server according to the task information, the method includes:
re-acquiring a second monitoring index of the current task corresponding to the service recovery instruction in the server;
confirming whether the second monitoring index has an abnormal index according to the monitoring threshold;
and if the abnormal index does not exist, sending the service recovery success information to the preset alarm system.
Further, the first monitoring index includes at least one monitoring parameter, and the determining an abnormal index in the first monitoring index according to a monitoring threshold includes:
judging whether the monitoring parameter is greater than the monitoring threshold, or less than or equal to it;
and if the monitoring parameter is larger than the monitoring threshold, the monitoring parameter is an abnormal index in the first monitoring index.
Further, the determining whether the task information is matched with the recovery service information according to the task information and the recovery service information further includes:
and if the task information is not matched with the recovery service information, sending a re-input instruction to the preset alarm system.
In a second aspect, an embodiment of the present invention further provides an apparatus for warning an operating status of a server, where the apparatus includes:
the index acquisition module is used for acquiring a first monitoring index of at least one current service of the server;
the abnormality confirmation module is used for confirming an abnormal index in the first monitoring index according to a monitoring threshold value;
the alarm sending module is used for sending an alarm instruction to a preset alarm system according to the abnormal index;
the task receiving module is used for receiving task information input by an administrator based on the alarm instruction;
and the service recovery module is used for sending a corresponding service recovery instruction to the server according to the task information, wherein the service recovery instruction is used for recovering or restarting the current service corresponding to the abnormal index.
Further, the apparatus also includes:
the alarm configuration module is used for generating an alarm configuration file according to the monitoring threshold and a preset monitoring framework code;
the monitoring association module is used for associating the alarm configuration file with a current service list of the server so as to monitor all services of the current service list;
the list creating module is used for creating a recovery service instruction list according to a preset recovery service instruction and storing the recovery service instruction list to a preset database, wherein the recovery service instruction comprises recovery service information.
In a third aspect, an embodiment of the present invention further provides a server, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are enabled to implement the method for warning of the running state of the server according to any one of the above embodiments.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for warning an operating state of a server according to any one of the foregoing embodiments.
According to the invention, the monitoring terminal monitors the monitoring index of the server in real time, matches the corresponding recovery instruction to the task information, and sends that instruction to the server to recover the corresponding service. This solves the technical problem in the prior art that an operator must manually log in to the server or the background in order to execute the recovery operation, and achieves the technical effects of automatically judging the task information, intelligently executing the recovery operation, reducing labor cost and improving the efficiency and timeliness of service recovery.
Drawings
Fig. 1 is a flowchart of an alarm method for a server operating state according to a first embodiment of the present invention;
Fig. 2 is a flowchart of an alarm method for a server operating state according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an alarm device for a server operating state according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
Furthermore, the terms "first," "second," and the like may be used herein to describe various orientations, actions, steps, elements, or the like, but the orientations, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, the first task information may be referred to as second task information, and similarly, the second task information may be referred to as first task information, without departing from the scope of the present application. Both the first task information and the second task information are task information, but they are not the same task information. The terms "first", "second", etc. are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Embodiment One
Fig. 1 is a flowchart of an alarm method for a server operating state according to Embodiment One of the present invention. This embodiment is applicable to monitoring the operating state of a server, or of a service on the server, in real time. The method may be executed by a monitoring terminal and specifically includes the following steps:
Step S110, acquiring a first monitoring index of at least one current service of the server.
Specifically, the server of this embodiment may refer to a device or a terminal that provides service functions. A server may run several programs or pieces of software at the same time, that is, provide several business services simultaneously. For example, a data storage service, a download service (Xunlei, Baidu Cloud download, 360 Cloud download, and the like) and data interaction (data calls between apps) may all run on one host at the same time. To ensure that the services on the host run normally, each running service (that is, each current service) on the server needs to be monitored by a monitoring terminal or a monitoring system. A monitoring system framework for the server can be established in advance, and a configuration file for monitoring the running data of the server's services in real time can be configured. Based on the preset monitoring framework, the configuration file can be generated at the code layer by inputting different monitoring index thresholds and combining them with preset monitoring code. After the configuration file is associated with the current service list of the server, the monitoring terminal can monitor the running data (namely the first monitoring index) of the server's services in real time. The first monitoring index may be all the running data of the current service, or part of the running data selected according to different monitoring requirements and service functions; no further limitation is made here. Generally, the monitoring terminal may deploy a monitoring node for real-time monitoring for each service on the server, and then collect the first monitoring index of the current service corresponding to each monitoring node through a preset transmission form (e.g., an API transmission interface).
Step S120, confirming abnormal indexes in the first monitoring indexes according to the monitoring threshold.
Specifically, after acquiring the first monitoring index of the service or software in which the server is currently in the running state, the monitoring terminal may determine which of the first monitoring indexes is abnormal (i.e., which of the first monitoring indexes is an abnormal index) according to a preset monitoring threshold for each first monitoring index. In this embodiment, the specific determination rule may be adjusted according to the difference between the preset monitoring threshold and the first monitoring index, for example, when the monitoring threshold is a minimum value of the acceptable first monitoring index, if the value of the first monitoring index is smaller than the monitoring threshold, it indicates that the first monitoring index is an abnormal index, and when the monitoring threshold is a maximum value of the acceptable first monitoring index, it indicates that the first monitoring index is an abnormal index if the value of the first monitoring index is larger than the monitoring threshold. The acceptable maximum value and the acceptable minimum value of a first monitoring index can also be set simultaneously, that is, whether a first monitoring index is an abnormal index is determined through two corresponding monitoring threshold values, when the value of the first monitoring index is within the value range of the two corresponding monitoring threshold values, the first monitoring index is not an abnormal index, and when the value of the first monitoring index is outside the value range of the two corresponding monitoring threshold values (i.e., smaller than the minimum value or larger than the maximum value), the first monitoring index is an abnormal index.
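The three threshold rules described above (minimum only, maximum only, or both) can be sketched as follows. This is a minimal illustration, not the patented implementation; the function names, the metric names and the `(min, max)` threshold layout are assumptions introduced here.

```python
from typing import Optional


def is_abnormal(value: float,
                min_ok: Optional[float] = None,
                max_ok: Optional[float] = None) -> bool:
    """Return True when value lies outside the acceptable range.

    min_ok / max_ok are the acceptable minimum and maximum; either may be
    None when only one monitoring threshold is configured.
    """
    if min_ok is not None and value < min_ok:
        return True
    if max_ok is not None and value > max_ok:
        return True
    return False


def find_abnormal(metrics: dict, thresholds: dict) -> list:
    """Return the names of the monitoring indexes that violate their thresholds.

    thresholds maps a metric name to a (min_ok, max_ok) tuple (hypothetical layout).
    Metrics without a configured threshold are treated as normal.
    """
    abnormal = []
    for name, value in metrics.items():
        min_ok, max_ok = thresholds.get(name, (None, None))
        if is_abnormal(value, min_ok, max_ok):
            abnormal.append(name)
    return abnormal
```

For instance, with a maximum of 90 on a CPU metric and a minimum of 10 on a free-disk metric, a reading of 95 for the former and 2 for the latter would both be reported as abnormal indexes.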
Step S130, sending an alarm instruction to a preset alarm system according to the abnormal index.
Specifically, in this embodiment, the preset alarm system may refer to a terminal or a platform that is configured to receive an alarm instruction sent by a monitoring terminal, provide a plurality of service recovery instructions or service recovery contents according to the alarm instruction, and send the service recovery instructions to a server corresponding to the service. After the monitoring terminal determines which first monitoring indexes are abnormal indexes by judging the relationship between each first monitoring index and the monitoring threshold, the monitoring terminal can also generate and send an alarm instruction corresponding to each abnormal index to a preset alarm system.
Step S140, receiving task information input by the administrator based on the alarm instruction.
Specifically, after the preset alarm system receives the alarm instruction sent by the monitoring terminal, the administrator of the preset alarm system may input different task information according to the specific content of the alarm instruction, and the task information is then sent to the monitoring terminal. In this embodiment, the monitoring terminal may send the alarm instruction to the preset alarm system through a cloud transmission platform, which may refer to a communication platform that can be used for data and information transmission, such as the Alibaba Cloud platform or the WeChat platform. To ensure that the cloud transmission platform works together with the monitoring terminal and the preset alarm system, the association among the three may be established before the current service of the server is monitored, so that communication is smooth. For example, when an alarm instruction with the specific content "insufficient disk space on disk D of the server" is sent to the preset alarm system through the cloud transmission platform, the administrator can view the alarm information on a preset alarm page of the preset alarm system (the page may be provided by the cloud transmission platform or by the preset alarm system itself). The administrator then inputs corresponding recovery service information on the preset alarm page according to the alarm information (i.e., the task information of this embodiment, which may refer to information indicating which task the terminal is to perform), such as "execute task: clear disk space on disk D of the server", and the recovery service information is sent to the monitoring terminal.
Step S150, sending a corresponding service recovery instruction to the server according to the task information, wherein the service recovery instruction is used for recovering or restarting the current service corresponding to the abnormal index.
Specifically, after the monitoring terminal receives a service restoration instruction sent by a preset alarm system, the monitoring terminal may further match the service restoration content of the service restoration instruction with a service restoration list in a preset database, where the service restoration list is an association table formed by one-to-one correspondence of service restoration content and service restoration tasks, and when the service restoration content is matched with a certain service restoration task in the service restoration list, the monitoring terminal sends the service restoration instruction corresponding to the matched service restoration task to the server, so that the server can restart or restore the corresponding service or software according to the service restoration instruction.
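The one-to-one association table described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the table contents, the `normalize` helper and the command strings are hypothetical placeholders, not instructions named in the source.

```python
from typing import Optional

# Hypothetical service restoration list: restoration content -> restoration instruction.
# The command strings are placeholders for whatever the server actually runs.
RECOVERY_LIST = {
    "restart service b": "systemctl restart service-b",
    "clear disk space on disk d": "cleanup-disk --drive D",  # hypothetical command
}


def normalize(text: str) -> str:
    """Lower-case the text and collapse whitespace so trivial differences still match."""
    return " ".join(text.lower().split())


def match_recovery_instruction(task_info: str, recovery_list: dict) -> Optional[str]:
    """Return the instruction associated with the task information, or None if no entry matches."""
    return recovery_list.get(normalize(task_info))
```

A `None` result corresponds to the case in which nothing in the list matches, so no instruction is sent to the server.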
The first embodiment of the invention has the following advantages: the monitoring terminal monitors the monitoring index of the server in real time, matches the corresponding recovery instruction to the task information, and sends that instruction to the server to recover the corresponding service. This solves the technical problem in the prior art that an operator must manually log in to the server or a background in order to execute the recovery operation, and achieves the technical effects of automatically judging the task information, intelligently executing the recovery operation, reducing labor cost and improving the efficiency and timeliness of service recovery.
Embodiment Two
The second embodiment is further optimized on the basis of the first embodiment. Fig. 2 is a flowchart of an alarm method for a server operating state according to a second embodiment of the present invention, and as shown in fig. 2, the alarm method for a server operating state according to the present embodiment includes:
Step S201, generating an alarm configuration file according to the monitoring threshold and a preset monitoring framework code.
Specifically, the preset monitoring framework code may refer to a code required for constructing a monitoring function, and the code may be called by a code layer or a third-party system, or may be manually input by a programmer. After the monitoring framework is constructed, the monitoring terminal further needs to generate different monitoring rules (i.e., the alarm configuration file in this embodiment) according to the thresholds of the different monitoring indexes, and respectively monitor the monitoring indexes of each service according to the different monitoring rules.
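As a rough sketch of how monitoring rules might be generated from thresholds, the snippet below builds a rule group whose field names (`groups`, `rules`, `alert`, `expr`, `labels`, `annotations`) loosely follow the layout of Prometheus alerting rules. The metric name, the `service` label and the helper functions are assumptions made for illustration; the output is emitted as JSON for simplicity.

```python
import json


def build_alert_rule(name: str, metric: str, op: str, threshold: float, service: str) -> dict:
    """Build one monitoring rule comparing a metric of a service against a threshold."""
    if op not in (">", "<"):
        raise ValueError("op must be '>' or '<'")
    return {
        "alert": name,
        # Doubled braces produce literal { } around the label selector.
        "expr": f'{metric}{{service="{service}"}} {op} {threshold}',
        "labels": {"service": service},
        "annotations": {"summary": f"{metric} {op} {threshold} on {service}"},
    }


def build_alert_config(group_name: str, rules: list) -> str:
    """Serialize a group of rules into the text of an alarm configuration file."""
    return json.dumps({"groups": [{"name": group_name, "rules": rules}]}, indent=2)
```

One rule would be generated per monitoring threshold, so that each service in the current service list ends up with its own set of rules.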
Step S202, the alarm configuration file is associated with the current service list of the server so as to monitor all services of the current service list.
Specifically, after the monitoring terminal generates different alarm configuration files, the alarm configuration files need to be associated with the current service list of the server, so that the monitoring terminal can have a corresponding alarm configuration file to perform monitoring operation for each service in the current service list.
Step S203, a recovery service instruction list is created according to a preset recovery service instruction, and the recovery service instruction list is stored in a preset database, wherein the recovery service instruction comprises recovery service information.
Specifically, after the alarm configuration file is configured at the monitoring terminal, a corresponding restoration service instruction (that is, the preset restoration service instruction in this embodiment) needs to be set according to a problem or a fault that may occur to the service or software running on the server, and then a restoration service instruction list is created according to the preset restoration service instruction and stored in a preset database, so as to facilitate subsequent call of the restoration service instruction list.
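Storing the recovery service instruction list in a preset database could look like the following minimal sketch, which uses SQLite from the Python standard library. The table name, column names and sample entries are assumptions; the source does not specify a database or schema.

```python
import sqlite3
from typing import Optional


def create_recovery_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table holding recovery service instructions."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recovery_instructions ("
        "recovery_info TEXT PRIMARY KEY, instruction TEXT NOT NULL)"
    )


def save_recovery_instructions(conn: sqlite3.Connection, instructions: dict) -> None:
    """Insert or update every recovery_info -> instruction pair."""
    conn.executemany(
        "INSERT OR REPLACE INTO recovery_instructions VALUES (?, ?)",
        list(instructions.items()),
    )
    conn.commit()


def load_recovery_instruction(conn: sqlite3.Connection, recovery_info: str) -> Optional[str]:
    """Return the stored instruction for recovery_info, or None when it is absent."""
    row = conn.execute(
        "SELECT instruction FROM recovery_instructions WHERE recovery_info = ?",
        (recovery_info,),
    ).fetchone()
    return row[0] if row else None
```

Using the recovery service information as the primary key keeps the list one-to-one, which matches how the list is later used for matching task information.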
Step S204, a first monitoring index of at least one current service of the server is obtained.
Specifically, the server of this embodiment may refer to a device or a terminal that provides service functions. The monitoring terminal may use Prometheus, an open-source monitoring tool, or may use other monitoring tools; no further limitation is made here. This embodiment takes Prometheus as an example. A server may run several programs or pieces of software at the same time, that is, provide several business services simultaneously. For example, a data storage service, a download service (Xunlei, Baidu Cloud download, 360 Cloud download, and the like) and data interaction (data calls between apps) may all run on one host at the same time. To ensure that the services on the host run normally, each running service (that is, each current service) on the server needs to be monitored by a monitoring terminal or a monitoring system. After the monitoring terminal has associated the alarm configuration file with the current service list of the server, and a recovery service instruction list exists in the preset database, the monitoring terminal can monitor the running data (i.e., the first monitoring index) of the services currently running on the server in real time. The first monitoring index may be all the running data of the current service, or part of the running data selected according to different monitoring requirements and service functions; no further limitation is made here. The monitoring terminal may deploy a monitoring node for real-time monitoring for each service on the server (i.e., deploy an agent/exporter that acquires the monitoring index of that service), and then acquire the first monitoring index of the current service corresponding to each monitoring node through a preset transmission form (e.g., an API transmission interface).
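As an illustration of collecting monitoring indexes from an agent/exporter, the sketch below parses a simplified text response of the "name value" kind that exporters commonly return. It assumes one sample per line with no trailing timestamp; the metric names in the sample are hypothetical, and a real deployment would rely on the monitoring tool's own scraper rather than this parser.

```python
def parse_metrics(text: str) -> dict:
    """Parse 'name value' lines into a dict, skipping comments and malformed lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment line
        # Split on the last space so label values containing spaces stay in the name.
        name, _, value = line.rpartition(" ")
        if not name:
            continue  # no separator found, so this is not a sample line
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # value is not numeric; ignore the line
    return metrics


SAMPLE = """\
# HELP service_up Whether the service is running.
# TYPE service_up gauge
service_up{service="service-b"} 0
disk_free_bytes{mount="/data"} 1.5e+09
"""
```

Calling `parse_metrics(SAMPLE)` yields one entry per sample line, keyed by the metric name together with its labels.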
Step S205, the first monitoring index includes at least one monitoring parameter; judging whether the monitoring parameter is greater than the monitoring threshold, or less than or equal to it.
Step S206, if the monitoring parameter is greater than the monitoring threshold, the monitoring parameter is an abnormal index in the first monitoring index.
Specifically, after acquiring the first monitoring index of the service or software in which the server is currently in the running state, the monitoring terminal may determine which of the first monitoring indexes is abnormal (i.e., which of the first monitoring indexes is an abnormal index) according to a preset monitoring threshold for each first monitoring index. In this embodiment, the specific determination rule may be adjusted according to the difference between the preset monitoring threshold and the first monitoring index, for example, when the monitoring threshold is a minimum value of the acceptable first monitoring index, if the value of the first monitoring index is smaller than the monitoring threshold, it indicates that the first monitoring index is an abnormal index, and when the monitoring threshold is a maximum value of the acceptable first monitoring index, it indicates that the first monitoring index is an abnormal index if the value of the first monitoring index is larger than the monitoring threshold. The acceptable maximum value and the acceptable minimum value of a first monitoring index can also be set simultaneously, that is, whether a first monitoring index is an abnormal index is determined through two corresponding monitoring threshold values, when the value of the first monitoring index is within the value range of the two corresponding monitoring threshold values, the first monitoring index is not an abnormal index, and when the value of the first monitoring index is outside the value range of the two corresponding monitoring threshold values (i.e., smaller than the minimum value or larger than the maximum value), the first monitoring index is an abnormal index.
Step S207, sending an alarm instruction to a preset alarm system according to the abnormal index.
Specifically, after determining which monitoring parameters in the first monitoring index are abnormal indexes, the monitoring terminal may generate a corresponding alarm instruction for each abnormal index and send the alarm instructions to the preset alarm system. Here, the monitoring terminal is taken to use the Prometheus monitoring tool as an example. Before generating the alarm instruction corresponding to each abnormal index, the monitoring terminal may have an alarm information processing service written at the code layer in advance, i.e., a service for intelligently sending the alarm instruction to the preset alarm system, such as a webhook service (a webhook is a web callback or HTTP push API, a way of providing real-time information to an app or another application). The alarm information processing service may send the alarm instruction in several ways; which way is used may be determined according to the recipient of the alarm instruction and the form of the alarm instruction, for example sending it to the preset alarm system by email or by cloud short message (i.e., a transmission form for receiving and sending data or information on a specific communication platform). In this embodiment, when the background of the server cannot be logged in to (e.g., the network is disconnected) or the operating device (e.g., a computer) cannot be used, a cloud short message may be sent over the Global System for Mobile Communications (GSM) to invoke the service recovery function of the server and restart or repair the corresponding service on the server.
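A webhook-style alarm information processing service essentially turns an incoming JSON payload into alarm messages for the preset alarm system. The sketch below assumes a payload shaped like the one Prometheus Alertmanager posts to webhook receivers (an `alerts` list whose items carry `status`, `labels` and `annotations`); the message format and the `service` label are assumptions, and the delivery step (email, short message) is left out.

```python
import json


def format_alarm_messages(payload_json: str) -> list:
    """Turn a webhook payload into one human-readable alarm message per firing alert."""
    payload = json.loads(payload_json)
    messages = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts do not need an alarm instruction
        labels = alert.get("labels", {})
        name = labels.get("alertname", "unknown")
        service = labels.get("service", "unknown service")
        summary = alert.get("annotations", {}).get("summary", "no summary")
        messages.append(f"[ALARM] {name} on {service}: {summary}")
    return messages
```

Each returned string could then be handed to whichever channel suits the recipient, such as email or a cloud short message.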
Step S208, receiving task information input by the administrator based on the alarm instruction.
Specifically, after the preset alarm system receives the alarm instruction sent by the monitoring terminal, the administrator of the preset alarm system may input different task information according to the specific content of the alarm instruction, and the task information is then sent to the monitoring terminal. In this embodiment, the monitoring terminal may send the alarm instruction to the preset alarm system through a cloud transmission platform or a cloud short message platform. The cloud transmission platform may refer to a communication platform that can be used for data and information transmission, such as the Alibaba Cloud platform or the WeChat platform. To ensure that the cloud transmission platform works together with the monitoring terminal and the preset alarm system, the association among the three may be established before the current service of the server is monitored, so that communication is smooth. For example, when an alarm instruction with the specific content "service B of the server is abnormal" is sent to the preset alarm system through the cloud transmission platform, the administrator can view the alarm information on a preset alarm page of the preset alarm system (the page may be provided by the cloud transmission platform or by the preset alarm system itself). The administrator then inputs corresponding recovery service information on the preset alarm page according to the alarm information (i.e., the task information of this embodiment, which may refer to information indicating which task the terminal is to perform), such as "execute task: restart service B", and the recovery service information is sent to the monitoring terminal.
Step S209, confirming whether the task information is matched with the recovery service information according to the task information and the recovery service information.
Step S210, if the task information does not match the recovery service information, sending a re-input instruction to the preset alarm system.
Specifically, when the task information does not match the recovery service information in the recovery service list in the preset database, the monitoring terminal can send a re-input instruction to the preset alarm system through the cloud transmission platform and then receive the task information newly input by the administrator. In this embodiment, the task information input by the administrator may be wrong, or the task information may not match the recovery service information in an updated recovery service list. In either case, the monitoring terminal needs to send a re-input instruction to the preset alarm system and then, according to the new task information sent by the preset alarm system and a preset determination rule, determine whether the administrator input wrong task information or the task information does not match the updated recovery service information. When the task information does not match the updated recovery service information, the monitoring terminal can also send an update list instruction to the preset alarm system to remind the administrator to adjust the task information according to the updated recovery service list.
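The matching decision in steps S209 to S211 can be sketched as below. The "preset determination rule" is not specified in this embodiment, so the rule used here is purely an assumption: task information that matches only an earlier version of the recovery service list is treated as a list mismatch, and anything else as an input error. The function and outcome names are likewise hypothetical, and the sketch collapses the re-input round trip into a single call.

```python
from typing import Dict, Optional, Tuple


def _normalize(text: str) -> str:
    """Collapse whitespace and case so trivial differences do not block a match."""
    return " ".join(text.split()).lower()


def match_task(task_info: str,
               current_list: Dict[str, str],
               previous_list: Optional[Dict[str, str]] = None
               ) -> Tuple[str, Optional[str]]:
    """Map task information to the next action of the monitoring terminal.

    current_list / previous_list: recovery service information -> recovery
    service instruction (current and, optionally, pre-update versions).
    """
    key = _normalize(task_info)
    current = {_normalize(k): v for k, v in current_list.items()}
    if key in current:
        return "send_recovery_instruction", current[key]
    previous = {_normalize(k) for k in (previous_list or {})}
    if key in previous:
        # Assumed rule: the administrator used an entry from the old list.
        return "send_update_list_instruction", None
    return "send_re_input_instruction", None
```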
Step S211, if the task information matches the recovery service information, sending a recovery service instruction corresponding to the recovery service information to the server, where the recovery service instruction is used to recover or restart the current service corresponding to the abnormal index.
Specifically, after the monitoring terminal receives the recovery service information sent by the preset alarm system, the monitoring terminal may match the service recovery content with a service recovery list in a preset database, where the service recovery list is an association table in which service recovery content and service recovery tasks correspond one to one. When the service recovery content matches a service recovery task in the service recovery list, the monitoring terminal sends the recovery service instruction corresponding to the matched service recovery task to the server, so that the server can restart or recover the corresponding service or software according to the recovery service instruction. In this embodiment, the monitoring terminal may send the recovery service instruction through the cloud transmission platform to a preset recovery system of the server (for example, Jenkins, an open source continuous integration tool developed in Java and used for monitoring continuous and repetitive work) so that the server automatically recovers the corresponding service. The monitoring terminal may also call a recovery program of the server through the cloud transmission platform to execute the corresponding recovery task, for example by logging in to the server over SSH (a remote connection mode) and executing the recovery service instruction or a related recovery service script.
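For the SSH path, a minimal sketch of how the command line could be assembled is shown below. The host, user and remote command are placeholders, `build_ssh_command` is a hypothetical helper, and `systemctl restart` is only one possible way a service might be restarted on the server. The function builds the argument list without executing it.

```python
import shlex
from typing import List, Sequence


def build_ssh_command(host: str, user: str, remote_argv: Sequence[str]) -> List[str]:
    """Build an ssh invocation that runs remote_argv on the target server.

    Each remote argument is shell-quoted because ssh passes the command to
    the remote shell as a single string.
    """
    remote = " ".join(shlex.quote(arg) for arg in remote_argv)
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which suits unattended execution.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote]
```

The returned list could then be passed to `subprocess.run(cmd, check=True)` so that a non-zero exit status from the remote command surfaces as an exception.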
Step S212, re-acquiring a second monitoring index of the current task corresponding to the service restoration instruction in the server.
Step S213, determining whether there is an abnormal index in the second monitoring index according to the monitoring threshold.
Step S214, if no abnormal index exists, sending service recovery success information to the preset alarm system.
Specifically, when the task information matches the recovery service information, the server receives the recovery service instruction corresponding to the task information sent by the monitoring terminal and executes it. After the server executes the recovery service instruction, the monitoring terminal may re-acquire a monitoring index (i.e., the second monitoring index in this embodiment) of the service corresponding to the recovery service instruction, and determine whether the second monitoring index is abnormal according to the monitoring threshold; the determination process is the same as that used for the first monitoring index. When the second monitoring index is not abnormal, the service corresponding to the recovery service instruction has returned to normal, and the monitoring terminal may generate service recovery success information for the service and send it to the preset alarm system to notify the administrator that the service has been recovered. When the second monitoring index is still abnormal, the service is recovered in the same way as for the first monitoring index, and the description is not repeated here.
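Steps S212 to S214 can be sketched as follows, assuming that a monitoring index is represented as a mapping from parameter name to value and that a parameter is abnormal when it exceeds its threshold. The function names, the message texts and the callable used to reach the preset alarm system are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def abnormal_parameters(index: Dict[str, float],
                        thresholds: Dict[str, float]) -> List[str]:
    """Return the names of parameters whose value is greater than the threshold."""
    return sorted(name for name, value in index.items()
                  if name in thresholds and value > thresholds[name])


def confirm_recovery(fetch_second_index: Callable[[], Dict[str, float]],
                     thresholds: Dict[str, float],
                     notify: Callable[[str], None]) -> bool:
    """Re-acquire the monitoring index after recovery and report the outcome.

    Returns True when no abnormal index remains; otherwise reports the
    remaining abnormal parameters so the alarm flow can run again.
    """
    second_index = fetch_second_index()
    still_abnormal = abnormal_parameters(second_index, thresholds)
    if not still_abnormal:
        notify("service recovery succeeded")
        return True
    notify("still abnormal: " + ", ".join(still_abnormal))
    return False
```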
The second embodiment of the invention has the following advantages. The monitoring terminal monitors, in real time, the monitoring index of the server and the monitoring index of each service to be monitored in the server, compares the monitoring index with the monitoring threshold to confirm the abnormal index, matches the corresponding recovery service instruction according to the task information corresponding to the abnormal index, and sends the recovery service instruction to the server to recover the corresponding service. This solves the technical problem in the prior art that an operator must manually log in to the server or its background and execute the recovery operation according to the recovery service instruction. It achieves the technical effects of automatically judging the task information and intelligently executing the recovery operation, reducing labor cost, improving the efficiency and timeliness of service recovery, allowing the recovery operation to be performed on a service on the server even when the server background cannot be logged in to, and reducing network dependency.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an alarm device of a server operation state according to a third embodiment of the present invention. As shown in fig. 3, the apparatus 300 for warning about the operating status of a server according to the present embodiment includes:
an index obtaining module 310, configured to obtain a first monitoring index of at least one current service of a server;
an anomaly confirmation module 320, configured to confirm an anomaly indicator in the first monitoring indicator according to a monitoring threshold;
the alarm sending module 330 is configured to send the alarm instruction to a preset alarm system according to the abnormal indicator;
the task receiving module 340 is configured to receive task information input by an administrator based on the alarm instruction;
and a service recovery module 350, configured to send a corresponding service recovery instruction to the server according to the task information, where the service recovery instruction is used to recover or restart the current service corresponding to the abnormal indicator.
In this embodiment, the apparatus 300 for warning the operation state of the server further includes:
the alarm configuration module 360 is configured to generate an alarm configuration file according to the monitoring threshold and a preset monitoring frame code;
a monitoring association module 370, configured to associate the alarm configuration file with a current service list of the server, so as to monitor all services of the current service list;
the list creating module 380 is configured to create a recovery service instruction list according to a preset recovery service instruction, and store the recovery service instruction list in a preset database, where the recovery service instruction includes recovery service information.
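One way the alarm configuration module might produce an alarm configuration file is sketched below, under the assumption that the monitoring framework is Prometheus (named only as an example elsewhere in this description) and that thresholds are upper bounds. The output follows the general layout of a Prometheus alerting rule file; the group name, metric names, the `service` label and the function name are all hypothetical.

```python
from typing import Dict, List


def build_rule_file(services: List[str], thresholds: Dict[str, float]) -> str:
    """Render one alerting rule per (service, metric) pair as YAML text."""
    lines = ["groups:", "  - name: server_services", "    rules:"]
    for service in services:
        for metric, limit in thresholds.items():
            lines += [
                f"      - alert: {service}_{metric}_abnormal",
                # Fires when the metric for this service exceeds the threshold.
                f'        expr: {metric}{{service="{service}"}} > {limit}',
                "        labels:",
                "          severity: critical",
            ]
    return "\n".join(lines) + "\n"
```

Iterating over the current service list is what associates the configuration with every monitored service, so adding a service to the list yields a new rule without other changes.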
In this embodiment, the service recovery module 350 includes:
the recovery service unit is used for confirming whether the task information matches the recovery service information according to the task information and the recovery service information; and if the task information matches the recovery service information, sending a recovery service instruction corresponding to the recovery service information to the server.
In this embodiment, the apparatus 300 for warning the operation status of the server further includes:
a recovery confirmation module 390, configured to reacquire a second monitoring indicator of the current task corresponding to the service recovery instruction in the server; confirming whether the second monitoring index has an abnormal index according to the monitoring threshold; and if the abnormal index does not exist, sending the service recovery success information to the preset alarm system.
In this embodiment, the first monitoring index includes at least one monitoring parameter, and the anomaly confirmation module 320 includes:
an abnormality confirmation unit configured to determine whether the monitoring parameter is greater than the monitoring threshold, or less than or equal to it; and if the monitoring parameter is greater than the monitoring threshold, the monitoring parameter is an abnormal index in the first monitoring index.
In this embodiment, the recovery service unit further includes:
a re-input unit, configured to send a re-input instruction to the preset alarm system if the task information does not match the recovery service information.
The alarm device for the server running state provided by the embodiment of the invention can execute the alarm method for the server running state provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE IV
Fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention, as shown in fig. 4, the server includes a processor 410, a memory 420, an input device 430, and an output device 440; the number of the processors 410 in the server may be one or more, and one processor 410 is taken as an example in fig. 4; the processor 410, the memory 420, the input device 430 and the output device 440 in the server may be connected by a bus or other means, and the bus connection is exemplified in fig. 4.
The memory 420 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the alarm method of the server operation state in the embodiment of the present invention (for example, an index acquisition module, an abnormality confirmation module, an alarm sending module, a task receiving module, a service recovery module, an alarm configuration module, a monitoring association module, a list creation module, and a recovery confirmation module in the alarm device of the server operation state). The processor 410 executes various functional applications and data processing of the server by executing software programs, instructions and modules stored in the memory 420, that is, implements the above-mentioned method for alarming the operating state of the server, that is:
acquiring a first monitoring index of at least one current service of a server;
confirming abnormal indexes in the first monitoring indexes according to a monitoring threshold value;
sending the alarm instruction to a preset alarm system according to the abnormal index;
receiving task information input by an administrator based on the alarm instruction;
and sending a corresponding service recovery instruction to the server according to the task information, wherein the service recovery instruction is used for recovering or restarting the current service corresponding to the abnormal index.
The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 420 may further include memory located remotely from processor 410, which may be connected to a server over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the server. The output device 440 may include a display device such as a display screen.
EXAMPLE V
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are executed by a computer processor to perform a method for warning an operating status of a server, where the method includes:
acquiring a first monitoring index of at least one current service of a server;
confirming abnormal indexes in the first monitoring indexes according to a monitoring threshold value;
sending the alarm instruction to a preset alarm system according to the abnormal index;
receiving task information input by an administrator based on the alarm instruction;
and sending a corresponding service recovery instruction to the server according to the task information, wherein the service recovery instruction is used for recovering or restarting the current service corresponding to the abnormal index.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the above method operations, and may also perform related operations in the alarm method for the server operation state provided by any embodiment of the present invention.
The present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk of a computer, and which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods of the embodiments of the present invention.
It should be noted that, in the embodiment of the warning apparatus for the server operation state, each unit and each module included in the embodiment are only divided according to the functional logic, but are not limited to the above division, as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments illustrated herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for alarming the running state of a server is characterized by comprising the following steps:
acquiring a first monitoring index of at least one current service of a server;
confirming abnormal indexes in the first monitoring indexes according to a monitoring threshold value;
sending the alarm instruction to a preset alarm system according to the abnormal index;
receiving task information input by an administrator based on the alarm instruction;
and sending a corresponding service recovery instruction to the server according to the task information, wherein the service recovery instruction is used for recovering or restarting the current service corresponding to the abnormal index.
2. The method for alarming an operating state of a server according to claim 1, wherein, before the obtaining a first monitoring index of at least one current service of the server, the method further comprises:
generating an alarm configuration file according to the monitoring threshold and a preset monitoring frame code;
associating the alarm configuration file with a current service list of the server so as to monitor all services of the current service list;
and creating a recovery service instruction list according to a preset recovery service instruction, and storing the recovery service instruction list to a preset database, wherein the recovery service instruction comprises recovery service information.
3. The method for alarming the running state of the server according to claim 2, wherein the sending the corresponding service restoration instruction to the server according to the task information comprises:
confirming whether the task information is matched with the recovery service information or not according to the task information and the recovery service information;
and if the task information matches the recovery service information, sending a recovery service instruction corresponding to the recovery service information to the server.
4. The method for alarming the running state of the server according to claim 3, wherein, after the sending the corresponding service recovery instruction to the server according to the task information, the method further comprises:
re-acquiring a second monitoring index of the current task corresponding to the service recovery instruction in the server;
confirming whether the second monitoring index has an abnormal index according to the monitoring threshold;
and if the abnormal index does not exist, sending the service recovery success information to the preset alarm system.
5. The method for alarming server operation status according to claim 1, wherein the first monitoring index includes at least one monitoring parameter, and the confirming an abnormal index in the first monitoring index according to a monitoring threshold includes:
judging whether the monitoring parameter is greater than the monitoring threshold, or less than or equal to the monitoring threshold;
and if the monitoring parameter is larger than the monitoring threshold, the monitoring parameter is an abnormal index in the first monitoring index.
6. The method for alarming of server operation state according to claim 3, wherein the confirming whether the task information matches the recovery service information according to the task information and the recovery service information further comprises:
and if the task information is not matched with the recovery service information, sending a re-input instruction to the preset alarm system.
7. An apparatus for alarming operation status of a server, comprising:
the index acquisition module is used for acquiring at least one first monitoring index of the current service of the server;
the abnormality confirmation module is used for confirming an abnormal index in the first monitoring index according to a monitoring threshold value;
the alarm sending module is used for sending the alarm instruction to a preset alarm system according to the abnormal index;
the task receiving module is used for receiving task information input by an administrator based on the alarm instruction;
and the service recovery module is used for sending a corresponding service recovery instruction to the server according to the task information, wherein the service recovery instruction is used for recovering or restarting the current service corresponding to the abnormal index.
8. The apparatus for warning about server operation status according to claim 7, further comprising:
the alarm configuration module is used for generating an alarm configuration file according to the monitoring threshold and a preset monitoring frame code;
the monitoring association module is used for associating the alarm configuration file with a current service list of the server so as to monitor all services of the current service list;
the list creating module is used for creating a recovery service instruction list according to a preset recovery service instruction and storing the recovery service instruction list to a preset database, wherein the recovery service instruction comprises recovery service information.
9. A server, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of alerting of operational status of a server of any of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of warning of an operational state of a server according to any one of claims 1 to 6.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010250178.7A CN111459770A (en) 2020-04-01 2020-04-01 Server operation state warning method and device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010250178.7A CN111459770A (en) 2020-04-01 2020-04-01 Server operation state warning method and device, server and storage medium

Publications (1)

Publication Number Publication Date
CN111459770A true CN111459770A (en) 2020-07-28

Family

ID=71678878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010250178.7A Pending CN111459770A (en) 2020-04-01 2020-04-01 Server operation state warning method and device, server and storage medium

Country Status (1)

Country Link
CN (1) CN111459770A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200728
