CN111682976B - Method for ensuring distributed multi-machine communication monitoring - Google Patents

Publication number
CN111682976B
CN111682976B CN202010339696.6A CN202010339696A CN111682976B CN 111682976 B CN111682976 B CN 111682976B CN 202010339696 A CN202010339696 A CN 202010339696A CN 111682976 B CN111682976 B CN 111682976B
Authority
CN
China
Prior art keywords
server
communication
servers
variable
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010339696.6A
Other languages
Chinese (zh)
Other versions
CN111682976A (en)
Inventor
朱之凯
刘海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Zhongke Leinao Intelligent Technology Co ltd
Original Assignee
Hefei Zhongke Leinao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Zhongke Leinao Intelligent Technology Co ltd filed Critical Hefei Zhongke Leinao Intelligent Technology Co ltd
Priority to CN202010339696.6A priority Critical patent/CN111682976B/en
Publication of CN111682976A publication Critical patent/CN111682976A/en
Application granted granted Critical
Publication of CN111682976B publication Critical patent/CN111682976B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/14Arrangements for monitoring or testing data switching networks using software, i.e. software packages

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method for ensuring distributed multi-machine communication monitoring, which comprises the following steps: first, communication detection codes are deployed on each server in a distributed task system; second, a software package of Prometheus Exporters is embedded in the communication detection codes, and the communication variables calculated by the communication detection codes in each server are read through the Exporters; then, the Exporters send the acquired communication variables to a Prometheus Server; finally, based on the communication variables, the Prometheus Server judges whether the communication between the servers is normal. The method monitors the communication condition of each server in the multitask distribution system more efficiently.

Description

Method for ensuring distributed multi-machine communication monitoring
Technical Field
The invention belongs to the field of communication, and particularly relates to a method for ensuring distributed multi-machine communication monitoring.
Background
When a user submits an application for a distributed task, the task is scheduled onto different servers. A distributed task places high requirements on communication between the servers; at a minimum, TCP (Transmission Control Protocol) communication between the servers must be ensured. However, communication between servers can fail, and when it does the distributed task goes wrong, which degrades the user's service experience. The existing solution is to run a communication test on each server in the distributed task system and record the results in a related file, but this scheme is costly to maintain and inconvenient to manage.
In addition, Prometheus is an open source monitoring system developed in the Go language. It mainly comprises a Prometheus Server (monitoring server), a Client Library, Exporters (data acquisition programs), a Push Gateway, an Alertmanager (alarm management), a graphical interface and the like. The workflow of Prometheus is roughly as follows:
(1) The Prometheus Server periodically pulls metrics (indexes) from the configured Exporters or Client Library, pulls the metrics that jobs have pushed to the Push Gateway, or pulls metrics in other ways.
(2) After storing the collected metrics locally, the Prometheus Server runs the configured alert rules and pushes the resulting alerts to the Alertmanager.
(3) The Alertmanager processes the received alerts according to its configuration file and sends alert notifications such as emails and short messages.
However, Prometheus monitors parameters such as the CPU occupancy rate, the GPU occupancy rate and the docker container information of each server, and does not cover the communication condition of the servers.
Therefore, how to design a system capable of monitoring the communication condition of the server in the platform in real time becomes an urgent technical problem to be solved.
Disclosure of Invention
In view of the foregoing problems, an object of the present invention is to provide a method for ensuring distributed multi-machine communication monitoring, which is more efficient in monitoring communication status of each server in a multitask distribution system.
The invention aims to provide a method for ensuring distributed multi-machine communication monitoring, which comprises the following steps,
deploying communication detection codes on each server in the distributed task system;
embedding a software package of Prometheus Exporters in the communication detection codes, and reading communication variables calculated by the communication detection codes in each server through the Exporters;
the Exporters sends the acquired communication variables to a Prometheus Server;
and based on the communication variable, the Prometheus Server judges whether the communication between the servers is normal.
Further, the method may further comprise,
each server communicates through TCP, and in the communication process, any one of the servers can send or feed back txt files with own IP address names to other servers and receive txt files with own IP address names fed back or sent by other servers.
Further, the method includes setting a first communication variable to determine whether communication between the two servers is abnormal, wherein,
if the current server receives a txt file with an IP address name of the current server fed back by another server, TCP communication between the two servers is normal, and a first communication variable value is 0;
if the current server does not receive the txt file with the IP address name fed back by the other server, TCP communication between the two servers is abnormal, and the value of the first communication variable is 1.
Further, the method further comprises setting a metric variable to determine whether the current server is abnormal when the TCP communication between the two servers is abnormal, wherein,
if the current server does not receive the txt file with the own IP address name fed back by all servers in other servers, the current server is abnormal, the value of the metric variable is 1, and otherwise, the value of the metric variable is 0.
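The two variables can be written compactly. As a notational sketch only (the symbols c_ij, m_i and N are introduced here for illustration and do not appear in the original text), let c_ij be the first communication variable computed on server i for peer j, and m_i the metric variable of server i, in a system of N >= 2 servers:

```latex
c_{ij} =
\begin{cases}
0, & \text{server } i \text{ received the txt file fed back by server } j,\\
1, & \text{otherwise,}
\end{cases}
\qquad
m_i = \prod_{j \neq i} c_{ij}, \qquad i, j \in \{1, \dots, N\}.
```

Under this reading, m_i equals 1 only when every c_ij equals 1, that is, when none of the other servers replied; a single c_ij of 0 forces m_i to 0.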
Further, before the communication variables calculated by the communication detection codes in the servers are read through the Exporters, the communication detection code in each server monitors the TCP communication with each of the other servers in real time, wherein,
the current server sends txt files with own IP address names to other servers;
if the current server can receive the txt file with the own IP address name fed back by one of the other servers, the TCP communication verification of the current server and the server feeding back the txt file with the own IP address name is successful,
calculating a first communication variable between the current server and the server which feeds back the txt file with the IP address name of the current server by using the communication detection code in the current server to be 0;
if the current server does not receive the txt file with the own IP address name fed back by one of the other servers, the TCP communication check between the current server and the server which does not feed back the txt file with the own IP address name fails,
the communication detection code in the current server calculates that the first communication variable between the current server and the server which does not feed back the txt file with the own IP address name is 1.
Further, the real-time monitoring of the TCP communication between the communication detection code in each server and each other server further comprises,
checking the calculated first communication variable by a communication detection code in the current server;
if the first communication variable is 1, the communication detection code in the current server continuously traverses whether a server which does not feed back the txt file with the own IP address name exists or not, wherein,
if all servers in other servers do not feed back txt files with own IP address names, calculating a metric variable in the current server to be 1 by the communication detection codes in the current server;
if one or more servers feed back txt files with own IP address names in other servers, calculating a metric variable in the current server to be 0 by the communication detection code in the current server;
if the first communication variables are all 0, the communication detection code in the current server calculates that the metric variable in the current server is 0.
Further, the Prometheus Server judging, based on the communication variables, whether the communication between the servers is normal comprises,
the Prometheus Server obtains the metric variable calculated by the communication detection code in all servers, wherein,
if the value of the metric variable of the server is 1, judging that the server is an abnormal server;
and the Prometheus Server pushes the alarm of the abnormal Server and/or stores the alarm into an abnormal database so as to isolate the abnormal Server.
Further, the method may further comprise,
each server acquires IP addresses of all other servers in the distributed task system;
each server checks the received txt file at a predetermined time for real-time monitoring of TCP communication with other respective servers by the communication detection code.
Further, the variable name of the first communication variable includes names of two servers.
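One way to build such a name is sketched below. The prefix `tcp_comm_abnormal`, the `_to_` separator and the function name are hypothetical; the original only requires that the name contain the names of both servers. Because Prometheus metric names may contain only letters, digits, underscores and colons, the sketch replaces the dots of an IP address with underscores.

```python
import re


def first_comm_variable_name(current_server: str, peer_server: str) -> str:
    """Build a variable name that contains both server names.

    The prefix and the "_to_" separator are hypothetical choices.
    Every character other than a letter, digit or underscore is replaced
    with "_" so the result is also a valid Prometheus metric name.
    """

    def sanitize(name: str) -> str:
        return re.sub(r"[^a-zA-Z0-9_]", "_", name)

    return f"tcp_comm_abnormal_{sanitize(current_server)}_to_{sanitize(peer_server)}"


print(first_comm_variable_name("10.0.0.1", "10.0.0.2"))
```

An alternative design would keep a single metric name and carry the two server names as labels; the original text does not say which form is used.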
The technical effects of the invention are as follows: the method for ensuring distributed multi-machine communication monitoring deploys communication detection codes on each server in the distributed task system and calculates the first communication variable and the metric variable during the TCP communication of each server, so that the communication condition between the current server and the other servers can be obtained quickly, the detection efficiency of communication faults is improved, and detection is real-time and rapid. In addition, a software package of Prometheus Exporters is introduced into each server in the multitask distribution system, and the Exporters connect to and interact with the Prometheus Server, so that the communication condition of each server in the multitask distribution system is monitored more efficiently.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below to the drawings required for the embodiments or the technical solutions in the prior art, and for those skilled in the art, other drawings can be obtained according to the drawings of the present invention without any creative effort.
Fig. 1 shows a flow diagram of a method of ensuring distributed multi-machine communication monitoring according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in Fig. 1, the embodiment of the present invention discloses a method for ensuring distributed multi-machine communication monitoring. The method includes: first, deploying communication detection codes on each server in a distributed task system; second, embedding a software package of Prometheus Exporters in the communication detection codes, and reading the communication variables calculated by the communication detection codes in each server through the Exporters; then, sending the acquired communication variables from the Exporters to a Prometheus Server; and finally, judging by the Prometheus Server, based on the communication variables, whether the communication between the servers is normal. A software package of Prometheus Exporters is introduced into each server in the multitask distribution system, and the Exporters connect to and interact with the Prometheus Server, so that the communication condition of each server in the multitask distribution system is monitored more efficiently.
Communication detection codes are deployed on all servers in the distributed task system, and the servers then communicate through TCP. During TCP communication, any server can send or feed back txt files with its own IP address name to other servers and receive txt files with own IP address names fed back or sent by other servers. Further, a server periodically sends a txt file (named by its IP address) to each of the other servers through TCP communication, and each of the other servers, after obtaining the txt file, returns a txt file (named by the current server IP address) to the server through TCP communication. The interval between sends may be 8 hours. The servers can thus check communication with one another through the txt files they transmit to each other.
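The exchange described above can be sketched with plain TCP sockets. This is a minimal illustration under stated assumptions, not the patented implementation: only the file name `<own_ip>.txt` is exchanged as one text line (file contents are omitted), the peer simply feeds that name back, and the function names, the 2-second timeout and the loopback address are all hypothetical.

```python
import socket
import threading


def serve_replies(listener: socket.socket) -> None:
    """Peer side: read one file name per connection and feed it back."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # the listening socket was closed
        try:
            with conn, conn.makefile("rb") as reader:
                conn.sendall(reader.readline())
        except OSError:
            continue  # a broken connection must not stop the peer


def peer_replied(own_ip: str, peer_host: str, peer_port: int,
                 timeout: float = 2.0) -> bool:
    """Current-server side: send "<own_ip>.txt" and wait for it to come back."""
    file_name = f"{own_ip}.txt\n".encode()
    try:
        with socket.create_connection((peer_host, peer_port), timeout=timeout) as sock, \
                sock.makefile("rb") as reader:
            sock.sendall(file_name)
            reply = reader.readline()
    except OSError:
        return False  # refused, unreachable or timed out
    return reply == file_name


if __name__ == "__main__":
    # Port 0 asks the operating system for any free port.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    threading.Thread(target=serve_replies, args=(listener,), daemon=True).start()
    print(peer_replied("10.0.0.1", "127.0.0.1", port))
```

A `False` result from `peer_replied` corresponds to the case in which the current server does not receive the fed-back txt file.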
In this embodiment, the method further includes setting a first communication variable to determine whether communication between the two servers is abnormal, and a variable name of the first communication variable includes names of the two servers performing communication, wherein,
if the current server receives a txt file with an IP address name of the current server fed back by another server, TCP communication between the two servers is normal, and a first communication variable value is 0;
if the current server does not receive the txt file with the IP address name fed back by the other server, TCP communication between the two servers is abnormal, and the value of the first communication variable is 1.
If a first communication variable in the current server has a value of 1, the Prometheus Server needs to further judge whether the current server itself is abnormal. A metric variable is therefore set to judge whether the current server is abnormal when TCP communication between the current server and other servers is abnormal: if the current server does not receive a txt file with its own IP address name fed back by any of the other servers, the current server is abnormal and the value of the metric variable is 1; otherwise, the value of the metric variable is 0.
Illustratively, take the case in which one of 5 servers in the distributed task system performs TCP communication with the other servers. The 5 servers are an A server, a B server, a C server, a D server and an E server. The A server first downloads the IP addresses of the four servers B-E, and the communication detection code in the A server then monitors TCP communication with the four servers B-E in real time. Specifically, the A server sends a txt file with its own IP address name to the four servers B-E. If the A server receives the txt file with its own IP address name fed back by one of the four servers B-E, the TCP communication check between the A server and that server succeeds, and the communication detection code in the A server calculates the first communication variable between the A server and that server to be 0. For example, if the A server receives the txt file with its own IP address name fed back by the B server, TCP communication between the A server and the B server is normal, and the A server calculates the first communication variable with the B server to be 0 through the communication detection code. The A server performs the same operations with the servers C-E, which are not described in detail here.
If the A server does not receive the txt file with its own IP address name fed back by one of the four servers B-E, the TCP communication check between the A server and that server fails, and the communication detection code in the A server calculates the first communication variable between the A server and that server to be 1. For example, if the A server does not receive the txt file with its own IP address name fed back by the C server, TCP communication between the two is abnormal, and the A server calculates the first communication variable with the C server to be 1 through the communication detection code.
Further, the communication detection code in the A server checks the calculated first communication variables. If a first communication variable is 1, for example the first communication variable between the A server and the C server, the communication detection code in the A server goes on to traverse the remaining servers to check whether they have also failed to feed back the txt file with its own IP address name, wherein,
if none of the three servers B, D and E feeds back the txt file with its own IP address name, the communication detection code in the A server calculates the metric variable in the A server to be 1;
if one or more of the servers B, D and E feed back the txt file with its own IP address name, the communication detection code in the A server calculates the metric variable in the A server to be 0;
if the first communication variables stored in the A server are all 0, the communication detection code in the A server calculates the metric variable in the A server to be 0. Preferably, the communication detection code in the A server starts the calculation of the metric variable directly as soon as a first communication variable is calculated to be 1. If the first communication variables acquired during TCP communication are all 0, the communication detection code in the A server calculates the metric variable in the A server to be 0 after the traversal is finished.
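The A-E example can be traced with a short sketch. The function names and the use of a set of replying peers as input are illustrative assumptions; the sketch also assumes that a server with no peers at all is not flagged, a case the original does not address.

```python
def first_comm_variables(peers: list[str], replied: set[str]) -> dict[str, int]:
    """Map each peer to 0 (txt file fed back) or 1 (no txt file fed back)."""
    return {peer: 0 if peer in replied else 1 for peer in peers}


def metric_variable(first_vars: dict[str, int]) -> int:
    """Return 1 only when every peer failed to reply, otherwise 0."""
    if not first_vars:
        return 0  # assumption: no peers means nothing to flag
    return 1 if all(value == 1 for value in first_vars.values()) else 0


peers_of_a = ["B", "C", "D", "E"]

# Only C fails to reply: the A-C link is abnormal, but server A is not.
only_c_silent = first_comm_variables(peers_of_a, replied={"B", "D", "E"})
print(only_c_silent, metric_variable(only_c_silent))

# Nobody replies: server A itself is judged abnormal.
all_silent = first_comm_variables(peers_of_a, replied=set())
print(all_silent, metric_variable(all_silent))
```

The first case yields a metric variable of 0 for server A, the second a metric variable of 1.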
The communication detection codes are deployed on each server in the distributed task system, and the first communication variable and the metric variable are calculated in the TCP communication process of each server, so that the communication condition between the current server and other servers can be quickly acquired, the detection efficiency of communication faults is improved, and the method has real-time performance and rapidity.
In this embodiment, the method further includes that each server acquires (downloads) the IP addresses of all other servers in the distributed task system; each server checks the received txt files at a predetermined time and compares them with the IP addresses of all other servers. The predetermined time may be 30 s (seconds) but is not limited to 30 s; other values such as 20 s or 1 min (minute) are also suitable for the present invention.
Specifically, the metric variable can be transmitted by using the Prometheus Client Library in the Exporter software package, in particular the Prometheus Client Library for Python (a cross-platform programming language): a Label is used to carry the information of the server, and HTTP port 8000 is then opened to wait for scraping by Prometheus.
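To show what Prometheus would read from port 8000, the sketch below renders the metric variable in the Prometheus text exposition format by hand, using only the Python standard library. This is a swapped-in technique chosen to keep the example dependency-free: the embodiment itself uses the Python Prometheus client library, which provides gauges and an HTTP endpoint for this purpose. The metric name `server_comm_abnormal`, the label name `server` and the `METRIC_VALUES` dictionary are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical state that the communication detection code would update.
METRIC_VALUES = {"10.0.0.1": 0}


def render_metrics(values: dict[str, int]) -> str:
    """Render one gauge sample per server in the text exposition format."""
    lines = [
        "# HELP server_comm_abnormal 1 if no peer fed back the txt file, else 0.",
        "# TYPE server_comm_abnormal gauge",
    ]
    for server, value in sorted(values.items()):
        # The label carries the server information, as in the embodiment.
        lines.append(f'server_comm_abnormal{{server="{server}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer scrapes on /metrics, the default Prometheus scrape path."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRIC_VALUES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` opens port 8000 and blocks; the Prometheus Server would then be configured to scrape that port on each server.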
Further, the Prometheus Server acquires the metric variables of all servers; if the value of the metric variable of a server is 1, that server is judged to be an abnormal server. The Prometheus Server pushes an alarm for the abnormal server and/or stores the abnormal server in an abnormal database so as to isolate it. Specifically, the Prometheus Server matches the abnormal server against the alarm rules, the Alertmanager sends the alarm to an operation and maintenance administrator by email, and the abnormal server is at the same time added to a database of the distributed task system that is dedicated to storing abnormal servers. If the information of a server is stored in this database, user tasks are not dispatched to that server, which isolates the abnormal server. Further, after receiving the alarm information, the operation and maintenance administrator confirms that the abnormal server has been added to the database, then analyzes and repairs the abnormal server, restarting or reinstalling it if necessary until communication succeeds, and finally removes it from the database and puts it back into the distributed task system.
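The isolation step can be illustrated with a small sketch. It assumes an SQLite table named `abnormal_servers` and the helper names shown; the original specifies neither the database product nor its schema, and the email alarm is left out.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database dedicated to abnormal servers."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE abnormal_servers (ip TEXT PRIMARY KEY)")
    return db


def record_abnormal(db: sqlite3.Connection, metrics: dict[str, int]) -> list[str]:
    """Store every server whose metric variable is 1 and return those servers."""
    abnormal = sorted(ip for ip, value in metrics.items() if value == 1)
    db.executemany(
        "INSERT OR IGNORE INTO abnormal_servers (ip) VALUES (?)",
        [(ip,) for ip in abnormal],
    )
    db.commit()
    return abnormal


def schedulable(db: sqlite3.Connection, servers: list[str]) -> list[str]:
    """Return the servers that user tasks may still be dispatched to."""
    isolated = {row[0] for row in db.execute("SELECT ip FROM abnormal_servers")}
    return [ip for ip in servers if ip not in isolated]


def restore(db: sqlite3.Connection, ip: str) -> None:
    """Remove a repaired server from the database so it can take tasks again."""
    db.execute("DELETE FROM abnormal_servers WHERE ip = ?", (ip,))
    db.commit()
```

A scheduler would call `schedulable` before dispatching a task, and the administrator's repair procedure would end with `restore`.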
In this embodiment, a software package of Prometheus Exporters is introduced into each server in the multitask distribution system, the Exporters connect to and interact with the Prometheus Server, and the alarm rules and other facilities of the Prometheus framework are combined, so that the communication condition of each server in the multitask distribution system is monitored more efficiently and the user can handle communication abnormalities more efficiently.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A method for ensuring distributed multi-machine communication monitoring, the method comprising,
deploying communication detection codes on each server in the distributed task system;
embedding a software package of Prometheus Exporters in the communication detection codes, and reading communication variables calculated by the communication detection codes in each server through the Exporters;
the Exporters sends the acquired communication variables to a Prometheus Server;
based on the communication variables, the Prometheus Server judges whether the communication between the servers is normal, wherein a first communication variable is set to judge whether the communication between two servers is abnormal, wherein if the current server receives a txt file with its own IP address name fed back by the other server, the TCP communication between the two servers is normal, and the value of the first communication variable is 0; if the current server does not receive the txt file with its own IP address name fed back by the other server, the TCP communication between the two servers is abnormal, and the value of the first communication variable is 1, wherein a metric variable is set to judge whether the current server is abnormal when the TCP communication between the two servers is abnormal, wherein if the current server does not receive the txt file with the own IP address name fed back by all servers in the other servers, the value of the metric variable is 1, and otherwise, the value of the metric variable is 0.
2. The method for ensuring distributed multi-machine communication monitoring of claim 1, further comprising,
each server communicates through TCP, and in the communication process, any one of the servers can send or feed back txt files with own IP address names to other servers and receive txt files with own IP address names fed back or sent by other servers.
3. The method for ensuring distributed multi-machine communication monitoring according to claim 1, wherein before the communication variables calculated by the communication detection codes in each server are read through the Exporters, the method further comprises monitoring, by the communication detection code in each server, the TCP communication with each of the other servers in real time, wherein,
the current server sends txt files with own IP address names to other servers;
if the current server can receive the txt file with the own IP address name fed back by one of the other servers, the TCP communication verification of the current server and the server feeding back the txt file with the own IP address name is successful,
calculating a first communication variable between the current server and the server which feeds back the txt file with the IP address name of the current server by using the communication detection code in the current server to be 0;
if the current server does not receive the txt file with the own IP address name fed back by one of the other servers, the TCP communication check between the current server and the server which does not feed back the txt file with the own IP address name fails,
the communication detection code in the current server calculates that the first communication variable between the current server and the server which does not feed back the txt file with the own IP address name is 1.
4. The method for guaranteeing distributed multi-machine communication monitoring as recited in claim 3, wherein the real-time monitoring of TCP communication with other servers by the communication detection code in each server further comprises,
checking the calculated first communication variable by a communication detection code in the current server;
if the first communication variable is 1, the communication detection code in the current server continuously traverses whether a server which does not feed back the txt file with the own IP address name exists or not, wherein,
if all servers in other servers do not feed back txt files with own IP address names, calculating a metric variable in the current server to be 1 by the communication detection codes in the current server;
if one or more servers feed back txt files with own IP address names in other servers, calculating a metric variable in the current server to be 0 by the communication detection code in the current server;
if the first communication variables are all 0, the communication detection code in the current server calculates that the metric variable in the current server is 0.
5. The method of claim 4, wherein the Prometheus Server judging, based on the communication variables, whether the communication between the servers is normal comprises,
the Prometheus Server obtains the metric variable calculated by the communication detection code in all servers, wherein,
if the value of the metric variable of the server is 1, judging that the server is an abnormal server;
and the Prometheus Server pushes the alarm of the abnormal Server and/or stores the alarm into an abnormal database so as to isolate the abnormal Server.
6. The method for ensuring distributed multi-machine communication monitoring according to any of claims 1-5, further comprising,
each server acquires IP addresses of all other servers in the distributed task system;
each server checks the received txt file at a predetermined time for real-time monitoring of TCP communication with other respective servers by the communication detection code.
7. The method for ensuring distributed multi-machine communication monitoring of claim 1, wherein the variable name of the first communication variable comprises the names of two servers.
CN202010339696.6A 2020-04-26 2020-04-26 Method for ensuring distributed multi-machine communication monitoring Active CN111682976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010339696.6A CN111682976B (en) 2020-04-26 2020-04-26 Method for ensuring distributed multi-machine communication monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010339696.6A CN111682976B (en) 2020-04-26 2020-04-26 Method for ensuring distributed multi-machine communication monitoring

Publications (2)

Publication Number Publication Date
CN111682976A (en) 2020-09-18
CN111682976B (en) 2022-03-01

Family

ID=72452606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010339696.6A Active CN111682976B (en) 2020-04-26 2020-04-26 Method for ensuring distributed multi-machine communication monitoring

Country Status (1)

Country Link
CN (1) CN111682976B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112202895B (en) * 2020-09-30 2022-07-08 北京达佳互联信息技术有限公司 Method and system for collecting monitoring index data, electronic equipment and storage medium
CN113570476A (en) * 2021-07-26 2021-10-29 广东电网有限责任公司 Container service monitoring method of power grid monitoring system based on custom alarm rule

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006171895A (en) * 2004-12-13 2006-06-29 Mitsubishi Electric Corp Surveillance control system
CN108092850A (en) * 2017-12-12 2018-05-29 郑州云海信息技术有限公司 Cluster server fault diagnosis method and system based on heartbeat mechanism
CN110515702A (en) * 2019-08-29 2019-11-29 浪潮云信息技术有限公司 Method and device for automatically evacuating virtual machines from a failed compute node
CN110874291A (en) * 2019-10-31 2020-03-10 合肥中科类脑智能技术有限公司 Real-time detection method for abnormal container

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104270268B (en) * 2014-09-28 2017-12-05 曙光信息产业股份有限公司 Distributed system network performance evaluation and fault diagnosis method
CN109697153A (en) * 2018-12-28 2019-04-30 浙江省公众信息产业有限公司 Monitoring method, monitoring system and computer readable storage medium

Also Published As

Publication number Publication date
CN111682976A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
CN106844137B (en) Server monitoring method and device
CN108600029B (en) Configuration file updating method and device, terminal equipment and storage medium
CN111682976B (en) Method for ensuring distributed multi-machine communication monitoring
US20020107958A1 (en) Method of and apparatus for notification of state changes in a monitored system
US20020194319A1 (en) Automated operations and service monitoring system for distributed computer networks
GB2418755A (en) Error handling using a structured state tear down
CN107241229B (en) Service monitoring method and device based on interface testing tool
CN107992392B (en) Automatic monitoring and repairing system and method for cloud rendering system
US8060919B2 (en) Automated password tool and method of use
CN112506702B (en) Disaster recovery method, device, equipment and storage medium for data center
CN112527484A (en) Workflow breakpoint continuous running method and device, computer equipment and readable storage medium
CN109273045B (en) Storage device online detection method, device, equipment and readable storage medium
JP3872412B2 (en) Integrated service management system and method
JP2003233512A (en) Client monitoring system with maintenance function, monitoring server, program, and client monitoring/ maintaining method
CN110875832B (en) Abnormal service monitoring method, device and system and computer readable storage medium
CN111082964B (en) Distribution method and device of configuration information
CN113760634A (en) Data processing method and device
WO2010010393A1 (en) Monitoring of backup activity on a computer system
CN113242147B (en) Automatic operation and maintenance deployment method, device, equipment and storage medium of multi-cloud environment
CN115529301A (en) Firmware upgrading method based on cloud edge cooperation, server side and edge gateway side
CN112260903B (en) Link monitoring method and device
CN112835780B (en) Service detection method and device
CN108021407B (en) Service processing method and device based on network equipment
CN111447329A (en) Method, system, device and medium for monitoring state server in call center
CN113179180A (en) Basalt client disaster fault repairing method, basalt client disaster fault repairing device and basalt client disaster storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant