CN112272126A - Failure monitoring method for business application, computer equipment and storage medium

Info

Publication number: CN112272126A
Application number: CN202011156047.9A
Authority: CN
Inventors: 李朝霞; 游思佳; 康楠; 沈可; 王本忠; 邢鑫
Original assignee: China United Network Communications Group Co Ltd; Unicom Cloud Data Co Ltd
Current assignee: China United Network Communications Group Co Ltd; Unicom Cloud Data Co Ltd
Priority date: 2020-10-26
Filing date: 2020-10-26
Publication date: 2021-01-26

Abstract

The application provides a failure monitoring method for a business application, a computer device, and a storage medium. The method comprises the following steps: a monitoring server acquires the state of a target business application from a first log analysis system; if the state of the target business application is failure, the monitoring server acquires, from a second log analysis system, the running state of the cloud host at the time the target business application failed; if the running state of the cloud host is normal, the monitoring server determines that the cause of the state failure is an abnormality of the target business application itself; and if the running state of the cloud host is abnormal, the monitoring server determines that the cause of the state failure is an abnormality of the cloud host. The method addresses the problem that the prior art cannot determine the cause of a state failure of a business application deployed on cloud computing.

Description

Failure monitoring method for business application, computer equipment and storage medium

Technical Field

The present application relates to application service technologies on cloud computing services, and in particular, to a failure monitoring method for business applications, a computer device, and a storage medium.

Background

Cloud computing is a form of distributed computing. It is internet-centered and provides fast and secure computing services and data storage over the network, so that everyone using the internet can draw on the large computing resources and data centers available there. Cloud computing services are now mature, and many organizations, enterprises, public institutions and the like are gradually deploying their own business applications on cloud computing.

However, the development of cloud computing still faces many critical problems. For example, when the state of a business application on cloud computing fails, it is impossible to distinguish whether the failure is caused by the cloud computing infrastructure service, by a memory overflow of the business application, or by a crash of the server on which the business application runs. This hinders rapid recovery of the business application according to the cause of the failure, and also hinders subsequent architecture design and optimization of the cloud computing platform or the business application.

Therefore, how to determine the cause of a state failure of a business application on cloud computing remains a problem to be solved.

Disclosure of Invention

The application provides a failure monitoring method of business application, computer equipment and a storage medium, which are used for solving the problem that the reason of the state failure of the business application on cloud computing cannot be determined in the prior art.

A failure monitoring method of business application is applied to a cloud service system, the cloud service system comprises a monitoring server, a first log analysis system, a second log analysis system and a cloud host, and the failure monitoring method comprises the following steps:

the monitoring server acquires the state of the target service application from the first log analysis system;

if the state of the target service application is failure, the monitoring server acquires, from the second log analysis system, the running state of the cloud host at the time the target service application failed;

if the running state of the cloud host is normal, the monitoring server determines that the reason for the state failure of the target service application is that the target service application is abnormal;

and if the running state of the cloud host is abnormal, the monitoring server determines that the reason for the state failure of the target service application is that the cloud host is abnormal.

In one embodiment, the cloud service system further includes a business application server and a plurality of business application detection servers, and the method further includes:

the business application detection server acquires detection data of the process of the target business application from the business application server;

the service application detection server obtains the state of the process of the target service application according to the detection data of the process of the target service application;

the business application detection server sends the state of the process of the target business application to the first log analysis system;

and the first log analysis system determines the state of the target business application according to the state of the process of the target business application.

In one embodiment, the cloud service system further includes an information configuration server, and before the business application detection server obtains detection data of a process of the target business application from the business application server, the method further includes:

the information configuration server acquires detection configuration information of the target business application, wherein the detection configuration information comprises an application name, service interface information, an application access path and monitoring frequency of the target business application;

the information configuration server generates a service application detection instruction according to the detection configuration information and sends the service application detection instruction to the service application detection server; the service application detection instruction is used for instructing the service application detection server to detect the process of the target service application on the service application server.

In one embodiment, the method further comprises:

the information configuration server acquires the process state of the target business application from the first log analysis system and generates a process state analysis chart of the target business application according to the process state of the target business application; the process state analysis graph is used for displaying the state of each process in the target business application.

In one embodiment, the cloud service system further includes an alarm server, and the method further includes:

the alarm server acquires the identifier of a service application detection server corresponding to the process with the invalid state in the target service application from the first log analysis system;

the alarm server generates server abnormal information according to the identification of the business application detection server corresponding to the process with the state failure;

and the alarm server sends the server abnormal information to the service application server.

In one embodiment, after the alarm server obtains, from the first log analysis system, an identifier of a business application detection server corresponding to the process with the failed state in the target business application, the method further includes:

the restart server generates a restart command of the target service application according to the server identifier corresponding to the process with the failed state;

and the restart server sends the restart command to the service application server.

In one embodiment, before the monitoring server obtains the operating state of the cloud host when the target business application fails from the second log analysis system, the method further includes:

the cloud host detection server acquires detection data of the cloud host from the cloud host;

the cloud host detection server obtains the running state of the cloud host according to the detection data of the cloud host;

and the cloud host detection server sends the running state of the cloud host to the second log analysis system.

In one embodiment, the method further comprises:

the information configuration server receives a cloud host detection instruction and sends the cloud host detection instruction to the cloud host detection server, wherein the cloud host detection instruction is used for indicating the cloud host detection server to detect the operation state of the cloud host.

In another aspect, the application also provides a cloud service system which comprises a monitoring server, a first log analysis system, a second log analysis system and a cloud host;

the monitoring server is used for acquiring the state of a target service application from the first log analysis system, and acquiring the running state of the cloud host when the target service application fails from the second log analysis system if the state of the target service application is failure;

the monitoring server is further configured to determine that the reason for the state failure of the target service application is that the target service application is abnormal if the operation state of the cloud host is normal, and determine that the reason for the state failure of the target service application is that the cloud host is abnormal if the operation state of the cloud host is abnormal.

In one embodiment, the cloud service system further comprises a service application server and a plurality of service application detection servers;

the service application detection server is used for acquiring detection data of the process of the target service application from the service application server, obtaining the state of the process of the target service application according to the detection data of the process of the target service application, and sending the state of the process of the target service application to the first log analysis system;

and the first log analysis system is used for determining the state of the target business application according to the state of the process of the target business application.

In one embodiment, the cloud service system further comprises an information configuration server;

the information configuration server is used for acquiring detection configuration information of the target business application, wherein the detection configuration information comprises an application name, service interface information, an application access path and monitoring frequency of the target business application, generating a business application detection instruction according to the detection configuration information, and sending the business application detection instruction to the business application detection server; the service application detection instruction is used for instructing the service application detection server to detect the process of the target service application on the service application server.

In one embodiment, the information configuration server is further configured to obtain a process state of the target service application from the first log analysis system, and generate a process state analysis graph of the target service application according to the process state of the target service application; the process state analysis graph is used for displaying the state of each process in the target business application.

In one embodiment, the cloud service system further comprises an alarm server;

the alarm server is used for acquiring the identification of the business application detection server corresponding to the process with the invalid state in the target business application from the first log analysis system, generating server abnormal information according to the identification of the business application detection server corresponding to the process with the invalid state, and sending the server abnormal information to the business application server.

In one embodiment, the cloud service system further comprises a restart server;

and the restart server is used for generating a restart command of the target service application according to the server identifier corresponding to the process with the failed state and sending the restart command to the service application server.

In one embodiment, the cloud service system further comprises a cloud host detection server;

the cloud host detection server is used for acquiring detection data of the cloud host from the cloud host, obtaining the running state of the cloud host according to the detection data of the cloud host, and sending the running state of the cloud host to the second log analysis system.

In one embodiment, the information configuration server is further configured to receive a cloud host detection instruction, and send the cloud host detection instruction to the cloud host detection server, where the cloud host detection instruction is used to instruct the cloud host detection server to detect an operating state of the cloud host.

The application provides a failure monitoring method for a business application, which obtains the running state of the cloud host at the time the state of the target business application fails. If the running state of the cloud host is normal at that time, it can be determined that the failure is caused by abnormal running of the target business application itself. If the running state of the cloud host is abnormal at that time, it is determined that the application state failure is caused by abnormal running of the cloud host. Therefore, the method provided by the application can effectively distinguish the cause of a business application failure, so that the state of the business application can be recovered quickly according to that cause, which also benefits subsequent architecture design and optimization of the cloud computing platform or the business application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a schematic diagram of a cloud service system according to an embodiment of the present application.

Fig. 2 is a schematic flowchart of a failure monitoring method for a business application according to an embodiment of the present application.

Fig. 3 is a schematic diagram of a cloud service system according to another embodiment of the present application.

Fig. 4 is a flowchart illustrating a failure monitoring method for a business application according to another embodiment of the present application.

Fig. 5 is a schematic diagram of a cloud service system according to another embodiment of the present application.

Fig. 6 is a flowchart illustrating a failure monitoring method for a business application according to another embodiment of the present application.

Fig. 7 is a flowchart illustrating a failure monitoring method for a business application according to another embodiment of the present application.

Fig. 8 is a schematic diagram of a process state analysis diagram according to an embodiment of the present application.

Fig. 9 is a schematic diagram of a cloud service system according to another embodiment of the present application.

Fig. 10 is a flowchart illustrating a failure monitoring method for a business application according to another embodiment of the present application.

Fig. 11 is a schematic diagram of a cloud service system provided in the present application.

With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Cloud computing is a form of distributed computing. It is internet-centered and provides fast and secure computing services and data storage over the network, so that everyone using the internet can draw on the large computing resources and data centers available there. Cloud computing services are now mature, and many organizations, enterprises, public institutions and the like are gradually deploying their own business applications on cloud computing. By embracing the cloud computing industry they take advantage of the computing, storage and network capacity, elastic scaling, mass storage, real-time online business and other benefits it brings to various industries, and thereby offer a new business experience to the customers who use cloud computing. However, the development of cloud computing still faces many critical issues, and the availability of business applications on cloud computing is currently one of the important ones. When the state of a business application on cloud computing fails, the prior art cannot distinguish whether the failure is caused by the cloud computing infrastructure service, by a memory overflow of the business application, or by a crash of the server on which the business application runs. This not only hinders rapid recovery of the business application according to the cause of the failure, but also hinders subsequent architecture design and optimization of the cloud computing platform or the business application, and can sometimes lead to disputes between the cloud computing service provider and the software business application provider that neither side can support with evidence.

At present, VMware develops monitoring interfaces by purchasing a license for each middleware and monitors the health status of that middleware, for example detecting the status of the MySQL middleware and the main status of Tomcat. However, the middleware industry is developing particularly fast, interface development cannot keep pace with the appearance of new middleware, and some middleware developers are unwilling to develop interfaces. As a result, the business status monitored by VMware covers only part of the business applications, and comprehensive monitoring of business applications cannot be achieved. Therefore, how to determine the cause of a state failure of a business application on cloud computing remains a problem to be solved.

Based on the above problems, the present application provides a failure monitoring method for a service application, which determines whether a cause of failure of a target service application is a target service application abnormality or a cloud host abnormality by respectively monitoring a state of the target service application and a state of the cloud host when the target service application fails, so as to solve a problem that a cause of failure of a state of the service application on cloud computing cannot be determined in the prior art.

The failure monitoring method for business applications is applied to a cloud service system 10, and as shown in fig. 1, the cloud service system 10 includes a monitoring server 11, a first log analysis system 12, a second log analysis system 13, and a cloud host 14. The first log analysis system 12 is used for recording the state of the target business application, and the second log analysis system is used for recording the state of the cloud host. The monitoring server 11 is configured to determine a status of the target business application and a status of the cloud host 14 when the target business application fails, respectively. By linkage analysis of the service log state of the service application and the process state information of the cloud host, the problem of the cloud service can be analyzed together, valuable analysis data are provided for the service application and the cloud host infrastructure, and the optimization process of the service application and the cloud host infrastructure can be promoted.

It is understood that the monitoring server 11, the first log analysis system 12, the second log analysis system 13, and the cloud host 14 may each run on a separate physical device, or may be integrated on a smaller number of physical devices. For example, the monitoring server 11 may be deployed on one physical server, and the first log analysis system 12 and the second log analysis system 13 on another physical server. There may be multiple cloud hosts 14, and a cloud host 14 is typically deployed on a user device, for example on a user's personal computer. The monitoring server 11, the first log analysis system 12 and the second log analysis system 13 can also be deployed together on one physical server, with the cloud host 14 deployed on a user device.

Referring to fig. 1 and fig. 2, the failure monitoring method for the business application shown in fig. 2 may be applied to the cloud service system 10 shown in fig. 1, and the failure monitoring method includes:

S201, the monitoring server acquires the state of the target business application from the first log analysis system.

In an alternative embodiment, the first log analysis system 12 does not record the state of the target business application continuously; instead, it records the state acquired at a preset heartbeat frequency. For example, with a preset heartbeat interval of 30 seconds, the first log analysis system 12 records the state of the target business application every 30 seconds. The preset heartbeat frequency can be set by an operator according to actual needs.

S202, if the state of the target service application is failure, the monitoring server obtains the running state of the cloud host when the target service application is failure from the second log analysis system.

The monitoring server 11 may determine whether the state of the target business application is failure. If it is, the monitoring server 11 goes on to determine the running state of the cloud host 14; if the state of the target business application is normal, the monitoring server 11 does not go on to determine the running state of the cloud host 14. Similar to the first log analysis system 12, in an alternative embodiment the second log analysis system 13 does not record the state of the cloud host 14 continuously, but records the state of the cloud host 14 acquired at the preset heartbeat frequency. In an alternative embodiment, the first log analysis system 12 and the second log analysis system 13 start recording at the same time, so that their records are taken at the same moments. For example, with a preset heartbeat interval of 30 seconds, the first log analysis system 12 records the running state of the target business application at 00:30, 01:00, 01:30, 02:00, 02:30, and 03:00 in sequence, and the second log analysis system 13 records the state of the cloud host 14 at 00:30, 01:00, 01:30, 02:00, 02:30, and 03:00 in sequence. When the state of the target business application recorded by the first log analysis system 12 at 01:30 is failure, the running state of the cloud host 14 recorded by the second log analysis system 13 at 01:30 needs to be acquired.
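The time alignment described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each log analysis system exposes its records as a mapping from recording time to a state string; the names `record_times`, `app_log`, `host_log` and `host_state_at_failure` are hypothetical and do not come from the application.

```python
from datetime import timedelta

HEARTBEAT = timedelta(seconds=30)  # example heartbeat interval from the text


def record_times(count, interval=HEARTBEAT):
    """Return the first `count` recording offsets: 00:30, 01:00, ..."""
    return [interval * (i + 1) for i in range(count)]


def fmt(offset):
    """Format a timedelta offset as MM:SS."""
    total = int(offset.total_seconds())
    return f"{total // 60:02d}:{total % 60:02d}"


# Hypothetical contents of the two logs, keyed by the same recording times
# because both systems start recording at the same moment.
app_log = {t: "normal" for t in record_times(6)}
host_log = {t: "normal" for t in record_times(6)}
app_log[timedelta(seconds=90)] = "failure"  # application fails at 01:30


def host_state_at_failure(app_log, host_log):
    """Find the first failed application record and return
    (recording time, cloud host state recorded at that same time)."""
    for t in sorted(app_log):
        if app_log[t] == "failure":
            return t, host_log[t]
    return None  # the application never failed, so no host lookup is needed
```

Because both logs share the same recording times, the host state can be looked up directly by the time of the failed application record, without interpolation.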

S203, if the operating state of the cloud host is normal, the monitoring server determines that the reason for the state failure of the target service application is that the target service application is abnormal.

For example, when the state of the target business application recorded by the first log analysis system 12 at 01:30 is failure, and the running state of the cloud host 14 recorded by the second log analysis system 13 at 01:30 is normal, it is determined that the cause of the state failure is an abnormality of the target business application. Such an abnormality is, for example, a memory overflow of the target business application itself or a crash of the server on which the business application runs.

S204, if the running state of the cloud host is abnormal, the monitoring server determines that the reason for the state failure of the target business application is that the cloud host is abnormal.

For example, when the state of the target business application recorded by the first log analysis system 12 at 01:30 is failure, and the running state of the cloud host 14 recorded by the second log analysis system 13 at 01:30 is abnormal, it is determined that the cause of the state failure of the target business application is an abnormality of the cloud host 14.

The embodiment provides a failure monitoring method for a service application, which obtains an operating state of a cloud host 14 when a target service application state fails, and if the operating state of the cloud host 14 is normal when the target service application fails, it may be determined that the target service application itself fails due to abnormal operation. If the operation state of the cloud host 14 is abnormal when the target service application fails, it is determined that the reason for the failure of the target service application is the application state failure caused by the abnormal operation of the cloud host 14. Therefore, the method provided by the application can effectively distinguish the reason of the service application failure so as to quickly recover the state of the service application according to the reason of the service failure, and is beneficial to subsequent cloud computing or the architecture and optimization of the service application.
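The decision made by the monitoring server in steps S201 to S204 can be sketched as follows. This is an illustrative sketch only: the function `diagnose`, the `Cause` enumeration, and the representation of each log as a mapping from a timestamp string to a state string are assumptions made here, not interfaces defined by the application.

```python
from enum import Enum
from typing import Mapping, Optional


class Cause(Enum):
    APPLICATION_ABNORMAL = "target business application abnormal"
    CLOUD_HOST_ABNORMAL = "cloud host abnormal"


def diagnose(app_log: Mapping[str, str],
             host_log: Mapping[str, str],
             timestamp: str) -> Optional[Cause]:
    """Classify the cause of a state failure at `timestamp`.

    Returns None when the application state is not "failure", in which
    case the cloud host state is not consulted at all.
    """
    # S201: state of the target business application (first log analysis system).
    if app_log[timestamp] != "failure":
        return None
    # S202: cloud host state at the same time (second log analysis system).
    host_state = host_log[timestamp]
    # S203: host normal -> the application itself is abnormal.
    if host_state == "normal":
        return Cause.APPLICATION_ABNORMAL
    # S204: host abnormal -> the cloud host is the cause.
    return Cause.CLOUD_HOST_ABNORMAL
```

For instance, with `app_log = {"01:30": "failure"}` and `host_log = {"01:30": "abnormal"}`, `diagnose(app_log, host_log, "01:30")` yields `Cause.CLOUD_HOST_ABNORMAL`.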

Referring to fig. 3, in an embodiment of the present application, the cloud service system 10 further includes a business application server 15 and a plurality of business application detection servers 16. The target business application runs on the business application server 15, and the business application detection servers 16 can detect running data and the like of the target business application from the business application server 15. Referring to fig. 3 and fig. 4, the failure monitoring method for the business application shown in fig. 4 may be applied to the cloud service system 10 shown in fig. 3, and after performing steps S201 to S203, the failure monitoring method for the business application further includes:

S401, the service application detection server obtains the detection data of the process of the target service application from the service application server.

The target business application involves a plurality of processes while it runs, and each process has its own state. Taking the Sina mailbox as an example of a target business application, a client may go through several operation steps when using the mailbox, for example logging in to the mailbox, opening the contact list, creating a new mail, and sending a mail, in sequence. Each operation step corresponds to a process. One business application detection server 16 may obtain detection data of one process of the target business application from the business application server 15, and a plurality of business application detection servers 16 may respectively obtain detection data of a plurality of processes of the target business application from the business application server 15. In actual operation, the processes involved in the use of the target business application are determined according to actual conditions. For example, if the client only logs in to the mailbox and opens the contact list when using the Sina mailbox, then only two processes are involved in that use of the Sina mailbox.

S402, the service application detection server obtains the state of the process of the target service application according to the detection data of the process of the target service application.

The service application detection server 16 obtains the detection data of the multiple processes of the target service application, and may determine the states of the multiple processes of the target service application according to the detection data of the multiple processes of the target service application.

S403, the service application probe server sends the status of the process of the target service application to the first log analysis system.

The first log analysis system 12 can record the status of each process involved in the usage of the target business application, and if a process is in a failure status, the first log analysis system 12 can record the status of the process as a failure status. If a process is in a normal operation state, the first log analysis system 12 can record the state of the process as a normal state.

S404, the first log analysis system determines the state of the target business application according to the state of the process of the target business application.

And if the states of the processes involved in the use process of the target service application are all normal states, the state of the target service application is normal. And if the state of any process involved in the use process of the target service application is a failure state, the state of the target service application is a failure. Based on the descriptions of step S202 to step S204, the reason that the status of the target business application is failure at this time may be that the target business application itself is abnormal, or that the status of the target business application is failure due to the abnormality of the cloud host 14. If the operation state of the cloud host 14 is abnormal when the target service application fails, the monitoring server 11 determines that the reason why the state of the target service application fails is that the cloud host 14 is abnormal.
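The aggregation rule of step S404 (all processes normal means the application is normal; any failed process means the application has failed) can be sketched as below. The function name `application_state` and the process names are hypothetical and chosen only to mirror the mailbox example.

```python
def application_state(process_states):
    """Derive the application state from per-process states.

    `process_states` maps a process name to "normal" or "failure".
    Returns (application state, list of failed process names).
    """
    failed = [name for name, state in process_states.items()
              if state == "failure"]
    # Any failed process makes the whole application state a failure.
    return ("failure" if failed else "normal"), failed


# Hypothetical process states for the mailbox example.
mailbox = {
    "login": "normal",
    "open_contact_list": "normal",
    "new_mail": "failure",
    "send_mail": "normal",
}
state, failed = application_state(mailbox)
```

Returning the failed process names alongside the overall state is a design choice of this sketch; it keeps available the per-process detail that the first log analysis system records.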

Referring to fig. 5, in an embodiment of the present application, the cloud service system further includes an information configuration server 17, on which a client may configure the detection configuration information of the target business application. Referring to fig. 5 and fig. 6, the failure monitoring method for the business application shown in fig. 6 may be applied to the cloud service system 10 shown in fig. 5, and before step S401, the failure monitoring method for the business application further includes:

S601, the information configuration server obtains the detection configuration information of the target service application, where the detection configuration information includes an application name, service interface information, an application access path, and a monitoring frequency of the target service application.

The application name of the target service application is, for example, updateVpePort, and the application access path is, for example, http://ip:port/v1/yc/cmp-close-network/updateVpePort. The monitoring frequency is the heartbeat frequency described above, and each service application detection server 16 needs to acquire the detection data of its corresponding process at the same time and at the same frequency. The service interface information is, for example, as shown in the following table:

Parameter name	Description	Parameter location	Type	Required	Remarks
baseInfo	Basic information	None	JsonObject	Yes
accountId	Account ID	baseInfo	String	Yes
userId	User ID	baseInfo	String	Yes
properties	Region object	None	JsonObject	Yes
portId	Port ID	properties	String	Yes
localPort	Local port name	properties	String	No
localVlanId	Local port VLAN ID	properties	String	No
localAddress	Local port address	properties	String	No
mtu	Port MTU	properties	String	No
description	Port description	properties	String	No
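For illustration only, a request body that follows the service interface information in the table above, together with a check of the required parameters, might be sketched in Python as follows. The field values, the `REQUIRED` mapping and the function name are assumptions made for this example.

```python
# Required parameters taken from the table: baseInfo (accountId, userId)
# and properties (portId). All other parameters are optional.
REQUIRED = {"baseInfo": ["accountId", "userId"], "properties": ["portId"]}


def missing_required(body: dict) -> list:
    """List required objects or fields that are absent from the body."""
    missing = []
    for obj, fields in REQUIRED.items():
        section = body.get(obj)
        if not isinstance(section, dict):
            missing.append(obj)
            continue
        missing.extend(f"{obj}.{name}" for name in fields if name not in section)
    return missing


# Hypothetical example values; only the parameter names come from the table.
example_body = {
    "baseInfo": {"accountId": "acct-001", "userId": "user-001"},
    "properties": {"portId": "port-001", "mtu": "1500"},
}
```

Calling `missing_required(example_body)` returns an empty list, while a body without `properties` is reported as missing that object.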

S602, the information configuration server generates a service application detection instruction according to the detection configuration information, and sends the service application detection instruction to the service application detection server 16; the service application detection instruction is used to instruct the service application detection server 16 to probe the processes of the target service application on the service application server 15.

Specifically, the service application detection instruction is used to instruct the service application detection server 16 to access the service application corresponding to the application name, that is, the target service application, according to the application access path and the service interface information, and then to detect the processes of the target service application according to the monitoring frequency. The monitoring frequency can be set according to actual needs and is not limited by the present application. The client may configure the probe configuration information of the target business application on the self-service interface of the client. After the probe configuration information is configured, the information configuration server may assign, according to the probe configuration information, an account name for testing that corresponds to the target business application, for example, Zhang San, and the account name for testing is authorized to access the target business application. That is, when the target business application is accessed for probing, the account name for testing is used instead of the name set by the customer. Assigning an account name for testing may reduce the impact of the business application probe instructions on the business application. In other words, when configuring the self-service information of the cloud host 14, the business application deployer may configure an account dedicated to the cloud computing platform, namely the account for testing. For this account, the business logic only returns a response and performs no subsequent processing, which reduces the influence of the heartbeat test on the target business application. For the heartbeat test, refer to step S401.
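As a minimal sketch under stated assumptions, step S601 and step S602 could be modelled in Python as follows. The class name, field names, function name and the test account value are hypothetical and are not defined by the application.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProbeConfig:
    """Detection configuration information (S601); field names are assumed."""
    app_name: str
    access_path: str
    interface_info: dict
    frequency_seconds: int


def build_probe_instruction(config: ProbeConfig, test_account: str) -> dict:
    """Build a probe instruction (S602) carrying the config and test account."""
    if config.frequency_seconds <= 0:
        raise ValueError("monitoring frequency must be a positive interval")
    instruction = asdict(config)
    # The dedicated test account limits the impact of heartbeat probes.
    instruction["test_account"] = test_account
    return instruction


config = ProbeConfig(
    app_name="updateVpePort",
    access_path="http://ip:port/v1/yc/cmp-close-network/updateVpePort",
    interface_info={"baseInfo": {}, "properties": {}},
    frequency_seconds=30,
)
instruction = build_probe_instruction(config, test_account="probe-test")
```

The resulting dictionary stands in for the service application detection instruction sent to each service application detection server 16.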

Optionally, the cloud service system 10 further includes a cloud host detection server 18, and the information configuration server 17 is further configured to receive a cloud host detection instruction and send the cloud host detection instruction to the cloud host detection server 18.

The failure monitoring method for the service application further comprises the following steps: the information configuration server receives the cloud host detection instruction input by the client and sends the cloud host detection instruction to the cloud host detection server, and the cloud host detection instruction is used for indicating the cloud host detection server to detect the operation state of the cloud host.

The cloud host detection instruction is, for example, the operating system top command. After receiving the cloud host detection instruction, the information configuration server 17 sends it to the cloud host detection server 18, and the cloud host detection server 18 detects the operation state of the cloud host 14 after receiving the instruction.

Referring to fig. 5 and 7, the method for monitoring failure of a business application shown in fig. 7 may be applied to the cloud service system 10 shown in fig. 5, and before step S202, the method for monitoring failure of a business application further includes:

S701, the cloud host detection server acquires detection data of the cloud host from the cloud host.

In an alternative embodiment, the probe data of the cloud host includes processor utilization data, memory utilization data, and interface utilization data of the cloud host. When acquiring the probe data of the cloud host, the cloud host probe server 18 uses the same monitoring frequency as the plurality of business application probe servers 16. For example, if the monitoring frequency of the service application probe server 16 is once every 30 seconds, that is, probe data of a process is acquired every 30 seconds, the cloud host probe server 18 also acquires probe data of the cloud host every 30 seconds. If the times at which the cloud host probe server 18 and the business application probe server 16 start acquiring probe data are not consistent, calibration is needed first so that the start times are consistent. That is, the cloud host probe server 18 and the plurality of business application probe servers 16 operate at the same time and at the same monitoring frequency. It should be noted that the cloud host probe server 18 need not receive the cloud host detection instruction; the cloud host probe server 18 may also be configured to obtain the probe data of the cloud host 14 based on the monitoring frequency.
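The application does not specify how the calibration is performed. One possible approach, shown here only as an assumption, is for every probe server to start sampling at the next whole multiple of the monitoring period, so that servers started at different moments sample on the same ticks. The function name is hypothetical.

```python
def next_aligned_tick(now_seconds: int, period_seconds: int) -> int:
    """Return the smallest multiple of the period that is >= now."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    return ((now_seconds + period_seconds - 1) // period_seconds) * period_seconds


# A business application probe server started at t=61 s and a cloud host
# probe server started at t=75 s both take their first sample at t=90 s
# when the monitoring period is 30 s.
first_app_sample = next_aligned_tick(61, 30)
first_host_sample = next_aligned_tick(75, 30)
```

With this scheme the two kinds of probe server stay on a common schedule as long as their clocks are synchronized.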

S702, the cloud host detection server obtains the running state of the cloud host according to the detection data of the cloud host.

After the cloud host probe server 18 obtains the processor utilization data, the memory utilization data, and the interface utilization data of the cloud host 14, it determines whether the value of the processor utilization data exceeds a preset processor utilization, whether the value of the memory utilization data exceeds a preset memory utilization, and whether the value of the interface utilization data exceeds a preset interface utilization. If the value of the processor utilization data exceeds the preset processor utilization, or the value of the memory utilization data exceeds the preset memory utilization, or the value of the interface utilization data exceeds the preset interface utilization, it is determined that the operating state of the cloud host 14 is abnormal. If the value of the processor utilization data does not exceed the preset processor utilization, the value of the memory utilization data does not exceed the preset memory utilization, and the value of the interface utilization data does not exceed the preset interface utilization, it is determined that the operation state of the cloud host 14 is normal.
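The threshold comparison of step S702 can be sketched in Python as follows. The class name, the function name and the preset values of 0.9 are assumptions chosen for this example; the application does not give concrete thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostUtilization:
    """Utilization values as fractions between 0 and 1 (assumed unit)."""
    processor: float
    memory: float
    interface: float


def host_state(sample: HostUtilization, preset: HostUtilization) -> str:
    """Abnormal if any utilization exceeds its preset, otherwise normal."""
    exceeded = (
        sample.processor > preset.processor
        or sample.memory > preset.memory
        or sample.interface > preset.interface
    )
    return "abnormal" if exceeded else "normal"


# Hypothetical presets used only for illustration.
PRESET = HostUtilization(processor=0.9, memory=0.9, interface=0.9)
```

For example, `host_state(HostUtilization(0.95, 0.4, 0.2), PRESET)` yields "abnormal" because the processor utilization alone exceeds its preset. A value equal to the preset does not count as exceeding it.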

S703, the cloud host detection server sends the operating state of the cloud host to the second log analysis system.

The second log analysis system 13 records the operating state of the cloud host 14 at each monitoring time determined by the monitoring frequency. For example, at 01:30 the recorded operation state of the cloud host 14 is normal operation, at 02:00 the recorded operation state of the cloud host 14 is normal operation, at 02:30 the recorded operation state of the cloud host 14 is abnormal operation, and so on.
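The time-indexed record kept by the second log analysis system 13 can be illustrated with the following Python sketch, which reuses the example times above. The class and method names are assumptions and do not describe an actual log analysis product.

```python
from typing import Dict, Optional


class HostStateLog:
    """Stores one cloud host operating state per monitoring time."""

    def __init__(self) -> None:
        self._states: Dict[str, str] = {}

    def record(self, monitoring_time: str, state: str) -> None:
        self._states[monitoring_time] = state

    def state_at(self, monitoring_time: str) -> Optional[str]:
        """Return the recorded state, or None if nothing was recorded."""
        return self._states.get(monitoring_time)


log = HostStateLog()
log.record("01:30", "normal")
log.record("02:00", "normal")
log.record("02:30", "abnormal")
```

Because the probe servers sample on the same schedule, the monitoring server 11 can look up the host state by the monitoring time at which the target service application was recorded as failed.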

In the method provided in this embodiment, the cloud host detection server 18 obtains the processor utilization data, the memory utilization data, and the interface utilization data of the cloud host 14, determines whether each value exceeds the corresponding preset processor utilization, preset memory utilization, or preset interface utilization, and thereby determines whether the operation state of the cloud host 14 is normal or abnormal. The method thus provides the running state of the cloud host 14 at the time when the state of the target service application is failure, so that the monitoring server 11 can determine whether the cause of the failure is an abnormality of the target service application or an abnormality of the cloud host 14. This solves the problem in the prior art that the cause of the state failure of a service application on cloud computing cannot be determined.

Optionally, the information configuration server 17 is further configured to: acquiring the process state of the target business application from the first log analysis system 12, and generating a process state analysis graph of the target business application according to the process state of the target business application; the process state analysis graph is used for displaying the state of each process in the target business application.

The process state analysis graph is a visual graph, such as the bar graph shown in fig. 8. The vertical axis of the bar graph represents the duration of each process, and the horizontal axis represents the different processes. If a process is in a failure state, its duration on the bar graph is 0. For example, in the bar graph shown in fig. 8, if process 3 is in a failure state, the duration of process 3 on the bar graph is 0, and the durations of the processes after process 3, that is, process 4 and process 5, are also 0.
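The data behind such a bar graph can be prepared as in the following Python sketch, which sets the duration of a failed process and of every later process to 0. The function name, the tuple layout and the sample durations are assumptions; rendering the graph itself is left to any plotting tool.

```python
from typing import List, Tuple


def bar_graph_durations(
    processes: List[Tuple[str, str, float]]
) -> List[Tuple[str, float]]:
    """Map ordered (name, state, duration) records to (name, bar height)."""
    bars = []
    failure_seen = False
    for name, state, duration in processes:
        if state == "failed":
            failure_seen = True
        # A failed process and all processes after it are drawn with height 0.
        bars.append((name, 0.0 if failure_seen else duration))
    return bars


# Hypothetical durations in seconds, mirroring the fig. 8 example.
records = [
    ("process 1", "normal", 1.2),
    ("process 2", "normal", 0.8),
    ("process 3", "failed", 0.0),
    ("process 4", "normal", 0.5),
    ("process 5", "normal", 0.7),
]
bars = bar_graph_durations(records)
```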

Referring to fig. 9, in an embodiment of the present application, the cloud service system further includes an alarm server 19 and a restart server 20. The alarm server 19 is configured to generate a server identifier corresponding to a failed process in the target business application. The restart server 20 is configured to obtain the failed process in the target business application from the alarm server 19 and to generate a restart script for the failed process. Referring to fig. 9 and fig. 10, the method for monitoring failure of a business application shown in fig. 10 may be applied to the cloud service system 10 shown in fig. 9. On the basis of executing step S601 to step S602, the method for monitoring failure of a business application further includes:

S1001, the alarm server obtains, from the first log analysis system, the identifier of the service application detection server 16 corresponding to the process with failed state in the target service application.

Each business application probe server 16 is used to obtain probe data for one process. When recording the state of the processes in the target business application, the first log analysis system 12 also records the identifier of the business application probe server 16 corresponding to each process. When a process fails, the alarm server 19 can obtain from the first log analysis system 12 the identifier of the business application probe server 16 corresponding to the failed process.

S1002, the alarm server generates server abnormal information according to the identification of the business application detection server corresponding to the process with the invalid state.

The server exception information may display the identity of the business application probe server 16 corresponding to the failed process in the target business application.
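Steps S1001 and S1002 can be illustrated with the following Python sketch, in which the records of the first log analysis system 12 are modelled as dictionaries. The record keys, the identifiers and the function name are assumptions made for this example.

```python
from typing import Dict, List


def build_server_exception_info(
    records: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Collect the probe server identifier of every failed process."""
    return [
        {"probe_server_id": r["probe_server_id"], "process": r["process"]}
        for r in records
        if r["state"] == "failed"
    ]


# Hypothetical log records: one probe server per process.
log_records = [
    {"process": "process 1", "state": "normal", "probe_server_id": "probe-1"},
    {"process": "process 2", "state": "normal", "probe_server_id": "probe-2"},
    {"process": "process 3", "state": "failed", "probe_server_id": "probe-3"},
]
exception_info = build_server_exception_info(log_records)
```

Because one probe server corresponds to one process, the identifier in the exception information is enough to locate the failed process.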

S1003, the alarm server sends the server abnormal information to the service application server.

Through the server abnormal information, the client can obtain the identifier of the service application detection server 16 corresponding to the failed process in the target service application, and thereby identify the failed process. Once the failed process is known, the specific reason for the failure of the target service application can be determined accurately, which provides an effective reference for restarting the target service application. The alarm server 19 can also notify the client of the server abnormal information by short message, mail, or telephone, according to the information sending mode configured by the client.

S1004, the alarm server sends the server identifier corresponding to the process with the failed state to the restart server.

After the alarm server 19 obtains the server identifier corresponding to the failed process, it sends that server identifier to the restart server 20, which helps the restart server 20 generate a restart command, as specifically described in step S1005.

S1005, the restart server generates a restart command of the target service application according to the server identifier corresponding to the process with the failed state.

The restart command may also be referred to as a restart script, and is specifically used to restart a failed process in the target business application.

S1006, the restart server sends the restart command to the service application server.

After the restart server 20 sends the restart command, or the restart script, to the service application server 15, the service application server 15 may execute the restart script, so as to restart the target service application.
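The application does not disclose the content of the restart script. Purely as an assumption, the following Python sketch generates a shell script that restarts only the failed processes, supposing each process runs as a systemd unit and that a mapping from probe server identifier to unit name is available. The function name, the mapping and the unit names are hypothetical.

```python
import shlex
from typing import Dict, List


def build_restart_script(
    failed_server_ids: List[str], server_to_unit: Dict[str, str]
) -> str:
    """Generate a shell script that restarts the unit behind each identifier."""
    lines = ["#!/bin/sh", "set -e"]
    for server_id in failed_server_ids:
        unit = server_to_unit[server_id]  # KeyError if the identifier is unknown
        lines.append(f"systemctl restart {shlex.quote(unit)}")
    return "\n".join(lines) + "\n"


# Hypothetical mapping from probe server identifier to systemd unit.
script = build_restart_script(["probe-3"], {"probe-3": "process3.service"})
```

Restarting only the units tied to failed processes reflects the stated aim of saving restart time compared with restarting the whole target service application.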

The method provided by the embodiment can restart the target service application according to the failed process in the target service application, save the restart time of the target service application, and improve the restart efficiency of the target service application.

The cloud service system 10 shown in fig. 9 includes both the alarm server 19 and the restart server 20. Optionally, the cloud service system 10 may include the alarm server 19 but not the restart server 20.

The present application further provides a cloud service system 10. Referring to fig. 9 or fig. 11, the cloud service system 10 includes a monitoring server 11, a first log analysis system 12, a second log analysis system 13, and a cloud host 14.

The monitoring server 11 is configured to obtain a status of a target service application from the first log analysis system 12, and if the status of the target service application is failure, obtain an operation status of the cloud host 14 when the target service application fails from the second log analysis system 13.

The monitoring server 11 is further configured to determine that the reason that the state of the target service application is failed is that the target service application is abnormal if the operation state of the cloud host 14 is normal, and determine that the reason that the state of the target service application is failed is that the cloud host is abnormal if the operation state of the cloud host 14 is abnormal.
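The decision made by the monitoring server 11 can be summarized in the following Python sketch. The function name and the returned strings are assumptions introduced for illustration.

```python
from typing import Optional


def failure_cause(app_state: str, host_state_at_failure: str) -> Optional[str]:
    """Attribute an application failure to the application or the cloud host."""
    if app_state != "failed":
        return None  # nothing to attribute while the application is normal
    if host_state_at_failure == "normal":
        return "target business application abnormal"
    if host_state_at_failure == "abnormal":
        return "cloud host abnormal"
    raise ValueError(f"unknown cloud host state: {host_state_at_failure!r}")
```

The host state passed in is the one recorded by the second log analysis system 13 for the time at which the target service application failed.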

The cloud service system 10 further includes a business application server 15 and a plurality of business application probe servers 16. The service application probe server 16 is configured to obtain probe data of a process of the target service application from the service application server 15, obtain a state of the process of the target service application according to the probe data of the process of the target service application, and send the state of the process of the target service application to the first log analysis system 12. The first log analysis system 12 is configured to determine a state of the target business application according to the state of the process of the target business application.

The cloud service system 10 further includes: the information configuration server 17.

The information configuration server 17 is configured to obtain detection configuration information of the target service application, where the detection configuration information includes an application name, service interface information, an application access path, and a monitoring frequency of the target service application, generate a service application detection instruction according to the detection configuration information, and send the service application detection instruction to the service application detection server 16. The service application detection instruction is used to instruct the service application detection server 16 to probe the processes of the target service application on the service application server 15.

The information configuration server 17 is further configured to obtain the process state of the target business application from the first log analysis system 12, and generate a process state analysis graph of the target business application according to the process state of the target business application. The process state analysis graph is used for displaying the state of each process in the target business application.

The information configuration server 17 is further configured to receive a cloud host detection instruction, and send the cloud host detection instruction to the cloud host detection server 18, where the cloud host detection instruction is used to instruct the cloud host detection server 18 to detect the operating state of the cloud host 14.

The cloud service system 10 further includes a cloud host detection server 18.

The cloud host detection server 18 is configured to obtain the detection data of the cloud host 14 from the cloud host 14, obtain the operation state of the cloud host 14 according to the detection data of the cloud host 14, and send the operation state of the cloud host 14 to the second log analysis system 13.

The cloud service system 10 further includes an alarm server 19, where the alarm server 19 is configured to obtain, from the first log analysis system 12, an identifier of the service application detection server 16 corresponding to the process with the failed state in the target service application, generate server abnormality information according to the identifier of the service application detection server 16 corresponding to the process with the failed state, and send the server abnormality information to the service application server 15.

The cloud service system 10 further includes a restart server 20, where the restart server 20 is configured to generate a restart command of the target business application according to the server identifier corresponding to the process with the failed state, and send the restart command to the business application server 15.

It can be understood that the monitoring server 11, the first log analysis system 12, the second log analysis system 13, the cloud host 14, the business application server 15, the business application probe server 16, the information configuration server 17, the cloud host probe server 18, the alarm server 19, and the restart server 20 may each run on a separate physical device, or may be integrated on a plurality of physical devices. For example, the monitoring server 11 is deployed on one physical server, the first log analysis system 12 and the second log analysis system 13 are deployed on another physical server, the number of the cloud hosts 14 may be multiple, the cloud hosts 14 are typically deployed on user equipment, the business application server 15 is deployed on one physical server, the business application probe server 16, the information configuration server 17 and the cloud host probe server 18 are deployed on one physical server, and the alarm server 19 and the restart server 20 are deployed on one physical server.

The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims

1. A failure monitoring method of business application is applied to a cloud service system, the cloud service system comprises a monitoring server, a first log analysis system, a second log analysis system and a cloud host, and the failure monitoring method is characterized by comprising the following steps:

2. The method of claim 1, wherein the cloud service system further comprises a business application server and a plurality of business application probe servers, and wherein the method further comprises:

3. The method according to claim 2, wherein the cloud service system further comprises an information configuration server, and before the business application probe server obtains probe data of the process of the target business application from the business application server, the method further comprises:

4. The method of claim 3, further comprising:

5. The method of claim 3, wherein the cloud service system further comprises an alert server, the method further comprising:

6. The method according to claim 5, wherein the cloud service system further comprises a restart server, and after the alarm server obtains an identifier of a business application probe server corresponding to the process with failed state in the target business application from the first log analysis system, the method further comprises:

7. The method according to claim 3, wherein the cloud service system further comprises a cloud host detection server, and before the monitoring server obtains the running state of the cloud host when the target business application fails from the second log analysis system, the method further comprises:

8. The method of claim 7, further comprising:

9. A cloud service system is characterized by comprising a monitoring server, a first log analysis system, a second log analysis system and a cloud host;

10. The cloud service system of claim 9, further comprising: the system comprises a service application server and a plurality of service application detection servers;