WO2024060776A1

WO2024060776A1 - Service health status display method and apparatus, and device and storage medium

Info

Publication number: WO2024060776A1
Application number: PCT/CN2023/104819
Authority: WO
Inventors: 郭良; 侯瑞军
Original assignee: 华为云计算技术有限公司
Priority date: 2022-09-19
Filing date: 2023-06-30
Publication date: 2024-03-28
Also published as: CN117762747A

Abstract

The present application provides a service health status display method and apparatus, and a device and a storage medium. In embodiments, the method comprises: determining a query time range of a data query; determining indicator data of a service within the query time range, the indicator data of the service being used for reflecting original health statuses of the service; determining, according to the indicator data of the service within the query time range, the frequency of occurrence of each of multiple original health statuses of the service within the query time range; and displaying a status identifier of the service according to the weight corresponding to each of the multiple original health statuses of the service and the frequency of occurrence of the service in the multiple original health statuses within the query time, the status identifier being used for indicating a final health status of the service, and the weight being used for representing the degree of influence of the frequency of occurrence of the original health status on the final health status. In this way, by means of the technical solution provided by the embodiments of the present application, the accuracy of representing the health status of a service can be improved, thereby improving the accuracy of service fault location.

Description

Service health status display method, device, equipment and storage medium

This application claims priority to Chinese patent application No. 202211138376, filed with the State Intellectual Property Office of China on September 19, 2022, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of computer technology, and in particular to a method, device, equipment and storage medium for displaying service health status.

Background

The rapid development of computer technology has brought greater processing power, more storage space and faster networks. This allows applications to serve more users, but also subjects them to greater load. As the internal logic of applications grows more complex, their development, operation and maintenance face more and more challenges. A business is usually completed by multiple services that communicate with each other. Therefore, operation and maintenance personnel need to monitor the health status of each service.

Currently, the display method of service health status in related technologies only represents the health status at the time of query, which cannot meet the needs of operation and maintenance staff for a problem overview and further fault location.

Therefore, the display method of the health status of services in related technologies cannot accurately represent the health status of services, causing operation and maintenance personnel to be unable to accurately locate service faults.

Summary of the invention

Embodiments of the present application provide a method, device, equipment and storage medium for displaying service health status, which can display the status identifier of the service based on the impact of the frequency of each original health status within the query time range on the final health status, thereby improving the accuracy with which the health status of a service is represented.

In the first aspect, embodiments of the present application provide a method for displaying service health status, including:

Determine the query time range of data query;

Determine the indicator data of the service within the query time range. The indicator data of the service is used to reflect the original health status of the service;

Based on the indicator data of the service within the query time range, determine the frequency of each of the multiple original health states of the service within the query time range;

Display the status identifier of the service according to the weights corresponding to the multiple original health states of the service and the frequency of occurrence of each of the multiple original health states within the query time range, so that users can locate faults based on the status identifier of the service. The status identifier is used to indicate the final health state of the service, and the weight is used to indicate the degree of influence of the frequency of the original health state on the final health state.

The embodiment of this application displays the status identifier of the service by combining the weight of each original health state with its frequency within the query time range, so that the status identifier reflects the impact of the frequency of each original health state on the final health state. This improves the accuracy with which the health status of the service is represented, thereby improving the accuracy of service fault location.
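As an illustration of the frequency-determination step, the following is a minimal sketch in Python. It assumes that each piece of indicator data carries a timestamp and an original health state; the names `IndicatorPoint` and `state_frequencies`, the state labels and the use of seconds as the time unit are hypothetical and are not taken from the application.

```python
from collections import Counter
from typing import NamedTuple


class IndicatorPoint(NamedTuple):
    """One piece of indicator data (hypothetical structure)."""
    timestamp: int  # seconds; the unit is an assumption
    state: str      # original health state, e.g. "normal", "lossy", "abnormal"


def state_frequencies(points, start, end):
    """Count how often each original health state occurs in [start, end]."""
    return Counter(p.state for p in points if start <= p.timestamp <= end)


points = [
    IndicatorPoint(0, "normal"),
    IndicatorPoint(60, "normal"),
    IndicatorPoint(120, "abnormal"),
    IndicatorPoint(180, "lossy"),
    IndicatorPoint(240, "normal"),
]

# Only the points from t=60 to t=240 fall inside the query time range.
freq = state_frequencies(points, start=60, end=240)
print(dict(freq))
```

The resulting counts are the per-state frequencies that are later combined with the weights.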

In a possible implementation, the method for displaying the service health status further includes:

According to the preconfigured weight determination model and the preset time length threshold, the corresponding weights of multiple original health states are determined;

Among them, the weight determination model is used to indicate the correspondence between the frequency of multiple original health states of the service and the final health state of the service.

In this way, the weight is determined through the weight determination model, so that the weight of the original health state includes the degree of influence of the frequency of the original health state on the final health state.

In a possible implementation, the display method of service health status also includes:

According to the corresponding weights and preset time length thresholds of multiple original health states, determine the corresponding state score ranges of multiple final health states;

Based on the corresponding weights of the multiple original health states of the service and the frequency of occurrence of the service in the multiple original health states within the query time, the status identifier of the service is displayed, including:

Calculate the status score corresponding to the service based on the corresponding weights of the multiple original health states of the service and the frequency of occurrence of the multiple original health states within the query time;

The status identifier of the service is displayed based on the status score corresponding to the service and the corresponding score ranges of multiple final health states.
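The scoring implementation above can be sketched as follows. This is only an illustration under stated assumptions: the score is taken to be a weighted sum of the state frequencies, and the weights, score ranges and names (`WEIGHTS`, `SCORE_RANGES`, `status_score`, `final_state`) are hypothetical values chosen for the example, not values given in the application. The colors follow the red/yellow/green convention described later in this document.

```python
# Hypothetical weights: how strongly one occurrence of each original
# health state influences the final health state.
WEIGHTS = {"normal": 0.0, "lossy": 1.0, "abnormal": 10.0}

# Hypothetical score ranges: (inclusive lower bound, exclusive upper bound, final state).
SCORE_RANGES = [
    (0.0, 1.0, "normal"),
    (1.0, 10.0, "lossy"),
    (10.0, float("inf"), "abnormal"),
]

COLORS = {"normal": "green", "lossy": "yellow", "abnormal": "red"}


def status_score(frequencies, weights=WEIGHTS):
    """Assumed scoring rule: weighted sum of the state frequencies."""
    return sum(weights[state] * count for state, count in frequencies.items())


def final_state(score, ranges=SCORE_RANGES):
    """Map a status score to the final health state whose range contains it."""
    for low, high, state in ranges:
        if low <= score < high:
            return state
    raise ValueError(f"score {score} is outside all configured ranges")


mostly_normal = {"normal": 597, "lossy": 3}
one_abnormal = {"normal": 599, "abnormal": 1}

print(COLORS[final_state(status_score(mostly_normal))])
print(COLORS[final_state(status_score(one_abnormal))])
```

With these illustrative values, three lossy points out of 600 yield a yellow identifier, while a single abnormal point is enough to yield a red one.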

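In a possible implementation, the method for displaying the service health status further includes:

Determine the query step size of the data query;

Determining the indicator data of the service within the query time range includes: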

According to the query step size and query time range, the indicator data of the service is collected from all indicator data of the service.

In this way, due to the large amount of indicator data of the service, all indicator data of the service can be sampled according to the query step, thereby balancing computing resources and the accuracy of health monitoring.

In a possible implementation, the data amount of the service indicator data is not greater than the target amount;

Among them, the target number is determined based on the preconfigured collection step, which is used to indicate the length of the time interval between the indicator data of the service.

In a possible implementation, determining the query step size of the data query includes:

Determine the collection step size, which is used to indicate the length of the time interval between all indicator data of the service;

The query step is determined based on the collection step, the preset time length threshold and the query time range.

In this way, the query step size is determined adaptively through the collection step size, thereby avoiding missed reports of service failures, thus balancing computing resources and the accuracy of health monitoring.

In a possible implementation, the status identification includes a color identification or a shape identification.

In a possible implementation, there are multiple services.

In a possible implementation, the method for displaying the service health status further includes:

Receive the user's operation on the status identifier of the target service;

Based on the operation, the raw health status of the target service at multiple moments in the query time range is displayed.

This allows users to easily view the original health status of the service at multiple times.

In the second aspect, embodiments of the present application provide a service health status display device, including:

The first determination module is used to determine the query time range of data query;

The second determination module is used to determine the indicator data of the service within the query time range. The indicator data of the service is used to reflect the original health status of the service;

The third determination module is used to determine the frequency of each of the multiple original health states of the service within the query time range based on the indicator data of the service within the query time range;

The display module is used to display the status identifier of the service based on the weights corresponding to the multiple original health states of the service and the frequency of occurrence of each of the multiple original health states within the query time range, so that the user can locate faults based on the status identifier of the service. The status identifier is used to indicate the final health state of the service, and the weight is used to indicate the degree of influence of the frequency of the original health state on the final health state.

In a possible implementation, the service health status display device also includes:

A fourth determination module is used to determine the weights corresponding to the various original health states according to a pre-configured weight determination model and a preset time length threshold;

In a possible implementation, the fourth determination module is also used to determine the state score ranges corresponding to the multiple final health states based on the respective weights and preset time length thresholds of the multiple original health states;

The calculation module is used to calculate the status score corresponding to the service based on the respective weights of the multiple original health states of the service and the respective frequencies of the multiple original health states within the query time;

The display module is used to display the status identifier of the service according to the status score corresponding to the service and the score ranges corresponding to the multiple final health states.

In a possible implementation, the first determination module is also used to determine the query step size of the data query;

The second determination module is used to collect the indicator data of the service from all the indicator data of the service according to the query step size and the query time range.

In a possible implementation, the first determination module is used to determine the collection step size, which indicates the length of the time interval between all indicator data of the service, and to determine the query step size according to the collection step size, the preset time length threshold and the query time range.

In a possible implementation, there are multiple services.

In a possible implementation, the device further includes:

The receiving module is used to receive the user's operation on the status identification of the target service;

The display module is also used to display the original health status of the target service at multiple moments within the query time range based on operations.

In this way, users can conveniently view the original health status of the service at multiple moments.

In a third aspect, embodiments of the present application provide service health status display equipment, including: at least one memory for storing a program; and at least one processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is used to execute the method provided in the first aspect.

In a fourth aspect, embodiments of the present application provide a service health status display device, characterized in that the device runs computer program instructions to execute the method provided in the first aspect. For example, the device may be a chip or a processor.

In one example, the apparatus may include a processor, which may be coupled to a memory, read instructions in the memory and execute the method provided in the first aspect according to the instructions. The memory may be integrated into the chip or processor, or may be independent of the chip or processor.

In a fifth aspect, embodiments of the present application provide a computer storage medium. Instructions are stored in the computer storage medium. When the instructions are run on a computer, they cause the computer to execute the method provided in the first aspect.

In a sixth aspect, embodiments of the present application provide a computer program product containing instructions. When the instructions are run on a computer, they cause the computer to execute the method provided in the first aspect.

Brief description of the drawings

Figure 1 is a system architecture diagram of a business system provided by an embodiment of the present application;

Figure 2 is a schematic flow chart of a service indicator data collection method provided by an embodiment of the present application;

Figure 3 is a schematic flowchart of a method for displaying service health status provided by an embodiment of the present application;

Figure 4 is a schematic diagram of a human-computer interaction interface provided by an embodiment of the present application;

Figure 5 is a schematic diagram of another human-computer interaction interface provided by an embodiment of the present application;

Figure 6 is a schematic diagram of yet another human-computer interaction interface provided by an embodiment of the present application;

Figure 7 is a schematic diagram of a service health status display interface provided by an embodiment of the present application;

Figure 8 is a schematic diagram of another service health status display interface provided by an embodiment of the present application;

Figure 9 is a schematic structural diagram of a service health status display device provided by an embodiment of the present application;

Figure 10 is a schematic structural diagram of a computing device provided by an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.

In the description of the embodiments of this application, words such as "exemplary", "for example" or "for example" are used to represent examples, illustrations or explanations. Any embodiment or design described as "exemplary," "such as," or "for example" in the embodiments of the present application is not to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary," "such as," or "for example" is intended to present the concepts in a concrete manner.

In the description of the embodiments of this application, the term "and/or" only describes an association relationship between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A exists alone, both A and B exist, or B exists alone. In addition, unless otherwise stated, the term "plurality" means two or more. For example, multiple systems refer to two or more systems, and multiple terminals refer to two or more terminals.

In addition, the terms "first" and "second" are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the indicated technical features. Therefore, features defined as "first" and "second" may explicitly or implicitly include one or more of these features. The terms “including,” “includes,” “having,” and variations thereof all mean “including but not limited to,” unless otherwise specifically emphasized.

Figure 1 is an architectural diagram of a business system involved in the embodiment of this application. The business system includes the following network devices: a management platform 110 and several host devices 120.

The management platform 110 may be a single computing device, or it may be a service cluster composed of multiple computing devices, or it may be a cloud computing center, or it may be a hyper terminal.

In one example, the computing device involved in this solution can be used to provide cloud services, and it can establish communication connections with several host devices to provide computing functions and/or storage functions for the host devices.

The management platform involved in the embodiments of this application may be a hardware device, or may be embedded in a virtualized environment. For example, the management platform involved in this solution may be a virtual machine executed on a hardware device including one or more other virtual machines.

The host device 120 may be a physical host or a virtual host.

In this embodiment of the present application, the management platform 110 and several host devices 120 form a business system. The business system provides several different microservices to the outside world, and each microservice is run by one or more host devices 120. The management platform 110 can collect indicator data of the microservices run by the host devices 120. The management platform 110 monitors the health of the microservices based on the collected indicator data, so that the operation and maintenance staff of the microservices (called users for convenience of description) can determine the health of a microservice through the management platform and, based on that health, locate the time at which the microservice failed.

Currently, the display method of service health status in related technologies only represents the health status of the service at the time of query. However, the health status of the service at each time point in a historical time period also affects the health status of the service at the time of query. Displaying only the health status at the time of query ignores the impact of the health status at each time point in the historical time period and cannot reflect the changes in the health status of the service over the entire time period queried by the user, so it does not satisfy users' needs for a problem overview and further fault location. Therefore, the display method of the health status of the service in the related technology cannot accurately represent the health of the service.

The inventors of this application discovered during technical research that, in one related technology, the priorities of different health states are determined according to the impact of the health state of the service on the user, the health states are sorted by priority, the highest-priority health state that occurs is used as the representative state, and the identifier is displayed in the color corresponding to that health state. For example, the priority order of the health states is: abnormal state > lossy state > normal state. The identifier color corresponding to the abnormal state is red, that of the lossy state is yellow, and that of the normal state is green. Suppose that, when the service topology of the past 24 hours is queried, a service was in the normal state for 23 hours, 59 minutes and 59 seconds and in the abnormal state for only 1 second. Since the priority of the abnormal state is higher than that of the normal state, the abnormal state is used as the representative state within the query time range, and the final identifier color is red. This display method considers only the priority and not the frequency with which each health state occurred in the past 24 hours. The health states at the other time points are in effect omitted, so a large amount of original information is lost, and the omitted time points contribute nothing to the representation of the final health state. In practice, when the user checks the topology, the user finds that a certain service is marked in red; however, when locating the abnormal state, the user finds that it occurred only for a moment and that the service quickly returned to normal. Displaying the service identifier in red in this case is clearly unreasonable.

Alternatively, the related technology counts the frequency of occurrence of different health states of the service within the query time range, and uses the health state with the highest frequency as the representative state, and displays the color identifier corresponding to the health state with the highest frequency. Among them, the identifier color corresponding to the abnormal state is red, the identifier color corresponding to the lossy state is yellow, and the identifier color corresponding to the normal state is green. For example, when querying the 24-hour service topology, a service has an abnormal state for 10 hours within 24 hours, and the remaining 14 hours are in a normal state. According to the frequency of occurrence of different health states, the normal state will be used as a 24-hour state representative, and the identifier of the service will be displayed as a green identifier. However, the above method causes abnormal states that do not have a frequency advantage to be directly ignored, and there is a risk of underreporting abnormal states.
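The two related-art strategies described above can be compared with a short sketch. The function names `by_priority` and `by_frequency` and the representation of each case as seconds spent in each state are hypothetical; the two cases are the 24-hour examples given in the text.

```python
# Priority order from the text: abnormal > lossy > normal.
PRIORITY = {"normal": 0, "lossy": 1, "abnormal": 2}


def by_priority(durations):
    """Related art 1: the highest-priority state that occurred at all wins."""
    occurred = [state for state, seconds in durations.items() if seconds > 0]
    return max(occurred, key=PRIORITY.__getitem__)


def by_frequency(durations):
    """Related art 2: the state that occurred for the longest time wins."""
    return max(durations, key=durations.get)


# Case 1: abnormal for 1 second, normal for the rest of the 24 hours.
case_1 = {"normal": 24 * 3600 - 1, "abnormal": 1}
# Case 2: abnormal for 10 hours, normal for 14 hours.
case_2 = {"normal": 14 * 3600, "abnormal": 10 * 3600}

print(by_priority(case_1))   # one second of failure turns the service red
print(by_frequency(case_2))  # ten hours of failure are hidden behind green
```

The first strategy over-reports a momentary exception, while the second under-reports a long-lasting one, which is the gap the weighted approach of this application is intended to close.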

In summary, it can be seen that the health of services cannot be accurately expressed in related technologies.

Based on this, embodiments of the present application provide a method, device, equipment and storage medium for displaying the service health status, which can display the status identifier of the service based on the impact of the frequency of each original health status within the query time range on the final health status, thereby improving the accuracy with which the health of the service is represented.

Next, based on the business system in the embodiment corresponding to FIG. 1 , the indicator data collection method for the service provided in the embodiment of the present application is described.

Figure 2 is a schematic flowchart of a service indicator data collection method provided by an embodiment of the present application. The service indicator data collection method provided by the embodiment of this application is applied to the business system shown in Figure 1. As shown in Figure 2, the service indicator data collection method provided by the embodiment of the present application includes S201-S203.

S201: The host device collects all indicator data of services run by the host device.

When the host device is running the service, it can generate a running log of the service. The running log records all the indicator data of the host device when running the service.

The host device can extract the indicator data of the service from the running log. In a possible implementation, the host device can receive an indicator collection instruction sent by the management platform, and the host device responds to the indicator collection instruction and extracts the indicator data of the service from the running log.

In a possible implementation, all indicator data of the service can be collected through Kubernetes (k8s) metrics.

Here, the indicator data of the service carries a timestamp, and the timestamp is used to indicate the generation time of the indicator data.

S202: The host device sends all indicator data of the service to the management platform.

The host device communicates with the management platform to enable data transmission. The host device has multiple ways of sending service indicator data to the management platform.

The management platform can use the metric collection mechanism of Prometheus to collect service indicator data. For example, a client can be installed on the host device and linked with the server in the management platform to collect service indicator data.

In a possible implementation, the host device can passively send all indicator data of the service to the management platform. The host device can respond to the indicator collection instruction, collect all indicator data of the service, and send all indicator data of the service to the management platform.

In another possible implementation, the host device can actively send all indicator data of the service to the management platform. In order to save the storage resources of the host device, the host device can send all the indicator data of the service to the management platform on its own initiative after extracting all the indicator data of the service.

In this embodiment, since the amount of service indicator data is large, the indicator data of the service can be collected according to a certain collection step length. Here, the collection step length is configured by the user through the management platform, and the management platform can send the collection step length to the host device, so that the host device collects all indicator data of the service according to the collection step length.

S203: The management platform stores the indicator data of the service.

The management platform can store the indicator data in time sequence and store the indicator data as time series data to facilitate subsequent retrieval in time order.
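One simple way to keep indicator data in time order and retrieve it by time range is sketched below. It is an in-memory illustration only; the class name `IndicatorStore`, its methods and the integer timestamps are hypothetical, and a real management platform would more likely rely on a time series database.

```python
import bisect


class IndicatorStore:
    """Keeps (timestamp, state) pairs sorted by timestamp (illustrative only)."""

    def __init__(self):
        self._timestamps = []
        self._states = []

    def add(self, timestamp, state):
        # Insert at the position that keeps the timestamps sorted,
        # even if data arrives out of order.
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._states.insert(i, state)

    def query(self, start, end):
        """Return all points with start <= timestamp <= end, in time order."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_right(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._states[lo:hi]))


store = IndicatorStore()
for ts, state in [(120, "abnormal"), (0, "normal"), (60, "normal")]:
    store.add(ts, state)

in_range = store.query(30, 120)
print(in_range)
```

Because the timestamps stay sorted, a range query only needs two binary searches rather than a scan of all stored data.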

Next, based on the business system in the embodiment corresponding to Figure 1, the method for displaying the service health status provided by the embodiment of the present application will be described.

Figure 3 is a schematic flowchart of a method for displaying service health status provided by an embodiment of the present application. The method for displaying service health status provided by the embodiment of the present application is applied to the business system shown in Figure 1. As shown in Figure 3, the method for displaying service health status provided by the embodiment of the present application includes S301-S304.

S301: The management platform determines a query time range for data query.

The query time range refers to the time range of the data that the user needs to query, including the start time and end time of the data query. For example, if the query time range is the past 24 hours and the current time is 6 pm on June 15, 2022, then the query time range is from 6 pm on June 14, 2022 to 6 pm on June 15, 2022.
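The example above can be reproduced with a few lines of Python. The helper name `query_time_range` is hypothetical; the date and duration are those of the example.

```python
from datetime import datetime, timedelta


def query_time_range(now, duration):
    """Return (start, end) of a query that looks back `duration` from `now`."""
    return now - duration, now


now = datetime(2022, 6, 15, 18, 0)  # 6 pm on June 15, 2022
start, end = query_time_range(now, timedelta(hours=24))
print(start.isoformat(), end.isoformat())
```

The same helper covers other configured query durations, for example `timedelta(hours=72)` or `timedelta(weeks=1)`.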

In a possible implementation, the management platform includes a display, and the display is used to display a human-computer interaction interface for data query. For example, the user can input the query time range of the data query through the human-computer interaction interface for data query.

The management platform can determine the query time range of the data query after receiving the data query instruction. For example, the user can input data query instructions through a human-computer interaction interface for data query. For example, as shown in FIG. 4, the human-computer interaction interface 40 of the display displays a data query window 41, and the data query window 41 displays "data query" together with a "confirm" logo 42 and a "cancel" logo 43. When the user clicks the "confirm" logo 42, the management platform can generate a data query instruction.

There are many ways to determine the query time range, as detailed below.

In a possible implementation, the query time range may be determined according to the query duration pre-configured by the management platform. For example, the management platform defaults to the query duration of the past 24 hours. After receiving the data query instruction, the management platform can calculate the query time range according to the current time and the query duration.

In another possible implementation, the management platform can obtain the query time range through a human-computer interaction interface for data query. Users can enter the query time range through the human-computer interaction interface.

In another possible implementation, the management platform can be configured with multiple query durations. For example, as shown in Figure 5, when the user clicks the "Confirm" logo 52, the human-computer interaction interface 50 displays the query duration window 51. The query duration window 51 displays multiple query durations (the past 24 hours, the past 72 hours, and the past week) together with a query duration confirmation mark 53, which the user clicks to confirm the selection. In other words, the management platform displays the available query durations through the human-computer interaction interface for data query, and the user selects a query duration through that interface. The management platform determines the query time range of the data query based on the query duration selected by the user and the current time.

S302: The management platform determines the indicator data of the service within the query time range.

Services can be services in a variety of business scenarios, such as services in mobile phone payment business scenarios, services in book query scenarios, services in purchase of items scenarios, etc. A service can be a business service or multiple microservices that make up a business service.

The indicator data of the service can reflect the health status of the service, so that the management platform can monitor the health of the service based on the indicator data of the service.

The management platform stores all indicator data of the service in chronological order. Optionally, the management platform can retrieve the indicator data within the query time range; that is, the management platform selects, from all indicator data of the service, the indicator data of the service within the query time range. Optionally, the indicator data includes the original health status of the service and a timestamp. The management platform can obtain the indicator data of the service within the query time range based on the timestamps in the indicator data.

In one possible implementation, in order to ensure the accuracy of health monitoring, the management platform uses all indicator data of the service within the query time range to perform health monitoring.

In another possible implementation, since the amount of service indicator data is large, in order to balance the computing resources and the accuracy of health monitoring, all service indicator data can be sampled according to the query step size to obtain the service indicator data within the query time range. It should be noted that the service indicator data refers to the data sampled from all service indicator data.

As an example, the query step size may be input by the user through a human-computer interaction interface for data query. For example, as shown in FIG. 6 , when the user confirms the data query, the human-computer interaction interface 60 of the data query can display the query step input box 61 and the query step unit 62 (such as minutes, seconds, hours). The user can confirm the data query through the "confirm" mark 63. The management platform generates data query instructions based on the query step size input by the user. In other words, the data query instructions include the query step size.

As another example, when the query step size is equal to the collection step size, all indicator data of the service within the query time range is presented accurately, with no missed reports. However, when the query time range is large, the amount of data becomes very large, which slows down the interface. When the collection step size is smaller than the query step size, the advantage is that the amount of data can be controlled; the disadvantage is that some collected sampling points may be skipped, resulting in missing data. When a skipped sampling point happens to be abnormal, the abnormality cannot be reported accurately, resulting in a poor user experience. In order to balance computing resources and health monitoring accuracy, the management platform can adaptively determine the query step size.

Specifically, the management platform determines the query step length according to the collection step length, a preset time length threshold and a query time range.

The time period that a user can query is not unlimited. Therefore, the upper limit of the number of query points can be bounded by configuring the time length threshold. The upper limit of the number of query points determines the resolution. For example, if the user wants a single abnormal point to be reported in red and the upper limit of the number of query points is 600, then 1/600 is used as the minimum resolution when calculating the weights (see the description below for details).

When the query time range is less than or equal to the time length threshold, the query step size is equal to the collection step size. When the query time range is greater than the time length threshold, the query step satisfies the following formula (1):

Here, querytime represents the query time range, T represents the preset time length threshold, and scrape_interval represents the collection step size. It should be noted that all parameters in the formula use the same unit, for example minutes. The specific parameters can be set as needed and are not limited here.
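The adaptive choice of query step size can be illustrated with the sketch below. The proportional expression used for the case where the query time range exceeds the threshold is an assumption chosen only to match the behaviour stated above (the step equals the collection step size up to the threshold, and the number of query points stays bounded beyond it); the exact expression of formula (1) may differ. The function name `query_step` is likewise hypothetical.

```python
import math


def query_step(query_time, threshold, scrape_interval):
    """Return a query step in the same unit as the arguments.

    ASSUMPTION: the proportional form used above the threshold is a
    guess consistent with the stated behaviour, not formula (1) itself.
    """
    if min(query_time, threshold, scrape_interval) <= 0:
        raise ValueError("all arguments must be positive")
    if query_time <= threshold:
        # Short ranges are queried at full collection resolution.
        return scrape_interval
    # Scale the step so that the number of query points stays at or
    # below threshold / scrape_interval.
    return math.ceil(query_time / threshold) * scrape_interval
```

Under this assumption, with a threshold of 1440 minutes and a collection step of 1 minute, a 2880-minute query uses a 2-minute step and therefore still returns 1440 points.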

S303: Determine, based on the indicator data of the service within the query time range, the frequency at which each of the multiple original health states of the service occurs within the query time range.

The raw health status of a service is used to indicate the status of the service while it is running. Users can customize the division and number of original health states according to the monitored objects and monitoring needs. For example, the original health state of a service can be divided into lossy state, abnormal state and normal state. Among them, the lossy state means that there is an unhealthy state during the running of the service, but the service can run normally and the user cannot detect it; the abnormal state means that there is an unhealthy state during the running of the service and the service cannot be provided normally, which has a great impact on the user.

The original health status of a service may be different at different moments within the query time range. The management platform analyzes the indicator data of the service at different moments within the query time range to determine the original health status of the service at each of those moments, and then counts the frequency of occurrence of each original health state within the query time range.
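The counting step can be sketched as follows. Frequency is read here as the share of sampling points in each state, an interpretation consistent with how the scores are computed in the worked examples of this description. The function name and the default state names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction


def state_frequencies(statuses, states=("Healthy", "Degraded", "Failure")):
    """Return {state: share of sampling points in that state}."""
    if not statuses:
        raise ValueError("no sampling points in the query time range")
    counts = Counter(statuses)
    unknown = set(counts) - set(states)
    if unknown:
        raise ValueError(f"unexpected states: {sorted(unknown)}")
    total = len(statuses)
    # States that never occur get a frequency of 0.
    return {s: Fraction(counts[s], total) for s in states}
```

Exact fractions are used so that a single abnormal point out of 1440 is represented as exactly 1/1440 rather than a rounded float.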

S304: Display the status identifier of the service according to the corresponding weights of the multiple original health states of the service and the frequency of occurrence of the multiple original health states within the query time, so that the user can locate faults according to the status identifier of the service. The status identifier is used to indicate the final health state of the service. The weight is used to represent the degree of impact of the frequency of the original health state on the final health state.

The final health states can be defined by users according to their own needs. Since the frequency of occurrence of different original health states affects different users differently, the user defines the final health states according to the frequency of occurrence of the original health states. For example, the user may care about the following four final health states: (1) the original health status of the service is normal throughout the query time range (for convenience of description, denoted "Healthy" below); (2) a lossy state has occurred within the query time range (denoted "Degraded" below); (3) an abnormal state has occurred within the query time range (denoted "Failure" below) and the frequency of occurrence of the abnormal state is not higher than 10%; and (4) an abnormal state ("Failure") has occurred within the query time range and the frequency of occurrence of the abnormal state is higher than 10%.

The weight of the original health state is used to represent the degree of influence of the frequency of the original health state within the query time range on the final health state.

The management platform calculates the status score of the service within the query time range based on the weight of each original health state and the frequency of occurrence of that state within the query time range, and displays the status identifier corresponding to the status score of the service.

The status score of the service within the query time range satisfies the following formula (2):

$$\mathrm{score} = \sum_{i=1}^{n} W(s_i) \cdot F(s_i) \qquad (2)$$

where s_i represents the i-th original health state of the service, F(s_i) represents the frequency of occurrence of the i-th original health state within the query time range, W(s_i) represents the weight of the i-th original health state, n is the number of original health states, and i is a positive integer.

In this embodiment, the management platform stores the correspondence between the status identifier and the status score range. For example, status identifier 1 is used to indicate that the original health status of the service is normal (Healthy) throughout the query time range, and the status score range corresponding to status identifier 1 is score=1. Status identifier 2 is used to indicate that a lossy state (Degraded) has occurred within the query time range, and the status score range corresponding to status identifier 2 is 1<score≤10. Status identifier 3 is used to indicate that an abnormal state (Failure) has occurred within the query time range and the frequency of the abnormal state is not higher than 10%, and the status score range corresponding to status identifier 3 is 10<score≤1440. Status identifier 4 is used to indicate that an abnormal state (Failure) has occurred within the query time range and the frequency of the abnormal state is higher than 10%, and the status score range corresponding to status identifier 4 is 1440<score.

For example, when score=1, a green status identifier is displayed: the status is normal (Healthy) throughout the query time range. When 1<score≤10, a yellow status identifier is displayed: a lossy state (Degraded) occurs within the query time range. When 10<score≤1440, an orange status identifier is displayed: an abnormal state (Failure) occurs within the query time range and its frequency is not higher than 10%. When score>1440, a red status identifier is displayed: an abnormal state (Failure) occurs within the query time range and its frequency is higher than 10%.
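The lookup from a status score to a status identifier can be sketched as below. The sketch uses the right-closed ranges of the stored correspondence (1<score≤10, 10<score≤1440) and the colours of this example; the names `BOUNDS`, `IDENTIFIERS` and `status_identifier` are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the first three score ranges from the example:
# (0, 1], (1, 10], (10, 1440], (1440, inf).
BOUNDS = (1, 10, 1440)
IDENTIFIERS = ("green", "yellow", "orange", "red")


def status_identifier(score):
    """Map a status score to the colour of its score range."""
    if score <= 0:
        raise ValueError("score must be positive")
    # bisect_left returns the index of the first bound >= score, which
    # makes every range closed on the right.
    return IDENTIFIERS[bisect_left(BOUNDS, score)]
```

Keeping the bounds in a sorted tuple means that a user-configured set of score ranges only requires changing `BOUNDS` and `IDENTIFIERS`, not the lookup logic.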

When users see the different status identifiers, they can accurately know which of the four final health states is represented. For example, the user selects a query time range of 24 hours, that is, 1440 minutes, in which one minute has an abnormal state (Failure) and the rest are in the normal state (Healthy). The status score is then score = 1439/1440 + 0 + 14400 × 1/1440 = 10.9993. Since 10<score≤1440, the orange identifier is displayed. For another example, the user selects a query time range of 1 hour, that is, 60 minutes, in which the service has an abnormal state (Failure) for ten minutes and is in the normal state (Healthy) for the rest. The status score is then score = 50/60 + 0 + 14400 × 10/60 = 2400.8333. Since score>1440, the red identifier is displayed.
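The two worked examples can be reproduced with a short sketch that computes the score as the weighted sum of the state frequencies, using the weights that appear in the arithmetic above (1 for Healthy, 10 for Degraded, 14400 for Failure). The names `WEIGHTS` and `status_score` are illustrative.

```python
from fractions import Fraction

# Weights used in the worked examples of this description.
WEIGHTS = {"Healthy": 1, "Degraded": 10, "Failure": 14400}


def status_score(frequencies, weights=WEIGHTS):
    """Weighted sum of the state frequencies."""
    return sum(weights[state] * freq for state, freq in frequencies.items())


# 24 hours with one abnormal minute, and 1 hour with ten abnormal minutes.
day = {"Healthy": Fraction(1439, 1440), "Degraded": 0, "Failure": Fraction(1, 1440)}
hour = {"Healthy": Fraction(50, 60), "Degraded": 0, "Failure": Fraction(10, 60)}

print(round(float(status_score(day)), 4))   # 10.9993
print(round(float(status_score(hour)), 4))  # 2400.8333
```

The first score falls in the range 10<score≤1440 and the second exceeds 1440, matching the orange and red identifiers of the examples.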

Here, the status identifier can be a color identifier or a shape identifier. No limitation is made here.

In a possible implementation, the weight of the original health state and the state score range can be configured by the user as needed.

In another possible implementation, the management platform can calculate the corresponding weights of multiple original health states through a weight determination model. Specifically, according to the preconfigured weight determination model and the preset time length threshold, the respective corresponding weights and state score ranges of the multiple original health states are determined.

Among them, the weight determination model is used to indicate the correspondence between the frequency of the original health state of the service and the final health state of the service. Here, the weight determination model can be viewed as a multi-variable set of inequalities. Users can configure the corresponding relationship between the status score range and the frequency range of the original health status according to their own needs.

For example, the state score range has 4 segments, namely (0, S1], (S1, S2], (S2, S3] and (S3, ∞), where a larger score indicates a more dangerous state. The time length threshold of 24 hours determines the minimum resolution, that is, the minimum resolution is 1/1440. The correspondence between the frequency of the original health states of the service and the final health state of the service is shown in Table 1:

Table 1

According to the calculation formula (2) of the state score, the following inequality group (3) can be obtained, that is, the weight determination model.

Here, W₁ is the weight corresponding to the normal state (Healthy), W₂ is the weight corresponding to the lossy state (Degraded), and W₃ is the weight corresponding to the abnormal state (Failure).

Take a feasible solution that meets the constraints: W₁=1, W₂=10, W₃=14400.

It can be strictly proved that the above feasible solution meets the requirements. From the values of the weights in the feasible solution, S1=1, S2=10 and S3=1440, so the score segments (i.e., the state score ranges) are (0, 1], (1, 10], (10, 1440] and (1440, ∞). It can be seen that, within the query time range:

1) As long as an abnormal state occurs, the state score is strictly greater than 10.

2) If the frequency of the abnormal state is less than 10%, the state score must be less than 1440.

3) When only the normal state occurs, the state score is exactly 1.

4) As long as a lossy state occurs, the state score is greater than 1.
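The four properties can also be checked exhaustively at the minimum resolution of 1/1440. The sketch below is a brute-force check under that resolution, not the strict proof referred to above; the function name and its parameters are assumptions made for illustration.

```python
def check_feasible_solution(n=1440, w=(1, 10, 14400), s=(1, 10, 1440)):
    """Exhaustively test properties 1) to 4) at resolution 1/n.

    h, d and f count the sampling points in the normal, lossy and
    abnormal state, with h + d + f == n. Scores are compared after
    multiplying by n so that only integers are involved.
    """
    w1, w2, w3 = w
    s1, s2, s3 = s
    for f in range(n + 1):
        for d in range(n - f + 1):
            h = n - f - d
            scaled = w1 * h + w2 * d + w3 * f  # equals score * n
            if f >= 1 and not scaled > s2 * n:
                return False  # property 1 violated
            if 10 * f < n and not scaled < s3 * n:
                return False  # property 2 violated
            if d == 0 and f == 0 and scaled != s1 * n:
                return False  # property 3 violated
            if d >= 1 and not scaled > s1 * n:
                return False  # property 4 violated
    return True
```

Calling `check_feasible_solution()` enumerates every split of 1440 sampling points across the three states. Property 2 depends on the resolution: it holds here because no whole number of abnormal points lies between roughly 9.94% and 10% of 1440.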

Therefore, there are four final health states, which not only include the weight of the original health state, but also include the frequency of the original health state within the query time range.

In some embodiments, a business may include multiple services. For example, as shown in Figure 7, the book purchasing business includes the following services: book title search service 11, book display service 12, book evaluation service 13, and order payment service 14. The relationships among these services are shown in Figure 7. The book title search service 11, the book display service 12 and the book evaluation service 13 are all in the normal state within 24 hours, and their status identifiers are all triangle identifiers. The order payment service 14 is in an abnormal state within 24 hours, and its status identifier is a square identifier. The management platform can display the final health status of multiple services under the same business through a human-computer interaction interface. Furthermore, the management platform can display the final health status of the respective services under multiple businesses.

In some embodiments, users can view the original health status of the service at various moments through the status identification. Specifically, a user's operation on the status identification of the target service is received; according to the operation, the original health status of the target service at multiple moments within the query time range is displayed.

For example, as shown in Figure 8, in the book purchasing business, the status identifiers of the book title search service 11, the book display service 12 and the book evaluation service 13 are all triangle identifiers, while the status identifier of the order payment service 14 is a square identifier. The user can operate the status identifier of the order payment service 14 through the human-computer interaction interface, and the management platform then displays the original health status of the order payment service 14 at each sampling moment in the past 24 hours.

Based on the display method of service health status in Figure 3, embodiments of the present application provide a display device for service health status.

FIG. 9 is a schematic structural diagram of a service health status display device provided by an embodiment of the present application. The service health status display device provided by the embodiment of the present application can be applied to the management platform as shown in Figure 1. As shown in Figure 9, the display device for service health status provided by the embodiment of the present application includes a first determination module 901, a second determination module 902, a third determination module 903 and a display module 904.

The first determination module 901 is used to determine the query time range of data query;

The second determination module 902 is used to determine the indicator data of the service within the query time range. The indicator data of the service is used to reflect the original health status of the service;

The third determination module 903 is configured to determine the frequency of each of the multiple original health states of the service within the query time range based on the indicator data of the service within the query time range;

The display module 904 is configured to display the status identifier of the service according to the corresponding weights of the multiple original health states of the service and the frequency of occurrence of the service in the multiple original health states within the query time, so that users can locate faults based on the status identifier of the service. The status identifier is used to indicate the final health status of the service. The weight is used to represent the degree of impact of the frequency of the original health status on the final health status.

In a possible implementation, the device further includes a fourth determination module, which is used to determine the corresponding weights of the multiple original health states according to the preconfigured weight determination model and the preset time length threshold;

Wherein, the weight determination model is used to indicate the corresponding relationship between the frequency of multiple original health states of the service and the final health state of the service.

In this way, the weight is determined by the weight determination model, so that the weight of the original health state includes the influence of the frequency of occurrence of the original health state on the final health state.

In a possible implementation, the fourth determination module is also used to determine the state score ranges corresponding to the multiple final health states according to the respective weights of the multiple original health states and the preset time length threshold;

A calculation module configured to calculate the status score corresponding to the service based on the corresponding weights of the multiple original health states of the service and the frequency of each of the multiple original health states appearing within the query time;

The display module is configured to display the status identification of the service according to the status score corresponding to the service and the score ranges corresponding to the multiple final health states.

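In a possible implementation, the first determination module is also used to determine the query step size of the data query;

The second determination module is configured to collect indicator data of the service from all indicator data of the service according to the query step size and the query time range.

In a possible implementation, the data amount of the indicator data of the service is not greater than a target amount;

Wherein, the target amount is determined according to a preconfigured collection step size, and the collection step size is used to indicate the length of the time interval between the indicator data of the service.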

In a possible implementation, the first determination module is used to determine the collection step size, where the collection step size is used to indicate the length of the time interval between all indicator data of the service, and to determine the query step size according to the collection step size, the preset time length threshold and the query time range.

In a possible implementation, there are multiple services.

In a possible implementation, the device further includes:

A receiving module, used to receive the user's operation on the status identifier of the target service;

The display module is also configured to display the original health status of the target service at multiple moments within the query time range according to the operation.

The device embodiment described in Figure 9 is only illustrative. For example, the division of modules is only a logical function division. In actual implementation, there may be other division methods. For example, multiple modules or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented. Each functional module in each embodiment of the present application can be integrated into one processing module, or each module can exist physically alone, or two or more modules can be integrated into one module. The above-mentioned modules in Figure 9 can be implemented in the form of hardware, software functional units, or a combination of software and hardware.

Referring to Figure 10, Figure 10 shows a schematic structural diagram of a computing device provided by an embodiment of the present application. The computing device may be a server or the like. Wherein, the management platform in Figure 1 includes at least one computing device. As shown in Figure 10, the computing device includes: a processor 1001, a memory 1002, and a communication interface 1003. The processor 1001, the memory 1002 and the communication interface 1003 are connected through a bus 1004. Memory 1002 includes operating system and program code modules.

Memory 1002 may include bulk storage for data or instructions. By way of example, and not limitation, the memory 1002 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, a magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of the above. Memory 1002 may include removable or non-removable (or fixed) media, where appropriate. Where appropriate, the memory 1002 may be internal or external to the computing device. In certain embodiments, memory 1002 is non-volatile solid-state memory.

The memory may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical or other physical/tangible memory storage devices. Thus, typically, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software including computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the methods in the present application.

The processor 1001 reads and executes the computer program instructions stored in the memory 1002 to implement any of the service health status display methods in the above embodiments.

In one example, the electronic device may also include a communication interface 1003 and a bus 1010. Among them, as shown in Figure 10, the processor 1001, the memory 1002, and the communication interface 1003 are connected through the bus 1010 and complete communication with each other.

The communication interface 1003 is mainly used to implement communication between modules, devices, units and/or equipment in the embodiments of this application.

Bus 1010 includes hardware, software, or both, coupling components of an electronic device to one another. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB) or another suitable bus, or a combination of two or more of these. Where appropriate, bus 1010 may include one or more buses. Although the embodiments of this application describe and illustrate a specific bus, this application contemplates any suitable bus or interconnection.

It can be understood that the processor in the embodiments of the present application can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component or any combination thereof. A general-purpose processor can be a microprocessor or any conventional processor.

The method steps in the embodiments of the present application can be implemented by hardware or by a processor executing software instructions. Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from the storage medium and write information to the storage medium. Of course, the storage medium can also be an integral part of the processor. The processor and the storage medium may be located in an ASIC.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center through wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)), etc.

It can be understood that the various numerical numbers involved in the embodiments of the present application are only for convenience of description and are not used to limit the scope of the embodiments of the present application.

Claims

A method for displaying service health status, which is characterized by including:

Determine the query time range of data query;

Determine the indicator data of the service within the query time range. The indicator data of the service is used to reflect the original health status of the service;

According to the indicator data of the service within the query time range, determine the frequency of each of the multiple original health states of the service within the query time range;

According to the corresponding weights of the multiple original health states of the service and the frequency of occurrence of the service in the multiple original health states within the query time, the status identifier of the service is displayed so that the user can locate faults according to the status identifier of the service. The status identifier is used to indicate the final health state of the service. The weight is used to represent the degree of influence of the frequency of the original health state within the query time range on the final health state.
The method of claim 1, further comprising:

Determine the respective weights corresponding to the multiple original health states according to the preconfigured weight determination model and the preset time length threshold;

Wherein, the weight determination model is used to indicate the corresponding relationship between the frequency of multiple original health states of the service and the final health state of the service.
The method of claim 2, further comprising:

Determining state score ranges corresponding to the multiple final health states according to the weights corresponding to the multiple original health states and a preset time length threshold;

Displaying the status identifier of the service based on the corresponding weights of the multiple original health states of the service and the frequency of occurrence of the service in the multiple original health states within the query time includes:

Calculate the status score corresponding to the service according to the corresponding weights of the multiple original health states of the service and the frequency of occurrence of the multiple original health states within the query time;

The status identifier of the service is displayed according to the status score corresponding to the service and the score ranges corresponding to the multiple final health statuses.
The method of claim 1, further comprising:

Determine the query step size of data query;

The determining of the indicator data of the service within the query time range includes:

According to the query step size and the query time range, the indicator data of the service is collected from all the indicator data of the service.
The method according to claim 4, characterized in that the data amount of the indicator data of the service is not greater than the target amount;

Wherein, the target number is determined according to a preconfigured collection step, and the collection step is used to indicate the length of the time interval between the indicator data of the service.
The method according to claim 4, characterized in that determining the query step size of data query includes:

Determine the collection step size, which is used to indicate the length of the time interval between all indicator data of the service;

The query step length is determined according to the collection step length, a preset time length threshold and the query time range.
The method according to any one of claims 1 to 6, characterized in that the status identification includes a color identification or a shape identification.
The method according to any one of claims 1-6, characterized in that there are multiple services.
The method according to any one of claims 1-6, characterized in that the method further includes:

Receive the user's operation on the status identification of the target service;

According to the operation, the original health status of the target service at multiple moments within the query time range is displayed.
A device for displaying a service health status, comprising:

The first determination module is used to determine the query time range of data query;

The second determination module is used to determine the indicator data of the service within the query time range, and the indicator data of the service is used to reflect the original health status of the service;

The third determination module is used to determine, based on the indicator data of the service within the query time range, the frequency of occurrence of each of the multiple original health states of the service within the query time range;

The display module is used to display the status identifier of the service according to the weights corresponding to the various original health states of the service and the frequency of occurrence of the service in the various original health states within the query time, so that the user can locate the fault according to the status identifier of the service. The status identifier is used to indicate the final health state of the service, and the weight is used to indicate the degree of influence of the frequency of occurrence of the original health state on the final health state.
The device according to claim 10, characterized in that the device further includes:

The fourth determination module is used to determine the corresponding weights of the multiple original health states according to the preconfigured weight determination model and the preset time length threshold;

Wherein, the weight determination model is used to indicate the corresponding relationship between the frequency of multiple original health states of the service and the final health state of the service.
The device according to claim 11, characterized in that the fourth determination module is further configured to determine the state score ranges corresponding to the multiple final health states according to the weights corresponding to the multiple original health states and a preset time length threshold;

A calculation module configured to calculate the status score corresponding to the service based on the corresponding weights of the multiple original health states of the service and the frequency of each of the multiple original health states appearing within the query time;

The display module is used to display the status identifier of the service according to the status score corresponding to the service and the score ranges corresponding to the multiple final health states.
The device according to claim 10, characterized in that the first determination module is also used to determine the query step size of the data query;

The second determination module is configured to collect indicator data of the service from all indicator data of the service according to the query step size and the query time range.
The device according to claim 13, characterized in that the number of items of the indicator data of the service is not greater than a target number;

Wherein, the target number is determined according to a preconfigured collection step size, and the collection step size is used to indicate the length of the time interval between items of the indicator data of the service.
The device according to claim 13, characterized in that the first determination module is used to determine the collection step size, the collection step size being used to indicate the length of the time interval between all indicator data of the service, and to determine the query step size according to the collection step size, the preset time length threshold, and the query time range.
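As a hedged illustration of how a query step size might be derived from these three inputs, the sketch below makes two assumptions that the claims do not state: the target number equals the preset time length threshold divided by the collection step size, and the query step size is the smallest multiple of the collection step size that keeps the number of queried points at or below that target. The function name query_step and all numeric values are hypothetical.

```python
import math


def query_step(collection_step_s, threshold_s, query_range_s):
    """Pick a query step size (in seconds) under the assumptions above."""
    # Assumed target number: points collected over one threshold-long window.
    target = max(1, threshold_s // collection_step_s)
    # Short ranges already fit within the target at the native resolution.
    if query_range_s <= threshold_s:
        return collection_step_s
    # Otherwise widen the step in whole multiples of the collection step.
    multiples = math.ceil(query_range_s / (target * collection_step_s))
    return multiples * collection_step_s


# Data collected every 60 s, threshold of 1 hour, query range of 24 hours.
step = query_step(60, 3600, 86400)   # 1440 s, giving 60 points
points = 86400 // step
```

Under these assumptions, a longer query time range yields a coarser step, so the amount of data returned stays bounded regardless of the range chosen.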
The device according to any one of claims 10 to 15, characterized in that the status identifier includes a color identifier or a shape identifier.
The device according to any one of claims 10 to 15, characterized in that there are multiple services.
The device according to any one of claims 10 to 15, characterized in that the device further comprises:

The receiving module is used to receive the user's operation on the status identifier of a target service;

The display module is also configured to display the original health status of the target service at multiple moments within the query time range according to the operation.
A device for displaying a service health status, comprising:

At least one memory for storing programs;

At least one processor, configured to execute the program stored in the memory, and when the program stored in the memory is executed, the processor is configured to execute the method according to any one of claims 1-9.
A device for displaying a service health status, characterized in that the device runs computer program instructions to execute the method according to any one of claims 1-9.
A computer storage medium storing instructions that, when run on a computer, cause the computer to execute the method according to any one of claims 1-9.
A computer program product containing instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1-9.