CN111107084B

CN111107084B - Monitoring method, monitoring device, electronic equipment and storage medium

Info

Publication number: CN111107084B
Application number: CN201911309723.9A
Authority: CN
Inventors: 孙雪皓; 朱明悦
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2019-12-18
Filing date: 2019-12-18
Publication date: 2022-10-11
Anticipated expiration: 2039-12-18
Also published as: CN111107084A

Abstract

The present disclosure provides a monitoring method, a monitoring device, an electronic device and a storage medium. The method comprises: acquiring service data generated during the running of each monitored service monitored by an RPC monitoring system; determining, among the monitored services, each service to be aggregated whose aggregation switch is turned on, wherein the aggregation switch controls whether the service data of a monitored service is aggregated, and the on/off state of the aggregation switch is determined based on the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs; and, for each service to be aggregated, aggregating the acquired service data of the service to obtain an aggregation index of the service. By applying the embodiments provided by the disclosure, overload can be prevented from disrupting the normal operation of the RPC monitoring system, and the availability of the RPC monitoring system is improved.

Description

Monitoring method, monitoring device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer network technologies, and in particular, to a monitoring method and apparatus, an electronic device, and a storage medium.

Background

RPC (Remote Procedure Call Protocol) refers to remote communication and mutual calls between different systems in a distributed system architecture, where the calls between different systems include calls between services in those systems.

For the RPC, an RPC monitoring system is usually adopted to monitor each service in the calling process, and aggregate indexes such as availability are obtained, so that alarm information can be provided based on the relationship between the aggregate indexes and a preset threshold value, a technician can quickly locate a fault occurring in the calling process, and the normal operation of each system is ensured.

However, as the functions of each system grow, the number of monitored services that the RPC monitoring system must monitor also grows, and so does the load of service data aggregation on the RPC monitoring system. When the RPC monitoring system is overloaded, its availability keeps decreasing. In that case, the RPC monitoring system cannot work normally, the monitored services cannot be monitored, and faults in the calling process between systems may escalate.

Based on this, a monitoring method is needed that prevents overload from disrupting the normal operation of the RPC monitoring system and improves the availability of the RPC monitoring system.

Disclosure of Invention

The present disclosure provides a monitoring method, an apparatus, an electronic device, and a storage medium, to at least solve the problem of reduced availability of an RPC monitoring system due to overload in related art. The technical scheme of the disclosure is as follows:

according to a first aspect of the embodiments of the present disclosure, there is provided a monitoring method, including:

acquiring service data generated in the running process of each monitored service monitored by an RPC monitoring system;

determining, among the monitored services, each service to be aggregated whose aggregation switch is turned on, based on the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs; the aggregation switch is used for controlling whether to aggregate the service data of a monitored service;

and for each service to be aggregated, aggregating the acquired service data of the service to obtain an aggregation index of the service.

Optionally, in a specific implementation manner, the method further includes:

if the RPC monitoring system is overloaded, initiating the closing operation of the aggregation switch of each monitored service meeting a closing condition;

wherein the closing condition is: the aggregation priority of the service type to which the monitored service belongs is lower than a preset priority.

Optionally, in a specific implementation manner, the method further includes:

if the RPC monitoring system is not overloaded, initiating the opening operation of the aggregation switch of each monitored service whose aggregation switch has been closed;

and, for each monitored service whose aggregation switch has been reopened, aggregating the acquired service data of the service to obtain an aggregation index of the service.

Optionally, in a specific implementation manner, the step of obtaining service data generated in an operation process of each monitored service monitored by the RPC monitoring system includes:

acquiring service data generated in the running process of each monitored service monitored by the RPC monitoring system in each preset period;

the step of aggregating the acquired service data of the service to obtain an aggregation index of the service for each service to be aggregated includes:

and for each service to be aggregated, aggregating the acquired service data of the service to obtain an aggregation index of the service in each period.

Optionally, in a specific implementation manner, each monitored service is: the called service;

the service data includes: at least one of the number of times of being called in each period, the number of times of success of being called in each period, the number of times of failure of being called in each period, and the delay of being called in each period;

the aggregate indicators for the service include: at least one of a query per second rate QPS within said each period, availability within said each period, and an average latency within said each period.
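As a minimal sketch of how the aggregate indicators named above might be derived from the service data of one period, the following assumes that the period length is known in seconds, that per-call delays are recorded in milliseconds, and that availability is taken as the ratio of successful calls to total calls. The names `PeriodData` and `aggregate_period` and the field layout are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PeriodData:
    """Service data of one called service for one period (hypothetical layout)."""
    calls: int              # number of times the service was called in the period
    successes: int          # number of successful calls in the period
    failures: int           # number of failed calls in the period
    delays_ms: List[float]  # delay of each recorded call, in milliseconds


def aggregate_period(data: PeriodData, period_seconds: float) -> Dict[str, float]:
    """Derive QPS, availability and average latency for one period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    qps = data.calls / period_seconds
    # Assumption: availability is the success ratio; an idle period counts as available.
    availability = data.successes / data.calls if data.calls else 1.0
    avg_latency_ms = sum(data.delays_ms) / len(data.delays_ms) if data.delays_ms else 0.0
    return {"qps": qps, "availability": availability, "avg_latency_ms": avg_latency_ms}


sample = PeriodData(calls=600, successes=594, failures=6, delays_ms=[10.0, 20.0, 30.0])
indicators = aggregate_period(sample, period_seconds=60.0)
```

Other definitions of availability or latency (for example, percentile latency) would fit the same structure; the disclosure only names the indicators.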

Optionally, in a specific implementation manner, for each service to be aggregated, an aggregation index of the service in a plurality of the periods is obtained; and determining the running state of the service based on the obtained aggregation indexes.

According to a second aspect of the embodiments of the present disclosure, there is provided an RPC monitoring apparatus, including:

the data acquisition module is configured to acquire service data generated in the running process of each monitored service monitored by the RPC monitoring system;

the service determination module is configured to determine each service to be aggregated, in each monitored service, with an aggregation switch turned on, based on the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs; wherein the aggregation switch is configured to control whether to aggregate service data of the monitored services;

and the data aggregation module is configured to aggregate the acquired service data of the service to obtain an aggregation index of the service for each service to be aggregated.

Optionally, in a specific implementation manner, the apparatus further includes:

a first starting switch configured to initiate a closing operation of the aggregation switch of each monitored service meeting a closing condition if the RPC monitoring system is overloaded;

a second starting switch configured to initiate an opening operation of the aggregation switch of each monitored service whose aggregation switch has been closed if the RPC monitoring system is not overloaded;

and the data recalculation module is configured to aggregate, for each monitored service whose aggregation switch has been reopened, the acquired service data of the service to obtain an aggregation index of the service.

Optionally, in a specific implementation manner,

the data acquisition module is specifically configured to acquire service data generated in the running process of each monitored service monitored by the RPC monitoring system in each preset period;

the data aggregation module is specifically configured to aggregate the acquired service data of the service for each service to be aggregated, so as to obtain an aggregation indicator of the service in each period.

Optionally, in a specific implementation manner, the monitored services are: the called service;

the aggregate index for the service includes: at least one of a query-per-second rate QPS over said each period, an availability over said each period, and an average latency over said each period.

the state determining module is configured to obtain an aggregation index of each service to be aggregated in a plurality of periods; and determining the running state of the service based on the obtained aggregation indexes.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory configured to store the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the steps of any of the monitoring methods as provided in the first aspect.

According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the steps of any one of the monitoring methods as provided by the first aspect.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product which, when run on a computer, causes the computer to perform the steps of any of the monitoring methods as provided by the first aspect.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

by applying the monitoring method provided by the embodiments of the disclosure, when the service data generated during the running of each monitored service is aggregated, the monitored services that need data aggregation can be determined from the on/off state of the aggregation switch of each monitored service, so that only the service data of those monitored services is aggregated. The on/off state of each aggregation switch is determined based on the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs, so the number of monitored services that need data aggregation in the current load state can be determined according to the load level of the RPC monitoring system and the aggregation priority of each monitored service, which prevents the RPC monitoring system from failing to work normally due to overload. Based on this, by applying the monitoring method provided by the embodiments of the disclosure, overload can be prevented from disrupting the normal operation of the RPC monitoring system, and the availability of the RPC monitoring system is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a flow chart illustrating a monitoring method according to an exemplary embodiment.

FIG. 2 is a flow chart illustrating another monitoring method according to an exemplary embodiment.

FIG. 3 is a flow chart illustrating another monitoring method according to an exemplary embodiment.

FIG. 4 is a schematic diagram of an actual application of an RPC monitoring system.

FIG. 5 is a block diagram illustrating a monitoring device according to an exemplary embodiment.

FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.

As the functions of each system continue to grow, the number of monitored services that the RPC monitoring system must monitor also grows, and so does the load of service data aggregation on the RPC monitoring system. When the RPC monitoring system is overloaded, its availability keeps decreasing. In that case, the RPC monitoring system cannot work normally, the monitored services cannot be monitored, and faults in the calling process between systems may escalate.

To facilitate an understanding of one monitoring method provided by the present disclosure, a brief description of the RPC monitoring system is first provided below.

The RPC monitoring system is a system for monitoring the remote communication and mutual call between different systems in the distributed system architecture. Among different systems in the distributed system architecture, a service running in an electronic device of each system may call a service running in an electronic device of another system, where a service initiating the call may be referred to as a calling service, and correspondingly, a service called by the calling service may be referred to as a called service.

Furthermore, in order to ensure that the calling service can smoothly call the called service and that the called service can operate normally after being called, the running state of the calling service and/or the called service needs to be monitored. When a service fails, a fault alarm is raised so that a technician can accurately locate the fault and resolve it in time.

In order to ensure smooth monitoring, the RPC monitoring system can be in communication connection with electronic devices capable of running various services in each system of the distributed system architecture, so as to realize interaction.

The RPC monitoring system may include: a service data collection device, a service data scanning device, a service data storage device, and a service data aggregation device. The calling service and the called service may be collectively referred to as monitored services.

Specifically, when a monitored service runs locally, the local electronic device may generate a data file of the monitored service, where the data file records the various types of service data generated by the monitored service while it runs. For example, when the monitored service is a calling service, the service data may include: the call duration, the call type of the called service, the number of calls, and the like; when the monitored service is a called service, the service data may include: the number of times it is called, the call latency, and the like.

In this way, the service data collection device in the RPC monitoring system may obtain the data file of each monitored service from the electronic device on which that monitored service runs, where the service data collection device may be an SDK (Software Development Kit). Furthermore, the service data scanning device in the RPC monitoring system can periodically scan the data files of each monitored service collected by the service data collection device, and store the data files in the service data storage device.

Based on this, the service data aggregation device may obtain the data file of each monitored service from the service data storage device, that is, obtain the service data generated in the operation process of each monitored service, and aggregate the service data of the monitored service for each monitored service, thereby obtaining the aggregation index of the monitored service. Furthermore, the service data aggregation device can determine whether the monitored service has a fault according to the relation between the aggregation index and a preset index threshold value, and send out an alarm signal when determining that the monitored service has the fault.
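The comparison between an aggregation index and a preset index threshold described above can be sketched as follows. This is only an illustration under stated assumptions: the threshold values, the dictionary keys, and the function name `check_faults` are hypothetical, and the disclosure does not specify which indexes are compared or in which direction.

```python
from typing import Dict, List

# Hypothetical preset index thresholds: alarm when availability falls below the
# minimum or when average latency rises above the maximum.
THRESHOLDS = {"availability_min": 0.99, "avg_latency_ms_max": 200.0}


def check_faults(indicators: Dict[str, float]) -> List[str]:
    """Return one alarm message per aggregation index that violates its threshold."""
    alarms = []
    if indicators["availability"] < THRESHOLDS["availability_min"]:
        alarms.append("availability below threshold")
    if indicators["avg_latency_ms"] > THRESHOLDS["avg_latency_ms_max"]:
        alarms.append("average latency above threshold")
    return alarms


alarms_for_faulty = check_faults({"availability": 0.95, "avg_latency_ms": 50.0})
alarms_for_healthy = check_faults({"availability": 0.999, "avg_latency_ms": 50.0})
```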

The service data aggregation device may periodically acquire the service data, generated in the operation process, of each monitored service from the service data storage device according to a preset period, and perform data aggregation.

In addition, in the RPC monitoring system, the service data collection device, the service data scanning device, the service data storage device, and the service data aggregation device may each be a separate electronic device, that is, different electronic devices respectively implement service data collection, service data scanning, service data storage, and service data aggregation. Alternatively, the functions of several of these devices may be integrated into one device, that is, one device implements more than one of service data collection, service data scanning, service data storage, and service data aggregation, and different functional modules may be set within that device to implement its functions respectively. Both arrangements are reasonable.

Next, a monitoring method provided by an embodiment of the present disclosure is described.

Fig. 1 is a flowchart illustrating a monitoring method according to an exemplary embodiment, wherein the method is applied in a service data aggregation device in an RPC monitoring system, which is hereinafter referred to as an aggregation device.

It should be noted that the aggregation device may implement only the service data aggregation function, or may also implement the function of at least one of the service data collection device, the service data scanning device, and the service data storage device.

As shown in fig. 1, a monitoring method provided by the present disclosure may include the following steps:

in step S11, service data generated during the operation process of each monitored service monitored by the RPC monitoring system is obtained.

When the aggregation device starts to perform aggregation on the service data of each monitored service, the aggregation device may first acquire, from the storage device, the service data generated during the operation of each monitored service.

In step S12, each service to be aggregated whose aggregation switch is turned on is determined among the monitored services, based on the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs.

The aggregation switch is used for controlling whether to aggregate the service data of a monitored service.

Furthermore, in order to avoid the adverse effects of overload and thereby improve the availability of the RPC monitoring system, the aggregation device need not aggregate the service data of all the monitored services. Based on this, after step S11 is executed, the aggregation device first determines, based on the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs, each service to be aggregated whose aggregation switch is turned on, that is, the services among the monitored services whose service data needs to be aggregated.

Specifically, the aggregation priority of each service type may be determined according to the importance of that service type within the systems of the monitored distributed system architecture, for example, whether the service type is a core service in the systems of the distributed system architecture, whether it is an unexpected service, whether it is an illegal service, and the like. For example, the aggregation priority of core services is higher than that of unexpected services, which in turn is higher than that of illegal services.

Therefore, when the RPC monitoring system is overloaded, the aggregation switches of monitored services whose service type has a lower aggregation priority can be closed first, which reduces the load of the aggregation device and hence the load of the RPC monitoring system.

It can be understood that, in order to ensure that the RPC monitoring system can operate normally, the RPC monitoring system can detect the load of the RPC monitoring system in real time.

Based on this, optionally, in a specific implementation manner, when the RPC monitoring system is overloaded, a closing operation of the aggregation switch of the monitored service meeting the closing condition may be initiated.

The closing condition is that the aggregation priority of the service type to which the monitored service belongs is lower than a preset priority. That is, when the RPC monitoring system is overloaded, the closing operation is initiated for the aggregation switch of each monitored service whose service type has an aggregation priority lower than the preset priority.

In this specific implementation manner, in order to ensure that the service data of as many monitored services as possible can be aggregated, the initial state of the aggregation switch of each monitored service may be the on state. Therefore, when the RPC monitoring system is overloaded, the closing operation can be initiated for the aggregation switches of the monitored services whose service type has an aggregation priority lower than the preset priority, and those aggregation switches are switched from the on state to the off state.

Specifically, in an embodiment, when the RPC monitoring system detects that it is overloaded, it may send out an alarm signal so that the user is informed. The user may then manually switch the aggregation switch of each monitored service that meets the closing condition from the on state to the off state, according to the preset aggregation priority of the service type to which each monitored service belongs.

In this way, in this embodiment, for each monitored service whose aggregation switch is in the on state, when the aggregation device detects that the user has performed the closing operation on the aggregation switch of a monitored service that meets the closing condition, it may determine that the aggregation switch of that monitored service is off. Accordingly, the aggregation device may determine the monitored services for which no such closing operation by the user is detected to be the services to be aggregated whose aggregation switch is on.

For example, the aggregation priority of the service types kafka, redis, grpc, and http may be as follows: http and grpc have the same aggregation priority, kafka and redis have the same aggregation priority, and the aggregation priority of http and grpc is higher than that of kafka and redis. When the user learns that the RPC monitoring system is overloaded, the user can manually turn off the aggregation switches of the monitored services whose service types are kafka and redis. The aggregation device may then detect the user's closing operation on the aggregation switches of the monitored services whose service types are kafka and redis, and thus determine the monitored services whose service types are http and grpc, for which no closing operation by the user is detected, to be the services to be aggregated whose aggregation switch is turned on.
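The closing condition and the example above can be sketched in code. Note that in the embodiment described here the user closes the switches manually after an alarm; the sketch below applies the same rule programmatically purely for illustration. The numeric priority values, the preset priority, the service names, and the function `apply_overload_policy` are all hypothetical.

```python
from typing import Dict

# Hypothetical numeric priorities: a larger value means a higher aggregation priority.
TYPE_PRIORITY = {"http": 2, "grpc": 2, "kafka": 1, "redis": 1}
# Assumed preset priority: service types strictly below it meet the closing condition.
PRESET_PRIORITY = 2


def apply_overload_policy(switches: Dict[str, bool],
                          service_types: Dict[str, str],
                          overloaded: bool) -> Dict[str, bool]:
    """Return new switch states; low-priority switches are closed only on overload."""
    if not overloaded:
        return dict(switches)
    return {
        name: on and TYPE_PRIORITY[service_types[name]] >= PRESET_PRIORITY
        for name, on in switches.items()
    }


# Hypothetical monitored services and their service types.
service_types = {"order-api": "http", "user-rpc": "grpc",
                 "log-queue": "kafka", "cache": "redis"}
switches = {name: True for name in service_types}  # initial state: every switch on
after = apply_overload_policy(switches, service_types, overloaded=True)
to_aggregate = sorted(name for name, on in after.items() if on)
```

With these assumed values, only the http and grpc services remain in `to_aggregate` while the system is overloaded.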

Of course, the user may also adjust which monitored services need their aggregation switch closed according to the specific requirements of the practical application.

In addition, the user's opening and closing operations on the aggregation switch of a monitored service can be recorded in a dynamic configuration file of the monitored services, so that the dynamic configuration file records the on/off state of the aggregation switch of each monitored service. The aggregation device can therefore determine each service to be aggregated whose aggregation switch is turned on by acquiring the dynamic configuration file.

Optionally, in a specific implementation manner, the aggregation device may obtain a dynamic configuration file of the monitored service from another system, where the dynamic configuration file records an open/close state of an aggregation switch of each monitored service. In this way, after the aggregation device executes the step S11, the aggregation device may first acquire the dynamic configuration file from the other system, so as to read the on/off state of the aggregation switch of each monitored service from the dynamic configuration file, and further determine each service to be aggregated in which the aggregation switch is turned on in each monitored service.

Optionally, in another specific implementation manner, the dynamic configuration file may be stored in a database in the RPC monitoring system. In this way, after the aggregation device executes the step S11, the dynamic configuration file may be obtained from the database in the RPC monitoring system where the aggregation device is located, so as to read the on/off state of the aggregation switch of each monitored service from the dynamic configuration file, and further determine each service to be aggregated in each monitored service, in which the aggregation switch is turned on.
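Reading the switch states from the dynamic configuration file might look like the following sketch. The disclosure does not specify a file format, so the JSON layout, the key `aggregation_switches`, the service names, and the default of treating an unlisted service as switched on are all assumptions made for illustration.

```python
import json
from typing import Dict, List

# Hypothetical content of the dynamic configuration file.
CONFIG_TEXT = """
{
  "aggregation_switches": {
    "order-api": true,
    "user-rpc": true,
    "log-queue": false
  }
}
"""


def services_to_aggregate(config_text: str, monitored: List[str]) -> List[str]:
    """Return the monitored services whose aggregation switch is recorded as on."""
    switches: Dict[str, bool] = json.loads(config_text)["aggregation_switches"]
    # Assumption: a service missing from the file keeps the initial "on" state.
    return [name for name in monitored if switches.get(name, True)]


selected = services_to_aggregate(
    CONFIG_TEXT, ["order-api", "user-rpc", "log-queue", "new-svc"]
)
```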

In this specific implementation manner, when the user switches the on/off state of the aggregation switch of each monitored service, the switching may be performed in a display interface of a web browser, for example, the user may switch the on/off state of the aggregation switch of the monitored service by clicking a switch on or switch off button corresponding to the monitored service in the display interface.

Wherein the browser interacts with the dynamic configuration file in the database through an interface. Therefore, when the user switches the on-off state of the aggregation switch of the monitored service in the display interface, the recorded on-off state of the aggregation switch of the monitored service can be changed in the dynamic configuration file. Therefore, the switching process of the on-off state of the aggregation switch of each monitored service is more convenient and quicker, and the operation is easy.

In step S13, for each service to be aggregated, aggregating the acquired service data of the service to obtain an aggregation index of the service.

Further, after determining each service to be aggregated that needs to aggregate the service data, the aggregation device may aggregate the acquired service data of each service to be aggregated, so as to obtain an aggregation result, and use the aggregation result as an aggregation index of the service.

In the process of executing step S13, the aggregation device may adopt various aggregation algorithms, for example, …, which is not specifically limited in the embodiments of the present disclosure.
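Steps S11 to S13 can be summarized in a short sketch. The three callables stand in for the storage device, the switch lookup, and the aggregation algorithm, none of which the disclosure fixes; `run_monitoring_cycle`, the record layout, and the sample data are hypothetical.

```python
from typing import Callable, Dict, List

ServiceData = List[dict]  # hypothetical: one dict per recorded call


def run_monitoring_cycle(
    fetch_service_data: Callable[[], Dict[str, ServiceData]],
    switch_is_on: Callable[[str], bool],
    aggregate: Callable[[ServiceData], dict],
) -> Dict[str, dict]:
    # S11: obtain the service data of every monitored service.
    all_data = fetch_service_data()
    # S12: keep only the services whose aggregation switch is on.
    to_aggregate = [name for name in all_data if switch_is_on(name)]
    # S13: aggregate the service data of each selected service.
    return {name: aggregate(all_data[name]) for name in to_aggregate}


stored = {
    "order-api": [{"ok": True}, {"ok": False}],
    "log-queue": [{"ok": True}],
}
switch_state = {"order-api": True, "log-queue": False}
result = run_monitoring_cycle(
    fetch_service_data=lambda: stored,
    switch_is_on=lambda name: switch_state[name],
    aggregate=lambda records: {
        "calls": len(records),
        "successes": sum(1 for r in records if r["ok"]),
    },
)
```

Because the data of `log-queue` is fetched but not aggregated, it remains available in `stored` for later processing.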

As can be seen from the above description of the RPC monitoring system, the service data generated by each monitored service while it runs is stored in the service data storage device in the RPC monitoring system. That is, although the aggregation device aggregates only the service data of the services to be aggregated determined in step S12, whose aggregation switch is on, the service data of the other monitored services, whose aggregation switch is off, is still stored in the service data storage device and is not lost.

Therefore, when the RPC monitoring system detects that it is overloaded, the aggregation switches of the monitored services meeting the closing condition are closed. At this point, the aggregation device aggregates only the service data of the services to be aggregated whose aggregation switch is on, and the user can inspect the monitored services so as to remove the fault that caused the overload.

Furthermore, when the fault causing the overload of the load is removed, the RPC monitoring system can detect that the load of the RPC monitoring system is changed from the overload state to the non-overload state. In this way, in order to comprehensively monitor the operation states of all the monitored services, in an optional specific implementation manner, the monitoring method provided by the present disclosure may further include the following steps:

when the RPC monitoring system is not overloaded, initiating the opening operation of the aggregation switch of each monitored service whose aggregation switch has been closed;

and, for each monitored service whose aggregation switch has been reopened, aggregating the acquired service data of the service to obtain an aggregation index of the service.

In this specific implementation manner, a "recalculation function" is provided. That is, when the RPC monitoring system detects that it is no longer overloaded, it may stop the alarm, so that the user may turn on the aggregation switch of each monitored service that was previously turned off. The aggregation device may then identify, in the service data acquired in step S11, the service data of each monitored service whose aggregation switch has been reopened, and re-aggregate that service data, so as to obtain the aggregation index of each such monitored service for the stage during which its aggregation switch was closed.

Obviously, in this specific implementation manner, through the "recalculation function", after the aggregation switch of a monitored service is turned back on, the aggregation indexes that were not obtained while the switch was off can be calculated retroactively, so that the aggregation indexes of each monitored service are completed and the entire running process of the monitored service is monitored.
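A minimal sketch of the "recalculation function" follows, assuming the storage device keeps the raw service data keyed by service and period, and that the aggregation device tracks which keys it has already aggregated. The key layout, the function `recalculate_missing`, and the use of average delay as the aggregation are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

# storage[(service, period)] -> raw service data retained by the storage device
Storage = Dict[Tuple[str, int], List[float]]


def recalculate_missing(
    storage: Storage,
    aggregated: Set[Tuple[str, int]],
    reopened_services: Set[str],
    aggregate: Callable[[List[float]], float],
) -> Dict[Tuple[str, int], float]:
    """Aggregate the stored periods that were skipped while a switch was off."""
    backfilled = {}
    for key in sorted(storage):
        service, _period = key
        if service in reopened_services and key not in aggregated:
            backfilled[key] = aggregate(storage[key])
    return backfilled


storage = {
    ("log-queue", 1): [10.0, 30.0],  # aggregated before the overload
    ("log-queue", 2): [20.0, 40.0],  # skipped: switch was off
    ("log-queue", 3): [50.0],        # skipped: switch was off
}
already_done = {("log-queue", 1)}
filled = recalculate_missing(
    storage, already_done, {"log-queue"},
    aggregate=lambda delays: sum(delays) / len(delays),
)
```

This works only because the service data of switched-off services stays in the storage device, as noted above.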

According to the above description, during operation of the aggregation device, the monitored services for which data aggregation is performed may be dynamically adjusted according to the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs. That is, when the load is overloaded, the aggregation switches of some of the monitored services are turned off so as to reduce the load of the RPC monitoring system; and when the load is not overloaded, the aggregation switches of some of the monitored services are turned back on, so that aggregation indexes are calculated for as many monitored services as possible.

Moreover, since the aggregation priority of the core services of each system in the distributed system architecture can be set to the highest priority, continuous aggregation of the service data of the core services is ensured even when the aggregation switches of some monitored services are turned off, which guarantees that the RPC monitoring system can continuously monitor the core services.
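The dynamic adjustment described above can be sketched as follows, purely as an illustration. The function name `adjust_switches`, the convention that a larger integer means a higher aggregation priority, and the simplification that every switch is turned back on once the load is no longer overloaded are all assumptions made for this sketch.

```python
def adjust_switches(switches, priorities, overloaded, preset_priority):
    """Return a new switch map after one adjustment pass.

    switches: {service: bool}, True means the aggregation switch is on
    priorities: {service: int}, larger value means higher priority (assumption)
    overloaded: current load state of the monitoring system
    preset_priority: services strictly below this are switched off under overload
    """
    updated = dict(switches)  # leave the caller's map untouched
    for service, priority in priorities.items():
        if overloaded:
            if priority < preset_priority:
                updated[service] = False
        else:
            # Simplification: reopen everything once the overload has cleared.
            updated[service] = True
    return updated
```

Because a core service carries the highest priority, it never falls below the preset priority and its switch stays on in both load states.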

It can be understood that the RPC monitoring system monitors each monitored service in real time, and for the aggregation device, the RPC monitoring system may aggregate the service data of each monitored service in real time, or aggregate the service data of each monitored service according to a certain period.

Based on this, optionally, in a specific implementation manner, when the aggregation device aggregates the service data of each monitored service according to a certain period, as shown in fig. 2, the step S11 may include the following steps:

in step S21, service data generated during the operation process of each monitored service monitored by the RPC monitoring system in each preset period is obtained.

Accordingly, in this embodiment, the step S13 may include the following steps:

in step S23, for each service to be aggregated, the acquired service data of the service is aggregated, so as to obtain an aggregation indicator of the service in each period.

That is, the calculation period of the aggregation index of the monitored services in the RPC monitoring system may be preset, so that, at intervals equal to the duration of the calculation period, the aggregation device obtains from the storage device the service data generated by each monitored service while operating within that duration. Correspondingly, the aggregation index calculated by the aggregation device reflects the running state of the monitored service within that duration.
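As a minimal sketch of the per-period acquisition described above, the following groups raw service data into fixed-length periods. The event layout `(timestamp, service, payload)` and the function name `bucket_by_period` are assumptions made for illustration; the disclosure does not prescribe a storage format.

```python
from collections import defaultdict


def bucket_by_period(events, period_seconds):
    """Group (timestamp, service, payload) events into fixed-length periods.

    Returns {(service, period_index): [payload, ...]}, where period_index is
    the number of whole periods elapsed since timestamp 0.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    buckets = defaultdict(list)
    for timestamp, service, payload in events:
        period_index = int(timestamp // period_seconds)
        buckets[(service, period_index)].append(payload)
    return dict(buckets)
```

Each bucket then holds exactly the data from which the aggregation index of one service in one period is calculated.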

In addition, because the on-off state of the aggregation switch is determined based on the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs, when the RPC monitoring system can know the service type of each called service and/or calling service, the monitored services in the RPC monitoring system provided by the present disclosure can be the called services and/or the calling services.

In the embodiment shown in fig. 2, the execution manner of step S22 is the same as that of step S12, and is not repeated here.

Optionally, in an embodiment, each monitored service monitored by the RPC monitoring system may be each called service.

Correspondingly, when the aggregation device determines the aggregation index of the monitored service according to the preset period, the service data of the monitored service acquired by the aggregation device may include: at least one of the number of times of being called in each period, the number of times of being successfully called in each period, the number of times of failing to be called in each period, and the delay of each call in each period;

thus, based on the above service data, the aggregation device may calculate at least one of the QPS (Queries Per Second) of the monitored service in the period, the availability in the period, and the average delay in the period.

Wherein, QPS characterizes the number of service queries that the electronic device hosting the service can respond to per second, and is a measure of the amount of traffic processed by that electronic device within a specified time;

availability characterizes the ratio of the number of successful calls of the called service to the total number of calls in the period, and is a measure of the capacity of the called service to respond to call requests;

the average delay characterizes the average response time of the called service when the calling service calls the called service in the period, and is a measure of the capability of the called service to respond in a timely manner.
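The three indexes defined above can be computed from the per-period service data roughly as follows. This is a sketch under stated assumptions: QPS is taken here as the number of calls in the period divided by the period length, delays are assumed to be in milliseconds, and the function name `aggregate_period` is hypothetical.

```python
def aggregate_period(call_count, success_count, latencies_ms, period_seconds):
    """Compute QPS, availability, and average delay for one period.

    call_count: number of times the service was called in the period
    success_count: number of those calls that succeeded
    latencies_ms: delay of each call in the period, in milliseconds
    period_seconds: length of the period in seconds
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    qps = call_count / period_seconds
    # With no calls in the period, the ratios are undefined, so report None.
    availability = success_count / call_count if call_count else None
    avg_latency_ms = (
        sum(latencies_ms) / len(latencies_ms) if latencies_ms else None
    )
    return {
        "qps": qps,
        "availability": availability,
        "avg_latency_ms": avg_latency_ms,
    }
```

For example, 120 calls with 114 successes in a 60-second period give a QPS of 2 and an availability of 0.95.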

Further, optionally, in a specific implementation manner, as shown in fig. 3, on the basis of the embodiment shown in fig. 2, the monitoring method provided by the present disclosure may further include the following steps:

in step S24, for each service to be aggregated, obtaining an aggregation indicator of the service in multiple periods; and determining the running state of the service based on the obtained aggregation indexes.

In this specific implementation manner, the monitoring method provided by the present disclosure is performed periodically, and for each period the aggregation device may obtain an aggregation index of the service in that period. In this way, after a plurality of periods, the aggregation device may obtain a plurality of aggregation indexes of the service, each of which corresponds to the running state of the service in one period.

In this way, for each service to be aggregated, the aggregation device may determine, based on the obtained aggregation indicators, an operating state of the service in a duration corresponding to a plurality of periods.

Optionally, for each service to be aggregated, an index graph of the service may be drawn in a two-dimensional coordinate system by using the obtained aggregation indexes. The abscissa of the two-dimensional coordinate system is the label of each period, the labels being arranged from earliest to latest according to the actual time corresponding to each period, and the ordinate is the aggregation index in each period. The running state of the service to be aggregated can then be determined according to how the aggregation indexes vary across the drawn index graph.
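A minimal sketch of preparing the data behind such an index graph is given below. It orders the per-period indexes by period label and reports the largest decrease between consecutive periods as one possible way of reading the variation; both function names and the choice of "largest drop" as the criterion are assumptions for illustration, not the disclosed method.

```python
def index_series(indexes_by_period):
    """Return [(period_label, value), ...] ordered from earliest to latest."""
    return sorted(indexes_by_period.items())


def largest_drop(series):
    """Return (period_label, drop) for the biggest decrease between
    consecutive periods, or None if the index never decreases."""
    worst = None
    for (_, previous), (label, current) in zip(series, series[1:]):
        drop = previous - current
        if drop > 0 and (worst is None or drop > worst[1]):
            worst = (label, drop)
    return worst
```

The ordered series corresponds to the points of the graph (abscissa: period label, ordinate: aggregation index), and a sharp drop in, for example, availability would suggest a degraded running state in that period.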

Further, since the monitoring method provided by the present disclosure has a "recalculation function," for each monitored service for which the aggregation switch is turned off in step S12, the aggregation index of the monitored service in each of the multiple cycles can be obtained by using the "recalculation function," and thus, the operation state of the monitored service can be determined by using the obtained multiple aggregation indexes as well.

In order to facilitate understanding of one monitoring method provided by the present disclosure, the monitoring method is explained below by a schematic diagram as shown in fig. 4.

The service A is a calling service, the service B is a called service, the monitored service is a called service, the original data are service information and service data of each calling service and each called service, the service information and the service data are stored in the storage device, and the monitoring method provided by the disclosure is executed in the aggregation device.

Furthermore, after the aggregation index of each called service is obtained, it can be displayed on various display devices. Since the monitored service is the called service, the aggregation index can also serve as a call index, that is, the obtained call index can be displayed. Then, when an obtained call index deviates from the preset monitoring threshold, it is determined that the operation state of the called service is faulty, and an alarm is raised for the fault.
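The threshold comparison described above might look like the following sketch. The dictionary keys, the two thresholds, and the function name `check_call_index` are hypothetical; the disclosure only states that an alarm is raised when a call index deviates from a preset monitoring threshold.

```python
def check_call_index(index, min_availability, max_avg_latency_ms):
    """Return a list of alarm messages for one called service in one period.

    index: dict that may contain "availability" and "avg_latency_ms"
    min_availability: alarm when availability falls below this value
    max_avg_latency_ms: alarm when average delay exceeds this value
    """
    alarms = []
    availability = index.get("availability")
    if availability is not None and availability < min_availability:
        alarms.append(
            f"availability {availability:.3f} below {min_availability:.3f}"
        )
    latency = index.get("avg_latency_ms")
    if latency is not None and latency > max_avg_latency_ms:
        alarms.append(
            f"average latency {latency:.1f} ms above {max_avg_latency_ms:.1f} ms"
        )
    return alarms
```

An empty list means that no index deviates from its threshold, so no alarm is sent to the user.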

Correspondingly, the person responsible for the service, that is, the user, can obtain the call index of each called service obtained by the aggregation device, and receives an alarm signal when the aggregation device raises an alarm.

Specifically, in the aggregation device, the on-off state of the aggregation switch of each called service can be dynamically adjusted according to the load of the RPC monitoring system, so that the service data of called services whose aggregation switch is turned on is aggregated to obtain aggregation indexes, while the service data of called services whose aggregation switch is turned off is not aggregated.

Further, when the load of the RPC monitoring system allows, the aggregation switch of a called service whose aggregation switch is turned off may be turned back on, so that the service data generated by the called service while the switch was off is re-aggregated through the recalculation function, and the aggregation index of the called service for that stage is obtained.

FIG. 5 is a block diagram illustrating a monitoring device according to an exemplary embodiment. Referring to fig. 5, the apparatus includes a data acquisition module 510, a service determination module 520, and a data aggregation module 530.

The data acquisition module 510 is configured to obtain service data generated in the running process of each monitored service monitored by the RPC monitoring system;

The service determination module 520 is configured to determine, based on the load status of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs, each service to be aggregated in which an aggregation switch is turned on in each monitored service; wherein the aggregation switch is configured to control whether to aggregate service data of the monitored services, and the on-off state of the aggregation switch is determined based on the load state of the RPC monitoring system and the aggregation priority of the service type to which each monitored service belongs;

the data aggregation module 530 is configured to aggregate, for each service to be aggregated, the acquired service data of the service to obtain an aggregation indicator of the service.

By applying the scheme provided by the embodiment of the disclosure, when aggregating the service data generated by each monitored service during operation, the monitored services that require data aggregation can be determined from the on-off state of the aggregation switch of each monitored service, so that only the service data of those services is aggregated. The on-off state of the aggregation switch of each monitored service is determined based on the load state of the RPC monitoring system and the aggregation priority of the service type to which the monitored service belongs, so that the number of monitored services that require data aggregation in the current load state can be determined according to the load level of the RPC monitoring system and the aggregation priorities of the monitored services, and the situation in which the RPC monitoring system cannot work normally due to overload is avoided. Based on this, by applying the monitoring method provided by the embodiment of the disclosure, the influence of overload on the normal operation of the RPC monitoring system can be avoided, and the usability of the RPC monitoring system is improved.
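The cooperation of the three modules described above can be illustrated with the following sketch. The class and method names, the in-memory storage dictionary, and the use of a simple mean as the aggregation index are all assumptions made for illustration; the disclosure does not prescribe any particular implementation of modules 510, 520, and 530.

```python
class DataAcquisitionModule:
    """Stands in for module 510: reads service data from storage."""

    def __init__(self, storage):
        self._storage = storage  # {service: [numeric samples]}

    def acquire(self):
        return {service: list(values) for service, values in self._storage.items()}


class ServiceDeterminationModule:
    """Stands in for module 520: picks services whose switch is on."""

    def __init__(self, switches):
        self._switches = switches  # {service: bool}

    def services_to_aggregate(self, service_data):
        return [s for s in service_data if self._switches.get(s, False)]


class DataAggregationModule:
    """Stands in for module 530: aggregates data of the selected services."""

    def aggregate(self, service_data, services):
        result = {}
        for service in services:
            values = service_data[service]
            if values:
                # A plain mean is used here as a placeholder index.
                result[service] = sum(values) / len(values)
        return result


def run_once(acquisition, determination, aggregation):
    """One pass: acquire data, select services, aggregate the selected ones."""
    data = acquisition.acquire()
    selected = determination.services_to_aggregate(data)
    return aggregation.aggregate(data, selected)
```

Services whose aggregation switch is off are skipped entirely in the aggregation step, which is where the load reduction comes from.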

Optionally, in a specific implementation manner, the apparatus further includes:

a first starting switch configured to start a shutdown operation of an aggregation switch of the monitored service satisfying a shutdown condition if the RPC monitoring system is overloaded;

a second starting switch configured to initiate the turn-on operation of the aggregation switch of each monitored service whose aggregation switch is turned off if the RPC monitoring system is not overloaded;

and the data recalculation module is configured to aggregate, for each monitored service whose aggregation switch has been turned back on, the acquired service data of the service to obtain an aggregation index of the service.

Optionally, in a specific implementation manner,

the data acquisition module 510 is specifically configured to obtain service data generated in the running process of each monitored service monitored by the RPC monitoring system in each preset period;

the data aggregation module 530 is specifically configured to aggregate, for each service to be aggregated, the acquired service data of the service, so as to obtain an aggregation indicator of the service in each period.

the aggregate indicators for the service include: at least one of a query-per-second rate QPS over said each period, an availability over said each period, and an average latency over said each period.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

FIG. 6 is a block diagram illustrating an electronic device 600 for monitoring in accordance with an exemplary embodiment. Referring to fig. 6, electronic device 600 includes a processing component 622 that further includes one or more processors, and memory resources, represented by memory 632, for storing instructions, such as applications, that are executable by processing component 622. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute instructions to perform any of the monitoring methods described above.

The electronic device 600 may also include a power component 626 configured to perform power management for the electronic device 600, a wired or wireless network interface 650 configured to connect the electronic device 600 to a network, and an input/output (I/O) interface 658. The electronic device 600 may operate based on an operating system stored in the memory 632, such as Windows Server, Mac OS X™, Unix™, Linux™, or FreeBSD™.

In an exemplary embodiment, a storage medium comprising instructions, such as the memory 632 comprising instructions, executable by the processing component 622 of the electronic device 600 to perform the above-described method is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of monitoring, the method comprising:

acquiring service data generated in the running process of each monitored service monitored by an RPC monitoring system; the service data is stored in service data storage equipment in the RPC monitoring system;

determining each service to be aggregated with an aggregation switch opened in each monitored service based on the load state of the RPC monitoring system and the aggregation priority of the service type of each monitored service; the aggregation switch is used for controlling whether to aggregate service data of the monitored service or not;

if the RPC monitoring system is overloaded, starting the closing operation of the aggregation switch of the monitored service meeting the closing condition; the shutdown conditions are as follows: the aggregation priority of the data types is lower than the preset priority;

for each service to be aggregated, aggregating the acquired service data of the service to obtain an aggregation index of the service;

if the RPC monitoring system is not overloaded, initiating the turn-on operation of the aggregation switch of each monitored service whose aggregation switch is turned off; and, for each monitored service whose aggregation switch has been turned back on, aggregating the acquired service data of the service stored in the service data storage equipment to obtain an aggregation index of the service.

2. The method of claim 1,

the step of obtaining the service data generated in the running process of each monitored service monitored by the RPC monitoring system comprises the following steps:

3. The method of claim 2,

the monitored services are: the called service;

the service data includes: at least one of the number of times of being called in each period, the number of times of being successfully called in each period, the number of times of failing to be called in each period, and the delay of being called each time in each period;

4. The method of claim 3, further comprising:

for each service to be aggregated, acquiring an aggregation index of the service in a plurality of periods; and determining the running state of the service based on the obtained aggregation indexes.

5. A monitoring device, the device comprising:

the data acquisition module is configured to acquire service data generated in the running process of each monitored service monitored by the RPC monitoring system; the service data is stored in service data storage equipment in the RPC monitoring system;

a first starting switch configured to start a shutdown operation of an aggregation switch of the monitored service satisfying a shutdown condition if the RPC monitoring system is overloaded; the shutdown conditions are as follows: the aggregation priority of the data types is lower than the preset priority;

the data aggregation module is configured to aggregate the acquired service data of the service to obtain an aggregation index of the service for each service to be aggregated;

and the data recalculation module is configured to aggregate, for each monitored service whose aggregation switch has been turned back on, the acquired service data of the service stored in the service data storage device to obtain an aggregation index of the service.

6. The apparatus of claim 5,

7. The apparatus of claim 6,

the respective monitored services are: the called service;

8. The apparatus of claim 7, further comprising:

9. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the monitoring method of any one of claims 1 to 4.

10. A storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the monitoring method according to any one of claims 1 to 4.