CN113971187A

CN113971187A - A service monitoring method and device

Info

Publication number: CN113971187A
Application number: CN202010721171.9A
Authority: CN
Inventors: 徐海平; 雷希; 马晓骥; 王小均; 瞿航
Original assignee: China Mobile Communications Group Co Ltd; China Mobile IoT Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile IoT Co Ltd
Priority date: 2020-07-24
Filing date: 2020-07-24
Publication date: 2022-01-25

Abstract

Embodiments of the present invention provide a service monitoring method and apparatus. The method includes: acquiring a user's operation log; counting the user's business operation behaviors according to the operation log; assigning different weights to each business operation behavior of the user to obtain the user's weighted business operations; and determining, according to the user's weighted business operations, whether the user's business operation behaviors include any behavior that causes a server abnormality. By acquiring the user's operation log, the present invention can accurately and quickly identify the cause of a service abnormality, identify in time behavior that may lead to service instability due to the user's abnormal operation, and give early warning and handle it in advance, with the advantages of simple operation and low cost.

Description

Service monitoring method and device

Technical Field

The present invention relates to the technical field of service monitoring, and in particular, to a service monitoring method and apparatus.

Background

At present, existing service monitoring schemes can be roughly divided into two categories. The first adds a monitoring interface to a service, or periodically calls an existing interface, to verify the availability of the service; an early warning is issued when the monitoring result shows that the interface is unavailable or the monitoring information is abnormal. The second is traditional monitoring based on the server's CPU utilization, memory utilization, inbound and outbound traffic, and the like: when an increase in CPU utilization, memory utilization, or traffic is detected, a service abnormality is identified and the relevant operation and maintenance personnel are notified; they investigate the abnormality and take corresponding measures according to the investigation results.

In some existing schemes, adding a monitoring interface to a service or periodically calling an existing interface to verify service availability requires intrusive modification of the original service, so the cost of modifying the service is high.

Monitoring the operating condition of the server monitors the service platform as a whole. When an exception occurs, it cannot tell whether the cause is a problem in the service itself or incorrect use by a user; the relevant personnel must investigate and analyze to determine this.

The prior art can issue an early warning only after the service has become abnormal, and cannot prevent some service abnormalities in advance.

Disclosure of Invention

The invention aims to provide a service monitoring method and a service monitoring device to solve the problems of high cost and heavy labor consumption in conventional service monitoring schemes.

In order to solve the above technical problems, the technical solution of the invention is as follows:

In one aspect of the present invention, a service monitoring method is provided, including:

acquiring an operation log of a user;

counting the service operation behaviors of the user according to the operation log of the user;

giving different weights to each business operation behavior of the user to obtain the weighted business operation of the user;

and determining whether the service operation behavior of the user has a behavior causing server abnormity according to the user weighted service operation.

Further, giving different weights to each business operation behavior of the user to obtain the user weighted business operation, comprising:

and according to the preset proportion of occupying server resources, giving different weights to each business operation behavior of the user to obtain the user weighted business operation.

Further, the operation log of the user comprises at least one of the following items:

an information type, which is either user service behavior information or service state information;

operating time;

user service type behavior operation information in a preset period;

the address and port of the server to which it belongs.

Further, determining whether the service operation behavior of the user has a behavior causing server abnormality according to the user weighted service operation, including:

applying an aggregation algorithm to the user weighted service operation to obtain an abnormal value;

if the abnormal value is larger than a preset threshold value, determining that the user is currently operating abnormally;

calling historical operation behaviors and historical abnormal operations of the user;

and judging whether the service operation behaviors of the user have behaviors causing server abnormity or not according to the historical operation behaviors, the historical abnormal operations and the current abnormal operations of the user, and if so, outputting a judgment result.

Further, the method further comprises:

acquiring health state information reported by a server;

and judging whether the service reliability is low or not according to the health state information, and if so, outputting a judgment result.

Further, the server health status information includes:

an information type, which is either user behavior information or service state information;

the number of online devices of the current server;

current server CPU utilization;

the current server memory usage rate;

the current usage of the server disk space;

the uplink data flow of the server in a preset period;

the server downlink data flow in a preset period;

the address and port of the server.

Further, if the health status information includes at least one of the following conditions, it is determined that the service reliability is low:

the health state information reported by the server is not obtained in N continuous preset periods;

the uplink/downlink data traffic is suddenly increased or decreased;

abnormity occurs in the availability/CPU utilization rate/memory utilization rate/disk space utilization condition of the server.

Further, the method further comprises:

and early warning is carried out according to the judgment result.

In another aspect of the present invention, there is provided a service monitoring apparatus, including:

the acquisition module is used for acquiring an operation log of a user;

the statistical module is used for counting the business operation behaviors of the user according to the operation log of the user;

the weighting module is used for giving different weights to each business operation behavior of the user to obtain the weighted business operation of the user;

and the judging module is used for determining whether the service operation behavior of the user has a behavior causing the server abnormity according to the user weighted service operation.

Further, the weighting module is specifically configured to:

The scheme of the invention at least comprises the following beneficial effects:

according to the scheme, by acquiring the operation log of the user, the cause of a service abnormality can be identified accurately and quickly, behavior that may make the service unstable due to abnormal user operation (such as stress testing) can be identified in time, and early warning and handling can be carried out in advance; the method and the device have the advantages of simple operation and low cost.

Drawings

FIG. 1 is a step diagram of a service monitoring method of the present invention;

FIG. 2 is a flowchart of step S4;

FIG. 3 is a device connection diagram of a service monitoring apparatus of the present invention;

FIG. 4 is a flow chart of the operation of a service monitoring method of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As shown in fig. 1, an embodiment of the present invention provides a service monitoring method, including:

S1, acquiring an operation log of the user;

S2, counting the service operation behaviors of the user according to the operation log of the user;

S3, giving different weights to each business operation behavior of the user to obtain the weighted business operation of the user;

and S4, determining whether the business operation behaviors of the user have behaviors causing server abnormity according to the user weighted business operation.
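
Steps S1 to S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the log record layout (`user`, `service_type`), the weight table `WEIGHTS`, the use of a plain sum as the aggregation, and the value of `THRESHOLD` are all hypothetical.

```python
from collections import Counter

# Hypothetical weights per service type (assumed values, not from the source).
WEIGHTS = {1: 1.0, 2: 5.0, 3: 20.0}
THRESHOLD = 100.0  # assumed preset threshold


def count_operations(log, user):
    """S2: count the user's operations per service type."""
    return Counter(rec["service_type"] for rec in log if rec["user"] == user)


def weight_operations(counts):
    """S3: multiply each count by the weight of its service type."""
    return {stype: n * WEIGHTS.get(stype, 1.0) for stype, n in counts.items()}


def is_suspicious(weighted):
    """S4 (simplified): aggregate by summation and compare with the threshold."""
    return sum(weighted.values()) > THRESHOLD


# S1: an operation log, here a hard-coded list standing in for real log data.
log = (
    [{"user": "u1", "service_type": 3}] * 6
    + [{"user": "u1", "service_type": 1}] * 4
    + [{"user": "u2", "service_type": 1}] * 10
)

for user in ("u1", "u2"):
    weighted = weight_operations(count_operations(log, user))
    print(user, weighted, is_suspicious(weighted))
```

In this sketch both users perform ten operations, but only `u1` is flagged, because most of its operations are of the heavily weighted type.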

In an optional embodiment of the present invention, step S3 assigns different weights to each business operation behavior of the user to obtain a user weighted business operation, including:

The rule for assigning weights may be: within a service, an operation that occupies more server resources has a higher weight; and an operation that uses resources infrequently used by normal services has a higher weight. The weights can be preset according to actual conditions and user requirements, so as to improve the adaptability and accuracy of the final result.
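
One possible way to turn the two weighting rules into numbers is sketched below. The formula, the operation names, and the input tables `resource_share` and `normal_share` are assumptions made for illustration; the source only states that heavier and rarer operations receive higher weights and that the weights are preset.

```python
def assign_weights(resource_share, normal_share):
    """Return a weight per operation type.

    resource_share: assumed preset proportion of server resources that one
        operation of each type occupies (higher -> higher weight).
    normal_share: assumed share of each type in normal traffic
        (lower, i.e. rarer -> higher weight).
    """
    weights = {}
    for op, share in resource_share.items():
        rarity = 1.0 - normal_share.get(op, 0.0)
        # Assumed formula: resource share scaled up by how rare the operation is.
        weights[op] = share * (1.0 + rarity)
    return weights


# Hypothetical operation types and preset values.
resource_share = {"query": 0.1, "upload": 0.5, "batch_export": 0.5}
normal_share = {"query": 0.8, "upload": 0.15, "batch_export": 0.05}

weights = assign_weights(resource_share, normal_share)
for op, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{op}: {w:.3f}")
```

With these inputs, `upload` and `batch_export` occupy the same share of resources, but `batch_export` ends up with the higher weight because it is rarer in normal traffic.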

In an optional embodiment of the present invention, the operation log of the user includes, but is not limited to, at least one of the following:

operating time;

behavior operation information of user service types (the user service types can be labeled 1, 2, ..., N) in a preset period;

IP address and port of the server to which it belongs.

Using this information, it can be accurately identified whether the user's service operation behaviors include behavior that causes a server abnormality.
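
The fields of an operation log record could be represented as shown below. The JSON layout, the field names, and the information-type labels are hypothetical; the source names the fields but does not specify an encoding.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class OperationLogRecord:
    info_type: str          # "user_behavior" or "service_status" (assumed labels)
    operated_at: datetime   # operating time
    counts_by_type: dict    # service type (1..N) -> operations in the preset period
    server_ip: str          # IP address of the server the record belongs to
    server_port: int        # port of that server


def parse_record(line):
    """Parse one JSON log line; the field names are assumptions."""
    raw = json.loads(line)
    return OperationLogRecord(
        info_type=raw["info_type"],
        operated_at=datetime.fromisoformat(raw["time"]),
        counts_by_type={int(k): int(v) for k, v in raw["counts"].items()},
        server_ip=raw["server"]["ip"],
        server_port=int(raw["server"]["port"]),
    )


line = ('{"info_type": "user_behavior", "time": "2020-07-24T10:00:00", '
        '"counts": {"1": 12, "2": 3}, "server": {"ip": "10.0.0.5", "port": 8080}}')
record = parse_record(line)
print(record.counts_by_type, record.server_ip, record.server_port)
```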

Referring to fig. 2, in an alternative embodiment of the present invention, the step S4 determines whether there is a behavior causing server exception in the business operation behavior of the user according to the user-weighted business operation, including:

If the abnormal value is larger than the preset threshold value, the user may be operating abnormally. For the information flagged as possibly abnormal, the weight distribution across services is then analyzed further, and, in combination with the user's previous operation behavior and confirmed cases (such as a weight distribution that stays consistent over several consecutive periods and a large weight share for the operations of the abnormal service), it is judged whether the user's operations include behavior, such as stress testing, attack, or incorrect use, that causes a server abnormality. This improves the accuracy of the judgment result.
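
The two-stage judgment of step S4 can be sketched as below: a threshold test on the aggregated abnormal value, followed by a comparison with the user's history. The aggregation (a plain sum), the constants, and the similarity test are assumptions; the source does not fix the aggregation algorithm or the exact criteria.

```python
THRESHOLD = 100.0      # assumed preset threshold
DOMINANT_SHARE = 0.8   # assumed share at which one service dominates the weight
CONSECUTIVE = 3        # assumed number of consecutive similar periods


def abnormal_value(weighted):
    """Aggregate the weighted operations; a plain sum is assumed here."""
    return sum(weighted.values())


def distribution(weighted):
    """Normalize weighted operations into shares per service."""
    total = sum(weighted.values())
    return {k: v / total for k, v in weighted.items()} if total else {}


def similar(d1, d2, tol=0.1):
    """Two distributions are similar if every share differs by at most tol."""
    keys = set(d1) | set(d2)
    return all(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) <= tol for k in keys)


def judge(current, history):
    """Return True if the behavior is judged likely to cause a server abnormality.

    current: weighted operations of the current period.
    history: weighted operations of earlier periods, oldest first.
    """
    if abnormal_value(current) <= THRESHOLD:
        return False  # no current abnormal operation
    cur = distribution(current)
    dominated = max(cur.values()) >= DOMINANT_SHARE
    recent = [distribution(h) for h in history[-(CONSECUTIVE - 1):]]
    repeated = (len(recent) == CONSECUTIVE - 1
                and all(similar(cur, d) for d in recent))
    return dominated and repeated


history = [{"a": 150.0, "b": 10.0}, {"a": 160.0, "b": 12.0}]
print(judge({"a": 155.0, "b": 9.0}, history))  # same pattern repeated
print(judge({"a": 20.0, "b": 5.0}, history))   # below the threshold
print(judge({"a": 155.0, "b": 9.0}, []))       # no history to confirm
```

Requiring both a dominant service and a repeated pattern is a deliberately conservative choice in this sketch; a single spike without supporting history is not reported.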

In an optional embodiment of the invention, the method further comprises:

acquiring health state information reported by a server;

In this way, not only is the possibility of abnormal user operation considered, but the server itself is also evaluated, which improves monitoring accuracy.

In an optional embodiment of the present invention, the server health status information includes:

the number of online devices of the current server;

current server CPU utilization;

the current server memory usage rate;

the current usage of the server disk space;

the uplink data flow of the server in a preset period;

the server downlink data flow in a preset period;

the IP address and port of the server itself.

These items basically cover the basic server data that can indicate problems. Judging the reliability of the server from this information is comprehensive and helps improve the accuracy of the judgment result.

In an optional embodiment of the present invention, if the health status information includes at least one of the following conditions, it is determined that the service reliability is low:

the uplink/downlink data traffic is suddenly increased or decreased;
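
The reliability conditions (no health report for N consecutive preset periods, a sudden increase or decrease in uplink or downlink traffic, abnormal CPU, memory, or disk usage) could be checked as in the sketch below. The `HealthReport` layout and the constants `MISSED_PERIODS_N`, `TRAFFIC_FACTOR`, and `USAGE_LIMIT` are assumptions; the source does not quantify "sudden" or "abnormal".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HealthReport:
    cpu_usage: float     # 0.0 .. 1.0
    memory_usage: float  # 0.0 .. 1.0
    disk_usage: float    # 0.0 .. 1.0
    uplink_bytes: int    # uplink traffic in the preset period
    downlink_bytes: int  # downlink traffic in the preset period


MISSED_PERIODS_N = 3   # assumed N
TRAFFIC_FACTOR = 3.0   # assumed: x3 or /3 versus the previous report is "sudden"
USAGE_LIMIT = 0.9      # assumed limit for CPU / memory / disk usage


def sudden_change(prev, cur):
    """True if traffic grew or shrank by at least TRAFFIC_FACTOR."""
    if prev == 0:
        return cur > 0
    ratio = cur / prev
    return ratio >= TRAFFIC_FACTOR or ratio <= 1.0 / TRAFFIC_FACTOR


def reliability_is_low(reports: List[Optional[HealthReport]]) -> bool:
    """reports holds one entry per preset period, oldest first; None = not received."""
    tail = reports[-MISSED_PERIODS_N:]
    if len(tail) == MISSED_PERIODS_N and all(r is None for r in tail):
        return True  # no report for N consecutive periods
    received = [r for r in reports if r is not None]
    if not received:
        return False
    last = received[-1]
    if max(last.cpu_usage, last.memory_usage, last.disk_usage) > USAGE_LIMIT:
        return True  # resource usage abnormal
    if len(received) >= 2:
        prev = received[-2]
        return (sudden_change(prev.uplink_bytes, last.uplink_bytes)
                or sudden_change(prev.downlink_bytes, last.downlink_bytes))
    return False


ok = HealthReport(0.3, 0.4, 0.5, 1000, 2000)
spike = HealthReport(0.3, 0.4, 0.5, 5000, 2000)
print(reliability_is_low([ok, ok]), reliability_is_low([ok, spike]))
```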

In an optional embodiment of the invention, the method further comprises:

and early warning is carried out according to the judgment result.

The early warning may be given to monitoring personnel by sending the judgment result to a monitoring terminal, which is convenient and fast and helps the monitoring personnel discover and handle problems in time.

In an optional embodiment of the invention, the method further comprises:

and processing according to the judgment result.

The processing mode can be as follows: temporarily blocking or blacklisting the user account; or transferring the processing of the abnormally operating user's business data to a temporary area that provides only basic, limited services; or restarting the corresponding service.
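
A dispatch from judgment result to handling measure might look like the sketch below. The four measures come from the description; the cause labels and the mapping from cause to measure are purely hypothetical, since the source does not say which measure applies in which case.

```python
from enum import Enum


class Action(Enum):
    BLOCK_TEMPORARILY = "temporarily block the user account"
    BLACKLIST = "blacklist the user account"
    MOVE_TO_TEMP_AREA = "move the user's data processing to the temporary area"
    RESTART_SERVICE = "restart the corresponding service"


def choose_action(cause, repeat_offender=False):
    """Map a judgment to a handling measure (the mapping itself is an assumption)."""
    if cause == "attack":
        return Action.BLACKLIST if repeat_offender else Action.BLOCK_TEMPORARILY
    if cause in ("stress_test", "misuse"):
        return Action.MOVE_TO_TEMP_AREA
    if cause == "service_fault":
        return Action.RESTART_SERVICE
    raise ValueError(f"unknown cause: {cause}")


print(choose_action("stress_test").value)
```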

As shown in fig. 3, an embodiment of the present invention further provides a service monitoring apparatus, including:

the acquisition module is used for acquiring an operation log of a user;

In an optional embodiment of the present invention, the weighting module is specifically configured to:

operating time;

IP address and port of the server to which it belongs.

In an optional embodiment of the present invention, the determining module is specifically configured to:

In an optional embodiment of the present invention, the obtaining module is further configured to:

acquiring health state information reported by a server;

the number of online devices of the current server;

current server CPU utilization;

the current server memory usage rate;

the current usage of the server disk space;

the uplink data flow of the server in a preset period;

the server downlink data flow in a preset period;

the address and port of the server.

the uplink/downlink data traffic is suddenly increased or decreased;

In an optional embodiment of the invention, the apparatus further comprises:

and the early warning module is used for carrying out early warning according to the judgment result.

In an optional embodiment of the invention, the apparatus further comprises:

and the processing module is used for processing according to the judgment result.

It should be noted that the apparatus is an apparatus corresponding to the method described in fig. 1, and all the implementations of the illustrated method are applicable to the embodiment of the apparatus, and the same technical effects can be achieved.

Referring to fig. 4, a workflow of a service monitoring method according to an embodiment of the present invention is:

Filebeat pulls the operation log of the user from the server into the message queue middleware Kafka. (Filebeat is open-source third-party software; it is lightweight, filters the user's operation log from the server with little overhead, and has little influence on the service. Kafka is high-concurrency, low-latency distributed middleware used here to buffer the user's operation log information, the user's operation behavior data, and the server's health state data.) The user business behavior statistical module (including the functions of the acquisition module and the statistical module) pulls the operation log of the user from Kafka, acquires the health state data of the server, periodically (for example, every 1 minute) counts the user's business operation behavior information and the server's health state information, and sends both back to Kafka. The user behavior analysis and service health analysis module (including the function of the judgment module) pulls the user's business operation behavior information and the server health state information from Kafka, analyzes and judges whether the user's behavior, such as stress testing or attack, causes a server abnormality, and analyzes the current service reliability level according to the health state. The abnormality early warning and notification module (including the function of the early warning module) issues early warnings according to the judgments of the user behavior analysis and service health analysis module, and can notify operation and maintenance personnel and users by sending early warning information to a terminal.

The early warning information can be stored and classified according to event type and requirements, which facilitates subsequent statistical analysis, and relevant operation and maintenance personnel and users can be notified by email or short message. To prevent too many emails or short messages, the following notification strategy may be adopted. For notifications to operation and maintenance personnel, a check for unprocessed alarms is made every 1 hour, and if any exist, an email or short message is sent asking the personnel to handle them as soon as possible. When a new alarm is generated, a check is made for unprocessed alarms: if any exist, a notification has already been sent within the previous hour and is not repeated; if none exist, an email or short message notification is sent. A user is notified by short message or email only once, and no further notification is sent until the user's work order has been processed.

The abnormality processing module (namely, the processing module) performs intelligent abnormality handling according to the judgment of the user behavior analysis and service health analysis module. The processing mode can be temporarily blocking or blacklisting the user account; transferring the processing of the abnormally operating user's business data to a temporary area that provides only basic, limited services; or restarting the corresponding service.
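
The notification strategy can be sketched as a small throttling class. The class name `Notifier`, the `send` callback, and the external hourly scheduler are assumptions; real delivery by email or short message is replaced here by appending to a list.

```python
class Notifier:
    """Throttled alarm notifications; `send` is a hypothetical delivery callback."""

    def __init__(self, send):
        self.send = send
        self.unprocessed = []        # alarms not yet handled by operations staff
        self.notified_users = set()  # users with an open work order

    def on_new_alarm(self, alarm, user):
        # Operations staff: notify right away only if nothing is pending;
        # otherwise the pending alarms are already covered by the hourly reminder.
        if not self.unprocessed:
            self.send("ops", f"new alarm: {alarm}")
        self.unprocessed.append(alarm)
        # User: notify once, then stay silent until the work order is processed.
        if user not in self.notified_users:
            self.send(user, f"abnormal operation detected: {alarm}")
            self.notified_users.add(user)

    def hourly_check(self):
        # Called once per hour by an external scheduler (assumed).
        if self.unprocessed:
            self.send("ops", f"{len(self.unprocessed)} unprocessed alarm(s)")

    def mark_processed(self, alarm, user):
        # Simplification: one work order per user, closed with the alarm.
        self.unprocessed.remove(alarm)
        self.notified_users.discard(user)


sent = []
n = Notifier(lambda to, msg: sent.append((to, msg)))
n.on_new_alarm("A1", "alice")  # ops and alice are notified
n.on_new_alarm("A2", "alice")  # nobody is notified again
n.hourly_check()               # ops reminded about the unprocessed alarms
print(sent)
```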

In the embodiment of the invention, each behavior of the user is counted, and weights are assigned to the different services according to two principles: within a service, an operation that occupies more server resources has a higher weight, and an operation that uses resources infrequently used by normal services has a higher weight. Whether the user exhibits behavior that makes the service unstable (such as stress testing) is judged from the aggregated weight calculation over the user's services and the user's previous service weight distribution. Once such behavior is identified, a handling scheme is applied to keep the service stable, rather than intervening only after a visible service fault has occurred. Whether the server is currently operating normally is judged from the number of online users reported by the server, the service data traffic, and the like, so as to evaluate the health and reliability of the service.

According to the service monitoring method and device provided by the embodiment of the invention, the monitoring scheme based on user operations can accurately identify users performing abnormal stress testing and the cause of service abnormalities, and can identify and handle in advance behavior that may make the service unstable, thereby ensuring service stability; case scenarios can be accumulated, and operation and maintenance personnel and users are proactively notified when an abnormality occurs; scenarios can also be accumulated according to the user's operation condition, and measures such as blacklisting or switching to the temporary area can be taken proactively.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A service monitoring method, characterized by comprising:

Get the user's operation log;

According to the user's operation log, the user's business operation behavior is counted;

Different weights are given to each business operation behavior of the user, and the user's weighted business operation is obtained;

According to the user's weighted service operations, it is determined whether the user's service operation behavior has any behavior that causes the server to be abnormal.

2. The service monitoring method according to claim 1, wherein giving each business operation behavior of the user a different weight to obtain the user's weighted business operation comprises:

According to the preset proportion of occupied server resources, different weights are given to each service operation behavior of the user, and the user weighted service operation is obtained.

3. The service monitoring method according to claim 2, wherein the operation log of the user includes at least one of the following:

Information type, including user business behavior information or service status information;

operating time;

User service type behavior operation information within a preset period;

The address and port of the owning server.

4. The service monitoring method according to claim 1, wherein, according to the user's weighted business operation, it is determined whether the user's business operation behavior has the behavior that causes the server to be abnormal, comprising:

Applying an aggregation algorithm to the user's weighted business operations to obtain an abnormal value;

If the abnormal value is greater than the preset threshold, it is determined that the user is currently operating abnormally;

Retrieve the user's historical operation behavior and historical abnormal operations;

According to the user's historical operation behavior, historical abnormal operation, and current abnormal operation, determine whether the user's business operation behavior has any behavior that causes the server to be abnormal, and if so, output the judgment result.

5. The service monitoring method according to claim 1 or 4, characterized in that, further comprising:

Obtain the health status information reported by the server;

According to the health state information, it is judged whether the service reliability is low, and if so, the judgment result is output.

6. The service monitoring method according to claim 5, wherein the server health status information comprises:

Type of information, including user behavior information or service status information;

The current number of online devices on the server;

Current server CPU usage;

Current server memory usage;

Current server disk space usage;

The upstream data traffic of the server within a preset period;

Server downlink data traffic within a preset period;

The address and port of the server.

7. The service monitoring method according to claim 6, wherein, if the health status information includes at least one of the following conditions, it is determined that the service reliability is low:

The health status information reported by the server is not obtained for N consecutive preset cycles;

Sudden increase or decrease of uplink/downlink data traffic;

The server's availability/CPU usage/memory usage/disk space usage is abnormal.

8. The service monitoring method according to claim 4, further comprising:

According to the judgment result, an early warning is given.

9. A service monitoring device, comprising:

The acquisition module is used to acquire the user's operation log;

The statistics module is used to count the user's business operation behavior according to the user's operation log;

The weighting module is used to assign different weights to the user's business operation behaviors to obtain the user's weighted business operations;

The judgment module is used to determine whether the user's business operation behavior has any behavior that causes the server to be abnormal according to the user's weighted business operation.

10. The service monitoring device according to claim 9, wherein the weighting module is specifically used for: