Disclosure of Invention
The invention provides a service monitoring method and a service monitoring device that address the high cost and labor consumption of conventional service monitoring schemes.
In order to solve the technical problems, the technical scheme of the invention is as follows:
In one aspect of the present invention, a service monitoring method is provided, including:
acquiring an operation log of a user;
counting the business operation behaviors of the user according to the operation log of the user;
assigning a different weight to each business operation behavior of the user to obtain a user weighted business operation;
and determining, according to the user weighted business operation, whether the business operation behaviors of the user include a behavior that causes a server abnormality.
Further, assigning a different weight to each business operation behavior of the user to obtain the user weighted business operation includes:
assigning a different weight to each business operation behavior of the user according to the preset proportion of server resources that it occupies, to obtain the user weighted business operation.
Further, the operation log of the user comprises at least one of the following items:
an information type, which is either user service behavior information or service state information;
an operating time;
behavior operation information for each user service type in a preset period;
the address and port of the server to which the log belongs.
Further, determining, according to the user weighted business operation, whether the business operation behaviors of the user include a behavior that causes a server abnormality includes:
applying an aggregation algorithm to the user weighted business operation to obtain an abnormal value;
if the abnormal value is larger than a preset threshold value, determining that the user is currently performing an abnormal operation;
retrieving the historical operation behaviors and historical abnormal operations of the user;
and judging, according to the historical operation behaviors, the historical abnormal operations and the current abnormal operation of the user, whether the business operation behaviors of the user include a behavior that causes a server abnormality, and if so, outputting a judgment result.
Further, the method further comprises:
acquiring health state information reported by a server;
and judging whether the service reliability is low or not according to the health state information, and if so, outputting a judgment result.
Further, the server health status information includes:
an information type, which is either user behavior information or service state information;
the number of devices currently online on the server;
the current server CPU utilization;
the current server memory usage rate;
the current server disk space usage;
the uplink data traffic of the server in a preset period;
the downlink data traffic of the server in a preset period;
the address and port of the server.
Further, if the health status information meets at least one of the following conditions, it is determined that the service reliability is low:
no health status information reported by the server is obtained in N consecutive preset periods;
the uplink or downlink data traffic suddenly increases or decreases;
an abnormality occurs in the availability, CPU utilization, memory usage, or disk space usage of the server.
Further, the method further comprises:
issuing an early warning according to the judgment result.
In another aspect of the present invention, there is provided a service monitoring apparatus, including:
the acquisition module is used for acquiring an operation log of a user;
the statistical module is used for counting the business operation behaviors of the user according to the operation log of the user;
the weighting module is used for giving different weights to each business operation behavior of the user to obtain the weighted business operation of the user;
and the judging module is used for determining, according to the user weighted business operation, whether the business operation behaviors of the user include a behavior that causes a server abnormality.
Further, the weighting module is specifically configured to:
assigning a different weight to each business operation behavior of the user according to the preset proportion of server resources that it occupies, to obtain the user weighted business operation.
The scheme of the invention at least comprises the following beneficial effects:
By acquiring the operation log of the user, the scheme can accurately and quickly identify the cause of a service abnormality, identify in time any behavior that may make the service unstable because of an abnormal user operation (such as stress testing), and issue an early warning and handle the behavior in advance. The scheme is simple to operate and low in cost.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in fig. 1, an embodiment of the present invention provides a service monitoring method, including:
S1, acquiring an operation log of the user;
S2, counting the business operation behaviors of the user according to the operation log of the user;
S3, assigning a different weight to each business operation behavior of the user to obtain a user weighted business operation;
and S4, determining, according to the user weighted business operation, whether the business operation behaviors of the user include a behavior that causes a server abnormality.
By acquiring the operation log of the user, the scheme can accurately and quickly identify the cause of a service abnormality, identify in time any behavior that may make the service unstable because of an abnormal user operation (such as stress testing), and issue an early warning and handle the behavior in advance. The scheme is simple to operate and low in cost.
In an optional embodiment of the present invention, step S3 of assigning a different weight to each business operation behavior of the user to obtain the user weighted business operation includes:
assigning a different weight to each business operation behavior of the user according to the preset proportion of server resources that it occupies, to obtain the user weighted business operation.
The rule for assigning weights may be: within a service, an operation that occupies more server resources receives a higher weight; an operation that uses resources rarely used in normal service also receives a higher weight. The weights can be preset according to actual conditions and user requirements, which improves the adaptability and accuracy of the final result.
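The weighting rule above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the operation names, the resource proportions, and the additive bonus for rarely used resources are all hypothetical values chosen for the example.

```python
from collections import Counter

# Hypothetical preset proportions of server resources occupied by one call
# of each operation (illustrative values, not taken from the source).
RESOURCE_SHARE = {"login": 0.05, "query": 0.15, "upload": 0.30, "export": 0.50}

# Hypothetical extra weight for operations that use resources which are
# rarely touched in normal service.
RARE_RESOURCE_BONUS = {"export": 0.25}


def operation_weight(op: str) -> float:
    """Weight of one operation: resource share plus an optional rarity bonus."""
    return RESOURCE_SHARE.get(op, 0.0) + RARE_RESOURCE_BONUS.get(op, 0.0)


def weighted_operations(counts: Counter) -> dict:
    """Multiply each operation count by its weight to get the weighted operation."""
    return {op: n * operation_weight(op) for op, n in counts.items()}


# Counts of one user's operations in one statistics period.
counts = Counter({"login": 2, "query": 10, "export": 4})
weighted = weighted_operations(counts)
```

Keeping the proportions in plain tables reflects the statement that the weights are preset and can be adjusted to actual conditions without changing the counting logic.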
In an optional embodiment of the present invention, the operation log of the user includes, but is not limited to, at least one of the following:
an information type, which is either user service behavior information or service state information;
an operating time;
behavior operation information for each user service type in a preset period (the user service types can be labelled 1, 2, ..., N);
the IP address and port of the server to which the log belongs.
Using this information, it is possible to identify accurately whether the business operation behaviors of the user include a behavior that causes a server abnormality.
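As an illustration of how such log entries might be represented and counted per period, consider the sketch below. The field names, the `user_id` field, and the string labels for the information type are assumptions made for the example; the source lists only the kinds of information the log contains.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationLogEntry:
    info_type: str      # "user_behavior" or "service_state" (hypothetical labels)
    timestamp: float    # operating time, in seconds
    user_id: str        # hypothetical field identifying the user
    service_type: int   # user service type, labelled 1..N
    server: str         # "ip:port" of the server the entry came from


def count_operations(entries, period_start, period_seconds=60.0):
    """Count user-behavior entries per (user, service type) within one period."""
    period_end = period_start + period_seconds
    counts = Counter()
    for e in entries:
        if e.info_type != "user_behavior":
            continue  # service state entries are not user operations
        if period_start <= e.timestamp < period_end:
            counts[(e.user_id, e.service_type)] += 1
    return counts


entries = [
    OperationLogEntry("user_behavior", 0.0, "u1", 1, "10.0.0.1:8080"),
    OperationLogEntry("user_behavior", 30.0, "u1", 1, "10.0.0.1:8080"),
    OperationLogEntry("user_behavior", 45.0, "u1", 2, "10.0.0.1:8080"),
    OperationLogEntry("service_state", 50.0, "u1", 1, "10.0.0.1:8080"),
    OperationLogEntry("user_behavior", 61.0, "u1", 1, "10.0.0.1:8080"),
]
counts = count_operations(entries, period_start=0.0)
```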
Referring to fig. 2, in an optional embodiment of the present invention, step S4 of determining, according to the user weighted business operation, whether the business operation behaviors of the user include a behavior that causes a server abnormality includes:
applying an aggregation algorithm to the user weighted business operation to obtain an abnormal value;
if the abnormal value is larger than a preset threshold value, determining that the user is currently performing an abnormal operation;
retrieving the historical operation behaviors and historical abnormal operations of the user;
and judging, according to the historical operation behaviors, the historical abnormal operations and the current abnormal operation of the user, whether the business operation behaviors of the user include a behavior that causes a server abnormality, and if so, outputting a judgment result.
If the abnormal value is larger than the preset threshold value, the user may be performing an abnormal operation. The weight distribution across services is then analyzed further for the flagged information and combined with the user's previous operation behavior and with confirmed cases (for example, a weight distribution that stays consistent over several consecutive periods, or an abnormal service whose operations account for a large share of the weight) to judge whether the user's operations include behavior such as stress testing, an attack, or misuse that causes a server abnormality. This improves the accuracy of the judgment result.
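The judgment described above can be sketched as follows. The source does not specify the aggregation algorithm or how history is compared, so this sketch makes two assumptions that should be read as placeholders: the aggregation is a plain sum, and "consistent weight distribution over several consecutive periods" is tested with a per-operation tolerance. All function names and parameters are hypothetical.

```python
def anomaly_value(weighted: dict) -> float:
    """Aggregate weighted operations into one score (a plain sum is an assumption)."""
    return sum(weighted.values())


def weight_distribution(weighted: dict) -> dict:
    """Share of the total weight contributed by each operation."""
    total = sum(weighted.values())
    return {op: w / total for op, w in weighted.items()} if total else {}


def is_similar(a: dict, b: dict, tol: float = 0.1) -> bool:
    """True if two distributions differ by at most tol for every operation."""
    return all(abs(a.get(k, 0.0) - b.get(k, 0.0)) <= tol for k in set(a) | set(b))


def causes_server_abnormality(current, history, threshold, min_repeats=2):
    """Flag the user if the current period is abnormal and the pattern recurred.

    history: weighted-operation dicts from earlier periods, oldest first.
    """
    if anomaly_value(current) <= threshold:
        return False  # no current abnormal operation
    recent = history[-min_repeats:]
    if len(recent) < min_repeats:
        return False  # not enough history to confirm the pattern
    current_dist = weight_distribution(current)
    return all(
        anomaly_value(h) > threshold
        and is_similar(weight_distribution(h), current_dist)
        for h in recent
    )


history = [{"export": 30.0, "query": 3.0}, {"export": 28.0, "query": 4.0}]
current = {"export": 31.0, "query": 3.5}
flagged = causes_server_abnormality(current, history, threshold=20.0)
calm = causes_server_abnormality({"query": 2.0}, history, threshold=20.0)
```

Requiring the pattern to repeat before flagging reflects the two-stage logic in the text: the threshold only marks a possible abnormal operation, and history is used to confirm it.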
In an optional embodiment of the invention, the method further comprises:
acquiring health state information reported by a server;
and judging whether the service reliability is low or not according to the health state information, and if so, outputting a judgment result.
In addition to considering whether the user may be performing an abnormal operation, the server itself is also evaluated, which improves monitoring accuracy.
In an optional embodiment of the present invention, the server health status information includes:
an information type, which is either user behavior information or service state information;
the number of devices currently online on the server;
the current server CPU utilization;
the current server memory usage rate;
the current server disk space usage;
the uplink data traffic of the server in a preset period;
the downlink data traffic of the server in a preset period;
the IP address and port of the server itself.
These items cover most of the basic server data that can indicate a problem. Judging server reliability from this information is comprehensive and helps improve the accuracy of the judgment result.
In an optional embodiment of the present invention, if the health status information meets at least one of the following conditions, it is determined that the service reliability is low:
no health status information reported by the server is obtained in N consecutive preset periods;
the uplink or downlink data traffic suddenly increases or decreases;
an abnormality occurs in the availability, CPU utilization, memory usage, or disk space usage of the server.
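The three conditions above can be sketched as a single check. The source does not define what counts as a "sudden" traffic change or an "abnormal" usage level, so the limits below (`usage_limit`, `traffic_ratio`) and the record layout are hypothetical; availability is not modelled separately here.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    cpu: float        # CPU utilization, 0..1
    memory: float     # memory usage rate, 0..1
    disk: float       # disk space usage, 0..1
    uplink: float     # uplink traffic in the period, bytes
    downlink: float   # downlink traffic in the period, bytes


def reliability_is_low(reports, n_missing=3, usage_limit=0.9, traffic_ratio=3.0):
    """reports: one entry per period, oldest first; None marks a missing report."""
    # Condition 1: no report received in N consecutive periods.
    if len(reports) >= n_missing and all(r is None for r in reports[-n_missing:]):
        return True
    received = [r for r in reports if r is not None]
    if not received:
        return False
    latest = received[-1]
    # Condition 3: CPU, memory, or disk usage beyond the assumed limit.
    if max(latest.cpu, latest.memory, latest.disk) > usage_limit:
        return True
    # Condition 2: traffic rises or falls sharply versus the previous report.
    if len(received) >= 2:
        prev = received[-2]
        pairs = ((latest.uplink, prev.uplink), (latest.downlink, prev.downlink))
        for now, before in pairs:
            if before > 0:
                ratio = now / before
                if ratio > traffic_ratio or ratio < 1 / traffic_ratio:
                    return True
    return False


normal = HealthReport(0.3, 0.4, 0.5, 1000.0, 2000.0)
silent = reliability_is_low([normal, None, None, None])
spike = reliability_is_low([normal, HealthReport(0.3, 0.4, 0.5, 5000.0, 2000.0)])
steady = reliability_is_low([normal, normal])
```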
In an optional embodiment of the invention, the method further comprises:
issuing an early warning according to the judgment result.
The early warning may be issued by sending the judgment result to a monitoring terminal, which alerts the monitoring personnel. This is convenient and fast, and helps monitoring personnel discover and handle problems in time.
In an optional embodiment of the invention, the method further comprises:
and processing according to the judgment result.
The processing may be: temporarily blocking or blacklisting the user account; transferring the business data processing of the user with abnormal operation to a temporary area that provides only basic, limited services; or restarting the corresponding services.
As shown in fig. 3, an embodiment of the present invention further provides a service monitoring apparatus, including:
the acquisition module is used for acquiring an operation log of a user;
the statistical module is used for counting the business operation behaviors of the user according to the operation log of the user;
the weighting module is used for giving different weights to each business operation behavior of the user to obtain the weighted business operation of the user;
and the judging module is used for determining, according to the user weighted business operation, whether the business operation behaviors of the user include a behavior that causes a server abnormality.
By acquiring the operation log of the user, the scheme can accurately and quickly identify the cause of a service abnormality, identify in time any behavior that may make the service unstable because of an abnormal user operation (such as stress testing), and issue an early warning and handle the behavior in advance. The scheme is simple to operate and low in cost.
In an optional embodiment of the present invention, the weighting module is specifically configured to:
assigning a different weight to each business operation behavior of the user according to the preset proportion of server resources that it occupies, to obtain the user weighted business operation.
The rule for assigning weights may be: within a service, an operation that occupies more server resources receives a higher weight; an operation that uses resources rarely used in normal service also receives a higher weight. The weights can be preset according to actual conditions and user requirements, which improves the adaptability and accuracy of the final result.
In an optional embodiment of the present invention, the operation log of the user includes, but is not limited to, at least one of the following:
an information type, which is either user service behavior information or service state information;
an operating time;
behavior operation information for each user service type in a preset period (the user service types can be labelled 1, 2, ..., N);
the IP address and port of the server to which the log belongs.
Using this information, it is possible to identify accurately whether the business operation behaviors of the user include a behavior that causes a server abnormality.
In an optional embodiment of the present invention, the determining module is specifically configured to:
applying an aggregation algorithm to the user weighted business operation to obtain an abnormal value;
if the abnormal value is larger than a preset threshold value, determining that the user is currently performing an abnormal operation;
retrieving the historical operation behaviors and historical abnormal operations of the user;
and judging, according to the historical operation behaviors, the historical abnormal operations and the current abnormal operation of the user, whether the business operation behaviors of the user include a behavior that causes a server abnormality, and if so, outputting a judgment result.
If the abnormal value is larger than the preset threshold value, the user may be performing an abnormal operation. The weight distribution across services is then analyzed further for the flagged information and combined with the user's previous operation behavior and with confirmed cases (for example, a weight distribution that stays consistent over several consecutive periods, or an abnormal service whose operations account for a large share of the weight) to judge whether the user's operations include behavior such as stress testing, an attack, or misuse that causes a server abnormality. This improves the accuracy of the judgment result.
In an optional embodiment of the present invention, the obtaining module is further configured to:
acquiring health state information reported by a server;
and judging whether the service reliability is low or not according to the health state information, and if so, outputting a judgment result.
In addition to considering whether the user may be performing an abnormal operation, the server itself is also evaluated, which improves monitoring accuracy.
In an optional embodiment of the present invention, the server health status information includes:
an information type, which is either user behavior information or service state information;
the number of devices currently online on the server;
the current server CPU utilization;
the current server memory usage rate;
the current server disk space usage;
the uplink data traffic of the server in a preset period;
the downlink data traffic of the server in a preset period;
the address and port of the server.
These items cover most of the basic server data that can indicate a problem. Judging server reliability from this information is comprehensive and helps improve the accuracy of the judgment result.
In an optional embodiment of the present invention, if the health status information meets at least one of the following conditions, it is determined that the service reliability is low:
no health status information reported by the server is obtained in N consecutive preset periods;
the uplink or downlink data traffic suddenly increases or decreases;
an abnormality occurs in the availability, CPU utilization, memory usage, or disk space usage of the server.
In an optional embodiment of the invention, the apparatus further comprises:
and the early warning module is used for carrying out early warning according to the judgment result.
The early warning may be issued by sending the judgment result to a monitoring terminal, which alerts the monitoring personnel. This is convenient and fast, and helps monitoring personnel discover and handle problems in time.
In an optional embodiment of the invention, the apparatus further comprises:
and the processing module is used for processing according to the judgment result.
The processing may be: temporarily blocking or blacklisting the user account; transferring the business data processing of the user with abnormal operation to a temporary area that provides only basic, limited services; or restarting the corresponding services.
It should be noted that the apparatus corresponds to the method shown in fig. 1; all implementations of that method apply to the apparatus embodiment and achieve the same technical effects.
Referring to fig. 4, the workflow of a service monitoring method according to an embodiment of the present invention is as follows:
Filebeat pulls the operation log of the user from the server into the message queue middleware Kafka. Filebeat is open-source third-party software; it is lightweight, filters the user operation log from the server with little overhead, and has little impact on the service. Kafka is a high-concurrency, low-latency distributed middleware, used here to cache the user operation log information, the user operation behavior data, and the server health status data.
The user business behavior statistics module (which includes the functions of the acquisition module and the statistical module) pulls the operation log of the user from Kafka, acquires the health status data of the server, periodically (for example, every 1 minute) counts the user's business operation behavior information and the server health status information, and sends both to Kafka.
The user behavior analysis and service health analysis module (which includes the function of the judging module) pulls the user's business operation behavior information and the server health status information from Kafka, analyzes whether user behavior such as stress testing or an attack is causing a server abnormality, and evaluates the current service reliability level from the health status.
The abnormality early warning and notification module (which includes the function of the early warning module) issues early warnings based on the judgments of the user behavior analysis and service health analysis module, and can notify operation and maintenance personnel and users by sending early warning information to a terminal.
The early warning information can be stored and classified according to event type and requirement, which facilitates subsequent statistical analysis, and the relevant operation and maintenance personnel and users are notified by email or short message. To avoid sending too many emails or short messages, the following notification strategy may be adopted. For notifications to operation and maintenance personnel, the system checks every 1 hour whether an unprocessed alarm exists and, if so, sends an email or short message asking them to handle it as soon as possible. When a new alarm is generated, the system checks whether unprocessed information exists: if it does, a notification has already been sent within the previous hour and is not repeated; if not, an email or short message notification is sent. A notification to the user is sent only once and is not sent again until the user's work order has been processed.
The abnormality processing module (that is, the processing module) handles the abnormality automatically according to the judgment of the user behavior analysis and service health analysis module. The processing may be temporarily blocking or blacklisting the user account, transferring the business data processing of the user with abnormal operation to a temporary area that provides only basic, limited service, or restarting the corresponding service.
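The notification strategy for operation and maintenance personnel described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `OpsNotifier`, its methods, and the injected `send` callable are hypothetical, and times are passed in explicitly (in seconds) rather than read from a clock so that the behavior is easy to follow.

```python
class OpsNotifier:
    """Throttle operations alerts to at most one notification per interval."""

    def __init__(self, send, interval_seconds=3600.0):
        self.send = send                  # callable that delivers the message
        self.interval = interval_seconds
        self.unprocessed = []             # alarms not yet handled
        self.last_sent = None             # time of the last notification

    def _notify(self, now):
        self.send(f"{len(self.unprocessed)} unprocessed alarm(s)")
        self.last_sent = now

    def on_new_alarm(self, alarm, now):
        had_unprocessed = bool(self.unprocessed)
        self.unprocessed.append(alarm)
        recently = self.last_sent is not None and now - self.last_sent < self.interval
        # If unprocessed alarms already exist and were notified within the
        # interval, do not repeat the notification.
        if not (had_unprocessed and recently):
            self._notify(now)

    def hourly_check(self, now):
        # Periodic reminder while any alarm remains unprocessed.
        if self.unprocessed:
            self._notify(now)

    def mark_processed(self, alarm):
        self.unprocessed.remove(alarm)


sent = []
notifier = OpsNotifier(sent.append)
notifier.on_new_alarm("a1", now=0.0)      # first alarm: notify
notifier.on_new_alarm("a2", now=600.0)    # already notified this hour: skip
notifier.hourly_check(now=3600.0)         # still unprocessed: remind
```

Injecting `send` keeps the throttling logic independent of whether the message goes out by email or short message.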
In the embodiment of the invention, each behavior of the user is counted, and weights are assigned to the different services on the principle that an operation occupying more server resources within a service receives a higher weight, and an operation using resources that are rarely used in normal service also receives a higher weight. Whether the user exhibits behavior that makes the service unstable (such as stress testing) is judged from the aggregated weight of the user's services and from the user's previous service weight distribution. Once the user is judged to exhibit such behavior, a processing scheme is applied to keep the service stable, rather than intervening only after a visible service fault has occurred. Whether the server is currently operating normally is judged from the number of online users reported by the server, the service data traffic, and similar data, in order to evaluate the health and reliability of the service.
The service monitoring method and device provided by the embodiment of the invention use a monitoring scheme based on user operations. They can accurately identify users performing abnormal stress testing and the causes of service abnormalities, and can identify and handle in advance behavior that may make the service unstable, thereby keeping the service stable. Case scenarios can be accumulated, and operation and maintenance personnel and users are actively notified when an abnormality occurs. Scenarios can also be accumulated from users' operating conditions, and processing measures such as blacklisting or switching to the temporary area can be taken proactively.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.