CN117194175A - Log alarm monitoring method and device and computer storage medium



Publication number
CN117194175A
Authority
CN
China
Prior art keywords
log
data
monitoring
configuration information
alarm
Legal status
Pending
Application number
CN202311447729.9A
Other languages
Chinese (zh)
Inventor
张路
毛东成
华成龙
宋乃鸿
Current Assignee
Jiajia Technology Co ltd
Original Assignee
Jiajia Technology Co ltd
Application filed by Jiajia Technology Co ltd
Priority to CN202311447729.9A
Publication of CN117194175A
Current legal status: Pending


Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application provides a log alarm monitoring method, a device and a computer storage medium. The method comprises: obtaining acquisition configuration information; collecting log data of a target server according to the acquisition configuration information, and routing the log data to a disk for storage via Kafka; converting the log data into structured data, and routing the structured data to an Elasticsearch cluster for storage via Kafka; obtaining monitoring policy configuration information from a monitoring alarm system; performing pre-aggregation calculation on the structured data according to the monitoring policy configuration information to generate index data, and reporting the index data to the monitoring alarm system, wherein the index data comprises log keywords and log indexes (metrics); and performing, by the monitoring alarm system, alarm detection on the index data according to the monitoring policy configuration information, and outputting alarm information. The application can improve the monitoring performance of the monitoring system and the operation and maintenance quality of the log system.

Description

Log alarm monitoring method and device and computer storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a log alarm monitoring method, a log alarm monitoring device, and a computer storage medium.
Background
Logs can be used to monitor the running state of a system and to describe fault scenarios, and they play an indispensable role in daily operation and maintenance work; log alarms therefore need to be monitored.
However, as computer technology develops, system environments become more complex and the workload of log monitoring and alarming increases. How to improve the monitoring performance of a log alarm monitoring system, and thereby the operation and maintenance quality of the system, has therefore become an urgent problem.
Disclosure of Invention
The embodiments of the application provide a log alarm monitoring method, a log alarm monitoring device and a computer storage medium, which are used to solve the problems in the related art. The technical solution is as follows:
In a first aspect, an embodiment of the present application provides a log alarm monitoring method, comprising:
obtaining acquisition configuration information;
collecting log data of a target server according to the acquisition configuration information, and routing the log data to a disk for storage via Kafka;
converting the log data into structured data, and routing the structured data to an Elasticsearch cluster for storage via Kafka;
obtaining monitoring policy configuration information from a monitoring alarm system;
performing pre-aggregation calculation on the structured data according to the monitoring policy configuration information to generate index data, and reporting the index data to the monitoring alarm system, wherein the index data comprises log keywords and log indexes;
and performing, by the monitoring alarm system, alarm detection on the index data according to the monitoring policy configuration information, and outputting alarm information.
In one embodiment, the acquisition configuration information is obtained by:
configuring a collection item as the target server;
configuring, on the basis of a log topic, plug-in configuration information and log filtering information, wherein the plug-in configuration information comprises an acquisition type, an acquisition target, a log path and a log character set, and the log filtering information comprises filtering rules;
and obtaining the acquisition configuration information based on the collection item, the plug-in configuration information and the log filtering information.
In one embodiment, collecting log data of the target server according to the acquisition configuration information and routing the log data to the disk for storage via Kafka comprises:
invoking a log collector to collect raw log data of the target server according to the collection item and the plug-in configuration information, wherein the log collector is hosted on an agent, and the agent is installed and deployed on the target server;
filtering the raw log data according to the log filtering information to obtain the log data;
and reporting the log data to Kafka, and routing the log data to the disk for storage via Kafka.
In one embodiment, the method further comprises:
managing the lifecycle of the log collector by using an Agent command pipeline, data pipeline and file pipeline.
In one embodiment, converting the log data into structured data and routing the structured data to an Elasticsearch cluster for storage via Kafka comprises:
obtaining a pre-configured cleansing rule;
cleansing the log data according to the cleansing rule to convert the log data into the structured data;
obtaining pre-configured storage configuration information;
and routing the structured data to the Elasticsearch cluster for storage via Kafka according to the storage configuration information.
In one embodiment, the monitoring policy configuration information comprises an index field, an aggregation method and an aggregation period;
performing pre-aggregation calculation on the structured data according to the monitoring policy configuration information and generating index data to be reported to the monitoring alarm system comprises:
periodically performing pre-aggregation calculation on the structured data with the index field, the aggregation method and the aggregation period as aggregation dimensions, and generating the index data to be reported to the monitoring alarm system.
In one embodiment, the monitoring policy configuration information further comprises a detection algorithm and an alarm triggering condition corresponding to the detection algorithm;
performing, by the monitoring alarm system, alarm detection on the index data according to the monitoring policy configuration information and outputting alarm information comprises:
detecting, by the monitoring alarm system according to the detection algorithm, whether the index data meets the alarm triggering condition within the aggregation period;
generating an alarm event when it is determined that the index data meets the alarm triggering condition;
and outputting the alarm information according to the alarm event.
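The application does not specify which detection algorithms are supported. The following is a minimal sketch, assuming the simplest case of a static threshold applied to one aggregated value per period. `TriggerCondition`, `AlarmEvent`, `detect` and `format_alarm` are hypothetical names, not part of the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriggerCondition:
    # Hypothetical static-threshold condition; other detection algorithms
    # (e.g. year-on-year comparison) would need their own condition types.
    operator: str        # one of ">", ">=", "<", "<=", "=="
    threshold: float


@dataclass
class AlarmEvent:
    metric: str
    value: float
    condition: TriggerCondition


_OPS = {
    ">": lambda v, t: v > t,
    ">=": lambda v, t: v >= t,
    "<": lambda v, t: v < t,
    "<=": lambda v, t: v <= t,
    "==": lambda v, t: v == t,
}


def detect(metric: str, value: float, cond: TriggerCondition) -> Optional[AlarmEvent]:
    """Return an alarm event if the aggregated value meets the trigger condition."""
    if _OPS[cond.operator](value, cond.threshold):
        return AlarmEvent(metric, value, cond)
    return None


def format_alarm(event: AlarmEvent) -> str:
    """Turn an alarm event into a human-readable alarm message."""
    c = event.condition
    return f"[ALARM] {event.metric}={event.value} {c.operator} {c.threshold}"
```

For example, `detect("avg_responseBytes", 75, TriggerCondition(">", 70))` yields an event, and `format_alarm` renders it as `[ALARM] avg_responseBytes=75 > 70`. Keeping event generation and message output separate mirrors the two final steps above.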
In a second aspect, an embodiment of the present application further provides a log alarm monitoring device, comprising:
an acquisition module, configured to obtain acquisition configuration information, collect log data of a target server according to the acquisition configuration information, and route the log data to a disk for storage via Kafka;
a processing module, configured to convert the log data into structured data, and route the structured data to an Elasticsearch cluster for storage via Kafka;
a pre-aggregation module, configured to obtain monitoring policy configuration information from the monitoring alarm system, perform pre-aggregation calculation on the structured data according to the monitoring policy configuration information to generate index data, and report the index data to the monitoring alarm system, wherein the index data comprises log keywords and log indexes;
and a monitoring alarm module, configured to perform alarm detection on the index data through the monitoring alarm system according to the monitoring policy configuration information, and output alarm information.
In one embodiment, the acquisition module obtains the acquisition configuration information by:
configuring a collection item as the target server;
configuring, on the basis of a log topic, plug-in configuration information and log filtering information, wherein the plug-in configuration information comprises an acquisition type, an acquisition target, a log path and a log character set, and the log filtering information comprises filtering rules;
and obtaining the acquisition configuration information based on the collection item, the plug-in configuration information and the log filtering information.
In one embodiment, the acquisition module is specifically configured for:
invoking a log collector to collect raw log data of the target server according to the collection item and the plug-in configuration information, wherein the log collector is hosted on an agent, and the agent is installed and deployed on the target server;
filtering the raw log data according to the log filtering information to obtain the log data;
and reporting the log data to Kafka, and routing the log data to the disk for storage via Kafka.
In one embodiment, the acquisition module is further configured for:
managing the lifecycle of the log collector by using an Agent command pipeline, data pipeline and file pipeline.
In one embodiment, the processing module is specifically configured for:
obtaining a pre-configured cleansing rule;
cleansing the log data according to the cleansing rule to convert the log data into the structured data;
obtaining pre-configured storage configuration information;
and routing the structured data to the Elasticsearch cluster for storage via Kafka according to the storage configuration information.
In one embodiment, the monitoring policy configuration information comprises an index field, an aggregation method and an aggregation period; the pre-aggregation module is specifically configured for:
periodically performing pre-aggregation calculation on the structured data with the index field, the aggregation method and the aggregation period as aggregation dimensions, and generating the index data to be reported to the monitoring alarm system.
In one embodiment, the monitoring policy configuration information further comprises a detection algorithm and an alarm triggering condition corresponding to the detection algorithm; the monitoring alarm module is specifically configured for:
detecting, by the monitoring alarm system according to the detection algorithm, whether the index data meets the alarm triggering condition within the aggregation period;
generating an alarm event when it is determined that the index data meets the alarm triggering condition;
and outputting the alarm information according to the alarm event.
In a third aspect, an embodiment of the present application further provides a computer device, comprising a memory and a processor. The memory stores instructions that are loaded and executed by the processor to implement the method in any embodiment of the above aspects. The memory and the processor communicate with each other through an internal connection path.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when run on a computer, implements the method in any embodiment of the above aspects.
The advantages or beneficial effects of the above technical solution include at least the following:
In the application, Kafka is introduced to route the collected log data and the converted structured data to separate storage. This achieves fault isolation between log data collection and log data cleansing, improves the scalability and reliability of the system, reduces upgrade cost, improves the monitoring performance of the monitoring system, and achieves fault isolation between systems (such as the log system and the alarm monitoring system), thereby improving the operation and maintenance quality of the log system.
In the application, the structured data is pre-aggregated according to the monitoring policy configuration information, and index data comprising log keywords and log indexes is generated and reported to the monitoring alarm system for alarm detection. This meets enterprises' monitoring and alarm requirements for log keywords and log indexes, helps operation and maintenance personnel monitor the running status of systems and services and perceive faults in advance, and improves operation and maintenance quality. It also pre-aggregates large volumes of log data, which improves the detection performance of the monitoring system; at the same time, aggregation tasks can be consolidated, reducing the number of Elasticsearch (ES) aggregation queries and hence the load on the ES cluster.
The foregoing summary is provided for illustrative purposes only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present application will become apparent by reference to the drawings and the following detailed description.
Drawings
In the drawings, the same reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily drawn to scale. It is appreciated that these drawings depict only some embodiments according to the disclosure and are not therefore to be considered limiting of its scope.
FIG. 1 is a schematic flow chart of a log alarm monitoring method provided by the application;
FIG. 2 is a schematic flow chart of another log alarm monitoring method provided by the application;
FIG. 3 is a schematic flow chart of obtaining acquisition configuration information provided by the application;
FIG. 4 is a schematic diagram of an acquisition configuration editing page provided by the application;
FIG. 5 is a schematic diagram of an interface for cleansing log data into structured data provided by the application;
FIG. 6 is a schematic diagram of a storage device configuration interface provided by the application;
FIG. 7 is a schematic diagram of a retention period configuration interface provided by the application;
FIG. 8 is a schematic diagram of a configuration interface for monitoring policy configuration information provided by the application;
FIG. 9 is a schematic diagram of a message notification data protocol provided by the application;
FIG. 10 is a schematic diagram of a reporting protocol provided by the application;
FIG. 11 is a schematic diagram of another configuration interface for monitoring policy configuration information provided by the application;
FIG. 12 is a schematic diagram of an alarm event provided by the application;
FIG. 13 is a schematic diagram of a configuration interface for an alarm notification rule provided by the application;
FIG. 14 is a block diagram of a log alarm monitoring device provided by the application;
FIG. 15 is a block diagram of a computer device provided by the application.
Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
In the related art, the log alarm architectures of conventional log systems generally fall into two types: monolithic application architecture and microservice architecture.
In a monolithic application architecture, the log monitoring and alarm module is generally integrated into the log system. This design has the following disadvantages:
a. Poor scalability: as the system grows, it becomes difficult to extend and maintain, making development difficult and refactoring risky;
b. High upgrade cost: when the system needs to be changed or upgraded, the whole system must be redeployed, causing service interruption;
c. Poor reliability: a problem in one module may affect the normal operation of the entire system.
In a microservice architecture, the log system and the monitoring and alarm module are generally split, but the monitoring system remains tightly coupled to the log system because it depends on the query and analysis functions the log system provides. This design has the following disadvantages:
A. High upgrade cost: if the query and analysis functions of the log system change, the monitoring system must be upgraded accordingly;
B. Poor fault isolation: if the log system service becomes abnormal or undergoes upgrade maintenance, the availability of the monitoring service is also affected.
To overcome the shortcomings of both architectures, the application provides a log alarm monitoring scheme that splits the log system, the monitoring system and the alarm system into separate services, and pre-aggregates log data into index data that is reported to the monitoring system. This improves the scalability and reliability of the system, reduces upgrade cost, improves the monitoring performance of the monitoring system and achieves fault isolation between systems, thereby improving the operation and maintenance quality of the log system.
As shown in FIGS. 1-2, the log alarm monitoring method according to an embodiment of the present application may include the following steps:
S110, obtaining acquisition configuration information.
In one embodiment, as shown in FIG. 3, the acquisition configuration information may be obtained as follows:
S111, configuring the collection item as the target server.
In a specific implementation, the collection item can be configured as the target server by setting the collection item name to the name of the target server.
For example, as shown in FIG. 4, the collection item name is configured as "nginx access log", i.e., the target server is the server named "nginx access log".
S112, configuring plug-in configuration information and log filtering information based on the log topic.
The log topic is the management unit of log products. That is, in this embodiment, the plug-in configuration information and the log filtering information are configured per log-product management unit.
In a specific implementation, the plug-in configuration information may include, but is not limited to, the acquisition type, the acquisition target, the log path and the log character set. The log filtering information may include filtering rules.
For example, as shown in FIG. 4, the plug-in configuration information may be configured as:
Acquisition type: single-line text log;
Acquisition target: node;
Log path: data/bkey/logs/nginx/. Log;
Log character set: UTF-8.
In a specific implementation, the plug-in configuration information and/or the log filtering information may be configured from a preset template or customized by the user, which is not limited in this embodiment. The configured plug-in configuration information and/or log filtering information may be saved as a template for direct reuse next time.
S113, obtaining acquisition configuration information based on the collection item, the plug-in configuration information and the log filtering information.
It can be understood that the acquisition configuration information includes the collection item, the plug-in configuration information and the log filtering information.
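The composition of the acquisition configuration information can be sketched as a small data model. This is a minimal sketch, assuming the configuration is held as plain records and serialized to a dict for storage; the class names, the field names, the filter-rule kinds, the topic name `"nginx"` and the log path are hypothetical placeholders, not taken from the application.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PluginConfig:
    collect_type: str          # acquisition type, e.g. "single-line text log"
    collect_target: str        # acquisition target, e.g. "node"
    log_path: str              # path (or glob) of the files to collect
    charset: str = "UTF-8"     # log character set


@dataclass
class FilterRule:
    kind: str                  # hypothetical rule kinds: "keyword" or "separator"
    value: str


@dataclass
class AcquisitionConfig:
    item_name: str             # collection item name = name of the target server
    log_topic: str             # log topic (management unit) owning plug-in/filter settings
    plugin: PluginConfig
    filters: List[FilterRule] = field(default_factory=list)


# Values loosely follow FIG. 4; the topic name and log path are hypothetical.
example = AcquisitionConfig(
    item_name="nginx access log",
    log_topic="nginx",
    plugin=PluginConfig("single-line text log", "node", "/path/to/nginx/*.log"),
    filters=[FilterRule("keyword", "ERROR")],
)

# Plain nested dict, e.g. for storing on the target server or in the cloud.
serialized = asdict(example)
```

Keeping the plug-in settings and filter rules as separate records makes it straightforward to save either of them as a reusable template, as described above.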
In one embodiment, an acquisition configuration editing page may be provided so that the acquisition configuration information can be configured according to actual requirements. The acquisition configuration editing page may, for example, be as shown in FIG. 4.
In one embodiment, the acquisition configuration information may be stored on the target server or in the cloud, which is not limited in this embodiment. When step S110 is performed, the acquisition configuration information may be obtained from the target server or the cloud.
S120, collecting log data of the target server according to the acquisition configuration information, and routing the log data to a disk for storage via Kafka.
In one embodiment, a log collector may be invoked to collect the raw log data of the target server according to the collection item and the plug-in configuration information in the acquisition configuration information. The log collector is hosted on the agent, and the agent is installed and deployed on the target server.
In one embodiment, the raw log data can be filtered according to the log filtering information in the acquisition configuration information to obtain the log data, which avoids collecting all log data and reduces network bandwidth and storage consumption. The log data is then reported to Kafka and routed via Kafka to disk (not shown in FIG. 2) for storage. Kafka is a distributed messaging system developed for collecting and delivering high volumes of log data with low latency.
Filtering rules, such as keyword matching rules and separator-based split matching rules, can be set to filter the raw log data before it is collected and reported, so that only the required raw log data is collected; this avoids collecting all log data and reduces network bandwidth and storage consumption.
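The two filtering rule types named above can be sketched as predicates applied line by line before reporting. This is a minimal sketch under stated assumptions: a keyword rule keeps lines containing the keyword, and a separator rule splits the line on a separator and compares the field at a given position. The function names and the rule semantics are hypothetical, since the application only names the rule types.

```python
from typing import Iterable, Iterator, Optional, Tuple


def keyword_match(line: str, keyword: str) -> bool:
    """Keyword rule: keep the line if it contains the keyword."""
    return keyword in line


def separator_match(line: str, sep: str, index: int, expected: str) -> bool:
    """Separator rule: split on `sep` and compare the field at position `index`."""
    parts = line.split(sep)
    return 0 <= index < len(parts) and parts[index] == expected


def filter_lines(lines: Iterable[str],
                 keyword: Optional[str] = None,
                 sep_rule: Optional[Tuple[str, int, str]] = None) -> Iterator[str]:
    """Yield only the lines that satisfy every configured rule."""
    for line in lines:
        if keyword is not None and not keyword_match(line, keyword):
            continue
        if sep_rule is not None and not separator_match(line, *sep_rule):
            continue
        yield line
```

Because `filter_lines` is a generator, filtered lines can be streamed to the reporting step without buffering the whole file. This is what reduces bandwidth and storage compared with collecting everything.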
In the application, the log collector supports operating system logs, application system logs, network device logs and the like, such as Linux/Windows text logs (single-line and multi-line), Windows event logs and Syslog. It also supports resuming collection from a breakpoint so that no data is lost. For example, the log collector caches the collection position of each log file after each collection and report. If the log collector stops abnormally, it resumes collecting from the last cached position once it recovers, so no data is lost.
In one embodiment, the number of cores used by the log collector, its memory upper limit, its CPU utilization and the like can be limited through configuration, so that the collector does not affect the normal operation of the system and applications.
In one embodiment, the lifecycle of the log collector can be managed using the Agent command pipeline, data pipeline and file pipeline. The Agent command pipeline manages the lifecycle of the log collector, e.g. starting, stopping and restarting collection and reloading the acquisition configuration; the Agent file pipeline enables acquisition configuration information to be added, modified and deleted dynamically.
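The breakpoint-resume behaviour described above can be sketched as a per-file byte offset that is persisted only after a report succeeds. This is a minimal sketch, assuming offsets are kept in a local JSON checkpoint file and that the report step is a callable. `collect_and_report` and the checkpoint layout are hypothetical; log rotation and truncation are not handled.

```python
import json
import os
from typing import Callable, List


def collect_and_report(log_path: str, checkpoint_path: str,
                       report: Callable[[List[str]], None]) -> int:
    """Report lines appended since the last checkpoint, then persist the new offset."""
    offsets = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            offsets = json.load(f)
    start = offsets.get(log_path, 0)

    with open(log_path, "rb") as f:
        f.seek(start)
        data = f.read()
    # Consume only complete lines; a partial trailing line is picked up next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()

    if lines:
        report(lines)  # if reporting raises, the offset below is not advanced

    offsets[log_path] = start + end
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(offsets, f)
    os.replace(tmp_path, checkpoint_path)  # atomic swap: never a half-written checkpoint
    return len(lines)
```

Saving the offset only after `report` returns means a crash can at worst cause a batch to be re-sent, never skipped. That matches the "no data loss" goal, at the cost of possible duplicates downstream.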
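The command-driven lifecycle can be sketched as a small state machine. This is a minimal sketch, assuming commands arrive as plain strings (`start`, `stop`, `restart`, `reload`) and a reload carries the new acquisition configuration as a dict. The class, the command names and the payload shape are hypothetical; the application does not describe the pipeline message format.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


class CollectorLifecycle:
    """Hypothetical log collector driven by commands from the Agent command pipeline."""

    def __init__(self, config: dict):
        self.state = State.STOPPED
        self.config = dict(config)
        self.generation = 0  # incremented each time a new configuration is loaded

    def handle(self, command: str, payload: Optional[dict] = None) -> State:
        if command == "start":
            self.state = State.RUNNING
        elif command == "stop":
            self.state = State.STOPPED
        elif command == "restart":
            self.handle("stop")
            self.handle("start")
        elif command == "reload":
            # New acquisition configuration, e.g. delivered through the file pipeline;
            # the collector keeps its current running state.
            self.config = dict(payload or {})
            self.generation += 1
        else:
            raise ValueError(f"unknown command: {command}")
        return self.state
```

Separating `reload` from `restart` lets configuration changes take effect without interrupting a running collection, which fits the dynamic add/modify/delete behaviour attributed to the file pipeline.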
In one embodiment, as shown in FIGS. 1 and 2, the log data may be reported to Kafka and routed through Kafka to disk for storage.
In the application, introducing Kafka buffers burst traffic in the reported log data. This prevents downstream consumers (such as log cleansing) from being overwhelmed and triggering an avalanche across the full service chain, which improves system stability.
S130, converting the log data into structured data, and routing the structured data to an Elasticsearch cluster for storage via Kafka.
In one embodiment, a pre-configured cleansing rule may be obtained, and the log data is then cleansed and converted into structured data according to the cleansing rule, which facilitates subsequent querying, analysis and monitoring of the log data.
In a specific implementation, the cleansing rule may cleanse the data using a cleansing mode such as regular expressions, JSON format or separators.
Taking the following log data and regular-expression cleansing rule as an example, the structured data obtained after cleansing may be as follows:
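The three cleansing modes can be sketched as a single dispatcher that turns one raw line into a field dict. This is a minimal sketch, assuming a regex rule supplies a pattern with named groups, a JSON rule parses the line as-is, and a separator rule supplies a separator plus an ordered list of field names. The function name and option keys are hypothetical.

```python
import json
import re
from typing import Any, Dict


def cleanse(line: str, mode: str, **options: Any) -> Dict[str, Any]:
    """Convert one raw log line into a field dict using the configured cleansing mode."""
    if mode == "regex":
        match = re.match(options["pattern"], line)
        return match.groupdict() if match else {}
    if mode == "json":
        return json.loads(line)
    if mode == "separator":
        values = line.split(options["sep"])
        # Pair configured field names with split values; zip stops at the shorter list.
        return dict(zip(options["fields"], values))
    raise ValueError(f"unsupported cleansing mode: {mode}")
```

For example, `cleanse("x|y", "separator", sep="|", fields=["f1", "f2"])` gives `{"f1": "x", "f2": "y"}`.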
log data:
10.11.10.34 - - [04/Jul/2023:19:45:12 +0800] "GET /cw_license/api/pub_key HTTP/1.1" 200 172 "-" "python-requests/2.22.0" "-"
regular expression cleaning rules:
(?P<clientIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[(?P<requestTime>.*?)\]\s\"(?P<requestMethod>[A-Z]*)\s(?P<requestUrl>.*?)\"\s(?P<responseStatus>[0-9]*)\s(?P<responseBytes>[0-9]*)\s\"(?P<httpHost>.*?)\"\s\"(?P<httpAgent>.*?)\"
structured data:
{
"httpAgent": "python-requests/2.22.0",
"clientIP": "10.11.10.34",
"httpHost": "-",
"requestMethod": "GET",
"requestTime": "04/Jul/2023:19:45:12 +0800",
"requestUrl": "/cw_license/api/pub_key HTTP/1.1",
"responseBytes": 172,
"responseStatus": 200
}
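Applying the example rule can be sketched with Python's `re` module, whose `(?P<name>...)` syntax the rule already uses. This is a minimal sketch: it escapes the dots in the IP pattern so they only match a literal `.`, and it assumes `responseStatus` and `responseBytes` should be converted to integers, as the structured output suggests. `to_structured` and `NUMERIC_FIELDS` are hypothetical names.

```python
import re
from typing import Optional

# The regular-expression cleansing rule from the example (IP dots escaped).
PATTERN = re.compile(
    r'(?P<clientIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s-\s-\s'
    r'\[(?P<requestTime>.*?)\]\s'
    r'"(?P<requestMethod>[A-Z]*)\s(?P<requestUrl>.*?)"\s'
    r'(?P<responseStatus>[0-9]*)\s(?P<responseBytes>[0-9]*)\s'
    r'"(?P<httpHost>.*?)"\s"(?P<httpAgent>.*?)"'
)

# Fields assumed to be numeric; converted so they can later serve as log indexes.
NUMERIC_FIELDS = ("responseStatus", "responseBytes")


def to_structured(line: str) -> Optional[dict]:
    match = PATTERN.match(line)
    if match is None:
        return None  # line does not follow the rule; a real pipeline might route it elsewhere
    record = match.groupdict()
    for name in NUMERIC_FIELDS:
        # [0-9]* may capture an empty string, so only convert non-empty values.
        record[name] = int(record[name]) if record[name] else None
    return record
```

Run on the example log line, this produces the same eight fields and values as the structured data shown above. Note that `requestUrl` still includes the protocol (`HTTP/1.1`) because the rule's lazy group stops only at the closing quote.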
As another example, the log data (raw log), the regular-expression cleansing rule (field extraction method) and the structured data (extraction result) may be as shown in FIG. 5.
In both examples, the structured data may include numeric fields (long/double/int).
In the application, converting the log data into structured data allows log keywords (monitoring metrics) to be generated automatically, and numeric fields are extracted from the structured data as log indexes (monitoring metrics).
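Extracting numeric fields as log indexes can be sketched as a type check over a structured record. This is a minimal sketch, assuming that every `int`/`float` field is a candidate log index and, as a further assumption not stated in the application, that the remaining fields are candidates for keyword monitoring. `split_fields` is a hypothetical name.

```python
from typing import Dict, Tuple


def split_fields(record: dict) -> Tuple[Dict, Dict]:
    """Split a structured record into keyword candidates and numeric log indexes."""
    indexes = {
        k: v for k, v in record.items()
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }
    keywords = {k: v for k, v in record.items() if k not in indexes}
    return keywords, indexes
```

Applied to the structured data of the first example, `responseBytes` and `responseStatus` end up as log indexes.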
In one embodiment, pre-configured storage configuration information may be obtained, and the structured data is then routed to the Elasticsearch cluster for storage via Kafka according to the storage configuration information. Elasticsearch is an open-source, distributed, RESTful search and data analytics engine.
In a specific implementation, the storage configuration information may include a storage device and a retention period, where the storage device is configured as the Elasticsearch cluster.
For example, the Elasticsearch cluster may be 172.16.32.13 as shown in FIG. 6, and the retention period may be 3 days as shown in FIG. 7.
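One common way to enforce a retention period in Elasticsearch is to write each day's data to its own index and drop indices older than the period. The application does not say how retention is implemented. The following is a minimal sketch, assuming a hypothetical `<topic>-YYYY.MM.DD` daily index naming scheme, that only computes which indices have expired; actually deleting them is left to the cluster client.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List


def daily_index_name(topic: str, day: date) -> str:
    # Hypothetical naming scheme: one Elasticsearch index per log topic per day.
    return f"{topic}-{day:%Y.%m.%d}"


def expired_indices(topic: str, existing: Iterable[str],
                    today: date, retention_days: int) -> List[str]:
    """Return this topic's daily indices that fall outside the retention period."""
    keep = {daily_index_name(topic, today - timedelta(days=i)) for i in range(retention_days)}
    # Match only "<topic>-YYYY.MM.DD", so e.g. "nginx-error-..." is not treated as "nginx".
    own = re.compile(re.escape(topic) + r"-\d{4}\.\d{2}\.\d{2}")
    return sorted(n for n in existing if own.fullmatch(n) and n not in keep)
```

With a 3-day retention period (as in FIG. 7) and today being 2023-07-04, the indices for 07.04, 07.03 and 07.02 are kept and anything older is reported as expired.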
In the application, routing the structured data to the Elasticsearch cluster for storage via Kafka achieves fault isolation between log data collection and log data cleansing: reporting and cleansing are decoupled and can be scaled independently. This improves the scalability and reliability of the system, reduces upgrade cost, improves the monitoring performance of the monitoring system, and achieves fault isolation between systems (such as the log system and the alarm monitoring system), thereby improving the operation and maintenance quality of the log system.
For example, Kafka stores the collected and reported raw log data on disk, so however the downstream consumer (log cleansing) fails, reporting of the raw log data is unaffected, and cleansing can resume once the downstream consumer recovers; likewise, a failure in log data collection does not affect log data cleansing. This also makes it possible to integrate log data from other systems quickly, as long as they report it to Kafka in the corresponding data reporting format. Meanwhile, other systems can consume the log system's data directly from Kafka.
S140, obtaining monitoring policy configuration information from the monitoring alarm system.
In one embodiment, the monitoring policy configuration information may be preconfigured according to the index data to be monitored and stored on the monitoring alarm system, to facilitate subsequent alarm monitoring of the log data in the log system based on the monitoring policy configuration information.
For example, index monitoring or keyword monitoring may be configured for the cleansed numeric index fields (such as responseBytes and responseStatus), with monitoring policy configuration information including the index name, the aggregation method (count/max/min/sum/avg), the aggregation period and the like.
In one embodiment, a configuration interface for the monitoring policy configuration information may be provided so that it can be configured according to actual requirements, meeting users' individual needs. The configuration interface may, for example, be as shown in FIG. 8.
In one embodiment, the monitoring alarm system may synchronize the monitoring policy configuration information as messages via Kafka, so that aggregation calculation can be performed on the log data directly according to the monitoring policy configuration information.
For example, the message notification data protocol used by Kafka for message synchronization may be as shown in FIG. 9.
S150, performing pre-aggregation calculation on the structured data according to the monitoring policy configuration information, generating index data and reporting it to the monitoring alarm system.
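A monitoring policy with the fields listed above can be sketched as a validated record. This is a minimal sketch, assuming the aggregation period is expressed in seconds with a 1-minute default (the period used in the FIG. 8 example), and that the reported metric is named `<method>_<field>` following the `avg_responseBytes` example in this section. `MonitoringPolicy` and its attribute names are hypothetical.

```python
from dataclasses import dataclass

AGG_METHODS = {"count", "max", "min", "sum", "avg"}


@dataclass(frozen=True)
class MonitoringPolicy:
    log_topic: str
    index_field: str          # e.g. "responseBytes"
    method: str               # one of AGG_METHODS
    period_seconds: int = 60  # aggregation period

    def __post_init__(self):
        # Reject configurations the pre-aggregation step could not execute.
        if self.method not in AGG_METHODS:
            raise ValueError(f"unsupported aggregation method: {self.method}")
        if self.period_seconds <= 0:
            raise ValueError("aggregation period must be positive")

    @property
    def metric_name(self) -> str:
        """Name of the reported index data, e.g. "avg_responseBytes"."""
        return f"{self.method}_{self.index_field}"
```

Validating at construction time means a bad policy is rejected when it is configured, rather than surfacing later as a failed aggregation task.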
In one embodiment, the index data may include, but is not limited to, log keywords and log indexes.
In one embodiment, a unified log query service for the Elasticsearch cluster may be exposed externally through an API.
In one embodiment, with the index field, the aggregation method and the aggregation period as aggregation dimensions, pre-aggregation calculation is periodically performed on the structured data to generate index data, which is reported to the monitoring alarm system.
For example, as shown in FIG. 8, with an aggregation period of 1 minute, the structured data of the previous minute may be aggregated every minute according to the index field and the aggregation method to generate index data, which is then reported to the monitoring alarm system.
For example, suppose a monitoring policy applying the AVG (average) aggregation method to the responseBytes field is configured. For the following two logs (responseBytes=100 and responseBytes=50), the aggregation result is {"avg_responseBytes": 75}, and this result is reported.
{
"httpAgent": "python-requests/2.22.0",
"clientIP": "10.11.10.34",
"httpHost": "-",
"requestMethod": "GET",
"requestTime": "04/Jul/2023:19:45:12 +0800",
"requestUrl": "/cw_license/api/pub_key HTTP/1.1",
"responseBytes": 100,
"responseStatus": 200
}
{
"httpAgent": "python-requests/2.22.0",
"clientIP": "10.11.10.34",
"httpHost": "-",
"requestMethod": "GET",
"requestTime": "04/Jul/2023:19:45:12 +0800",
"requestUrl": "/cw_license/api/pub_key HTTP/1.1",
"responseBytes": 50,
"responseStatus": 200
}
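The AVG pre-aggregation for the two logs above can be sketched in Python as follows. The sketch assumes the structured records are already available as dictionaries and keeps only a few of their fields. `avg_metric` is a hypothetical helper, and the `avg_<field>` result key mirrors the example result.

```python
def avg_metric(records, field):
    """AVG aggregation of `field` over the records that contain it."""
    values = [r[field] for r in records if field in r]
    if not values:
        return {}  # no data in this period: nothing to report
    return {f"avg_{field}": sum(values) / len(values)}


logs = [
    {"requestMethod": "GET", "responseBytes": 100, "responseStatus": 200},
    {"requestMethod": "GET", "responseBytes": 50, "responseStatus": 200},
]
print(avg_metric(logs, "responseBytes"))  # {'avg_responseBytes': 75.0}
```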
As an example, the reporting protocol employed to report the index data to the monitoring alarm system may be as shown in fig. 10.
In one embodiment, the creation, update and deletion of monitoring policy configuration information may be detected by listening to and consuming kafka messages, and lifecycle management of the periodic pre-aggregation tasks may then be performed according to the monitoring policy configuration information. Each pre-aggregation task runs periodically according to the aggregation period, and in each period it aggregates the log data of the previous period.
In a specific implementation, pre-aggregation tasks are managed using the log topic, the index field and the aggregation method as management dimensions. The dimensions of the aggregation query can be set to the union of the monitoring dimensions of the monitoring policy configuration information. This merges aggregation query requests for the data and reduces performance consumption.
In a specific implementation, the structured data can be pre-aggregated using the aggregation query capability of Elasticsearch. This avoids large-scale custom development and saves development cost, because Elasticsearch is a mature, stable open-source component. Letting Elasticsearch carry the pre-aggregation calculation also saves the computing cost of pre-aggregating log data, so pre-aggregation can run normally with few resources.
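Keeping the set of active policies in step with the create, update and delete messages can be sketched in Python as follows. The message shape `{"action", "policy_id", "policy"}` and the class `PolicyRegistry` are hypothetical stand-ins, because the protocol of fig. 9 is not reproduced in the text. `raw` would be the message value returned by whichever kafka consumer client is in use.

```python
import json


class PolicyRegistry:
    """Tracks active monitoring policies from policy-change messages.

    Assumption: messages look like {"action": ..., "policy_id": ..., "policy": {...}}.
    """

    def __init__(self):
        self.policies = {}  # policy_id -> policy configuration

    def handle_message(self, raw: bytes) -> None:
        msg = json.loads(raw)
        action, policy_id = msg["action"], msg["policy_id"]
        if action in ("create", "update"):
            # Both are upserts: the periodic task picks up the latest config.
            self.policies[policy_id] = msg["policy"]
        elif action == "delete":
            # Ignore unknown ids so that a redelivered delete is harmless.
            self.policies.pop(policy_id, None)
        else:
            raise ValueError(f"unknown action: {action!r}")
```

The periodic pre-aggregation tasks can then be rebuilt from `policies` whenever the registry changes.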
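Merging policies into tasks by log topic, index field and aggregation method, with the union of monitoring dimensions as the query's group-by, can be sketched in Python as follows. The policy dictionary keys and the topic name `nginx_access` are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def merge_policies(policies):
    """Merge monitoring policies into pre-aggregation tasks.

    Tasks are keyed by (log topic, index field, aggregation method); each
    task groups by the union of its policies' monitoring dimensions.
    """
    tasks = defaultdict(set)
    for p in policies:
        tasks[(p["topic"], p["field"], p["method"])].update(p["dimensions"])
    # Sort the dimensions so each task always produces the same query.
    return {key: sorted(dims) for key, dims in tasks.items()}


policies = [
    {"topic": "nginx_access", "field": "responseBytes", "method": "AVG", "dimensions": ["clientIP"]},
    {"topic": "nginx_access", "field": "responseBytes", "method": "AVG", "dimensions": ["httpAgent"]},
    {"topic": "nginx_access", "field": "responseBytes", "method": "MAX", "dimensions": ["clientIP"]},
]
tasks = merge_policies(policies)  # three policies collapse into two tasks
```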
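Delegating one merged task to Elasticsearch could mean building a search body like the one below. This is a sketch under several assumptions: the dimension fields are mapped as `keyword`, the time field is named `@timestamp`, and the body is sent to the log topic's `_search` endpoint. The metric names (`value_count`, `max`, `min`, `sum`, `avg`) and the `terms` bucket aggregation are standard Elasticsearch aggregations, while the helper and result names are hypothetical.

```python
ES_METRIC_AGGS = {"COUNT": "value_count", "MAX": "max", "MIN": "min", "SUM": "sum", "AVG": "avg"}


def build_pre_aggregation_query(field, method, dimensions, start, end, time_field="@timestamp"):
    """Build an Elasticsearch search body for one pre-aggregation task."""
    aggs = {f"{method.lower()}_{field}": {ES_METRIC_AGGS[method]: {"field": field}}}
    # Wrap the metric in one `terms` bucket per dimension, first dimension outermost.
    for dim in reversed(dimensions):
        aggs = {f"by_{dim}": {"terms": {"field": dim}, "aggs": aggs}}
    return {
        "size": 0,  # only aggregation results are needed, not the matching documents
        "query": {"range": {time_field: {"gte": start, "lt": end}}},
        "aggs": aggs,
    }
```

A `terms` aggregation returns only the top 10 buckets by default. A deployment with many distinct client addresses would therefore need to raise its `size` or switch to a `composite` aggregation.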
As an example, log keywords may be pre-aggregated using a COUNT aggregation method, and log metrics may be pre-aggregated using COUNT, MAX, MIN, SUM or AVG aggregation methods.
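The difference between the two kinds of index data can be sketched locally in Python. A log keyword is counted, while a log index (a numeric field) can be reduced with any of the five methods. The helper names are hypothetical, and keyword matching is assumed to be a plain substring test.

```python
AGGREGATORS = {
    "COUNT": len,
    "MAX": max,
    "MIN": min,
    "SUM": sum,
    "AVG": lambda values: sum(values) / len(values),
}


def aggregate_metric(records, field, method):
    """Apply one aggregation method to a numeric log field (log index)."""
    values = [r[field] for r in records if field in r]
    if not values:
        return 0 if method == "COUNT" else None  # nothing to aggregate
    return AGGREGATORS[method](values)


def count_keyword(lines, keyword):
    """COUNT aggregation for a log keyword: number of lines containing it."""
    return sum(1 for line in lines if keyword in line)
```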
For example, suppose a log topic collects Nginx access log data and cleans it into structured data.
The access log data is as follows:
10.11.10.34 - - [04/Jul/2023:19:45:12 +0800] "GET /cw_license/api/pub_key HTTP/1.1" 200 172 "-" "python-requests/2.22.0" "-"
The structured data is as follows:
{
"httpAgent": "python-requests/2.22.0",
"clientIP": "10.11.10.34",
"httpHost": "-",
"requestMethod": "GET",
"requestTime": "04/Jul/2023:19:45:12 +0800",
"requestUrl": "/cw_license/api/pub_key HTTP/1.1",
"responseBytes": 172,
"responseStatus": 200
}
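A cleaning rule that turns the access log line above into this structured record can be sketched as a regular expression in Python. The regex and `clean_nginx_line` are illustrative assumptions, not the application's own cleaning rules. In particular, the mapping of the last quoted field to `httpHost` is only assumed, to reproduce the record shown. In the `main` log format shipped in the default nginx.conf, that position holds `$http_x_forwarded_for`, and the quoted field before the user agent is the referer, which this sketch discards.

```python
import re

NGINX_ACCESS = re.compile(
    r'(?P<clientIP>\S+) \S+ \S+ \[(?P<requestTime>[^\]]+)\] '
    r'"(?P<requestMethod>\S+) (?P<requestUrl>[^"]*)" '
    r'(?P<responseStatus>\d{3}) (?P<responseBytes>\d+) '
    r'"[^"]*" "(?P<httpAgent>[^"]*)" "(?P<httpHost>[^"]*)"'
)


def clean_nginx_line(line):
    """Convert one access log line into a structured record, or None if it does not match."""
    m = NGINX_ACCESS.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # Numeric fields are stored as integers so they can be aggregated.
    record["responseStatus"] = int(record["responseStatus"])
    record["responseBytes"] = int(record["responseBytes"])
    return record


sample = ('10.11.10.34 - - [04/Jul/2023:19:45:12 +0800] "GET /cw_license/api/pub_key HTTP/1.1" '
          '200 172 "-" "python-requests/2.22.0" "-"')
record = clean_nginx_line(sample)
```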
Suppose user A configures an AVG (average) monitoring policy on responseBytes (the size of the data returned by the interface), monitored by the clientIP (requesting client address) dimension. User B configures the same AVG policy on responseBytes, but monitored by the httpAgent (requesting client type) dimension. These are two monitoring policies, but pre-aggregation merges them into one aggregation query task, so the two aggregation query requests become one: AVG is computed on the responseBytes field of the log topic, grouped by clientIP and httpAgent.
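One way each policy could recover its own view from the single merged result is sketched below in Python. This roll-up step is an assumption, since the text only states that the merged query groups by clientIP and httpAgent. The merged pass keeps a sum and a count per (clientIP, httpAgent) group rather than an average, because averaging per-group averages is wrong in general when groups have different sizes. The second and third sample records (the client 10.11.10.35 and the agent `curl/8.0`) are invented for illustration.

```python
from collections import defaultdict


def grouped_sum_count(records, field, dims):
    """One pass over the period's records: [sum, count] per combination of dims."""
    acc = defaultdict(lambda: [0, 0])
    for r in records:
        key = tuple(r.get(d) for d in dims)
        acc[key][0] += r[field]
        acc[key][1] += 1
    return acc


def rollup_avg(acc, dims, keep):
    """Re-group the merged result by a subset of dims and compute averages."""
    positions = [dims.index(d) for d in keep]
    rolled = defaultdict(lambda: [0, 0])
    for key, (total, count) in acc.items():
        sub_key = tuple(key[i] for i in positions)
        rolled[sub_key][0] += total
        rolled[sub_key][1] += count
    return {k: total / count for k, (total, count) in rolled.items()}


dims = ["clientIP", "httpAgent"]
records = [
    {"clientIP": "10.11.10.34", "httpAgent": "python-requests/2.22.0", "responseBytes": 100},
    {"clientIP": "10.11.10.34", "httpAgent": "curl/8.0", "responseBytes": 50},
    {"clientIP": "10.11.10.35", "httpAgent": "curl/8.0", "responseBytes": 30},
]
merged = grouped_sum_count(records, "responseBytes", dims)
policy_a = rollup_avg(merged, dims, ["clientIP"])   # user A: by client address
policy_b = rollup_avg(merged, dims, ["httpAgent"])  # user B: by client type
```

With Elasticsearch doing the merged pass, the same effect would come from requesting `sum` and `value_count` under the grouped buckets instead of `avg`.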
In one embodiment, the monitoring alarm system may store the index data in an InfluxDB database.
In the present application, the structured data (that is, the structured logs) is pre-processed into two kinds of index data, log keywords and log indexes, which are then fed into the monitoring system. This pre-processes large volumes of log data and improves the detection performance of the monitoring system. At the same time, aggregation calculation tasks are merged, which reduces the number of ES aggregation query requests and the performance load on the ES cluster.
S160: Perform alarm detection on the index data through the monitoring alarm system according to the monitoring policy configuration information, and output alarm information.
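InfluxDB ingests points in its text line protocol, `measurement,tag=value field=value timestamp`. A minimal Python sketch of formatting one index-data point follows. It assumes that monitoring dimensions become tags and that aggregation results become float fields. The measurement name `nginx_access` and the nanosecond timestamp are illustrative, and the write itself (HTTP API or client library) is omitted.

```python
def _escape(value: str, chars: str) -> str:
    """Backslash-escape each character in `chars` (line-protocol special characters)."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one index-data point in InfluxDB line protocol."""
    # Tag keys and values escape commas, spaces and equals signs.
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(str(v), ', =')}" for k, v in sorted(tags.items())
    )
    # Values are written as floats; integer fields would need an "i" suffix instead.
    field_part = ",".join(f"{_escape(k, ', =')}={float(v)}" for k, v in fields.items())
    # Measurement names escape only commas and spaces.
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol("nginx_access", {"clientIP": "10.11.10.34"}, {"avg_responseBytes": 75}, 1688471100000000000)
```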
In a specific implementation, with reference to fig. 8 and fig. 11, the monitoring policy configuration information may further include monitoring dimensions, filter conditions, a detection algorithm, and an alarm triggering condition corresponding to the detection algorithm. For example, the monitoring dimensions may be configured as the IP address and the cloud zone ID. A filter condition may set a filter value for the cloud zone ID, such as 0. The detection algorithm checks the index data against a configured alarm triggering threshold.
In one embodiment, the monitoring alarm system may detect, in each aggregation period and according to the detection algorithm, whether the index data meets the alarm triggering condition. When it does, an alarm event is generated, and alarm information is output according to the alarm event.
For example, as shown in fig. 11, the monitoring policy has an aggregation period of 1 min. The detection algorithm is a static threshold of greater than or equal to 100, and the triggering condition is that the detection algorithm fires at least once within 5 periods. When {avg_responseBytes: 75} is aggregated and reported, the index data does not meet the triggering condition (75 < 100). When {avg_responseBytes: 105} is aggregated and reported, the index data meets the triggering condition (105 > 100), so an alarm event is generated and an alarm notification is sent.
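The static-threshold rule of this example ("value >= 100, at least 1 hit within 5 periods") can be sketched in Python with a sliding window of per-period results. The class name and the event dictionary are hypothetical, since the real alarm event format is the one shown in fig. 12. Note that this simple version keeps firing for as long as a hit remains in the window. Suppressing those repeats (alarm convergence) is left out.

```python
from collections import deque


class StaticThresholdDetector:
    """Static threshold check with an 'at least min_hits in the last window periods' trigger."""

    def __init__(self, threshold, window=5, min_hits=1):
        self.threshold = threshold
        self.min_hits = min_hits
        self.history = deque(maxlen=window)  # one bool per aggregation period

    def observe(self, value):
        """Feed one period's index value; return an alarm event dict or None."""
        self.history.append(value >= self.threshold)
        hits = sum(self.history)
        if hits >= self.min_hits:
            return {"value": value, "threshold": self.threshold, "hits": hits}
        return None


detector = StaticThresholdDetector(threshold=100, window=5, min_hits=1)
first = detector.observe(75)    # 75 < 100: no alarm
second = detector.observe(105)  # 105 >= 100: alarm event
```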
As one example, an alarm event may be as shown in FIG. 12.
In a specific implementation, the detection algorithm may be one of a static threshold, a same-period comparison (same-ratio) policy, a period-over-period comparison (ring-ratio) policy, a same-period amplitude, a period-over-period amplitude, or a same-period interval.
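The text does not define the comparison algorithms precisely. Under one common reading, a ring-ratio check compares the current value with the previous period, and a same-ratio check compares it with the same time in an earlier cycle (one day earlier is assumed here). Both can then be sketched as a single relative-change test parameterized by the lag. `change_exceeded`, the percentage limit, and the zero-baseline rule are all assumptions.

```python
from datetime import datetime, timedelta


def change_exceeded(series, at, lag, max_change_pct):
    """True if the value at `at` differs from the value `lag` earlier by >= max_change_pct %.

    Ring-ratio: lag = one aggregation period. Same-ratio: lag = one day (assumed).
    """
    baseline = series.get(at - lag)
    if baseline is None or at not in series:
        return False  # not enough history to compare
    if baseline == 0:
        return series[at] != 0  # assumption: any move away from zero counts
    return abs(series[at] - baseline) / abs(baseline) * 100 >= max_change_pct


t = datetime(2023, 7, 4, 19, 45)
series = {t - timedelta(days=1): 100, t - timedelta(minutes=1): 75, t: 105}
ring = change_exceeded(series, t, timedelta(minutes=1), 30)  # 105 vs 75: +40 %
same = change_exceeded(series, t, timedelta(days=1), 30)     # 105 vs 100: +5 %
```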
In one embodiment, a pre-configured alarm notification rule may be obtained, and alarm information is output for the alarm event according to that rule.
As an example, the alarm notification rules may be configured as shown in fig. 13. The alarm notification service may use one or more of mail, SMS, WeChat and voice notification as required.
In the present application, steps S150 and S160 meet enterprises' monitoring and alarming requirements for log keywords and log indexes. They help operation and maintenance personnel monitor the running status of systems and services, and they allow faults to be detected early, improving operation and maintenance quality.
For example, operation and maintenance personnel can configure a monitoring alarm on the query response time recorded in MySQL or Elasticsearch slow logs. A response time exceeding a certain threshold indicates that system performance is degrading, so the personnel can intervene early and prevent the entire service system from becoming unavailable.
As can be seen from the above description, introducing kafka to store the collected log data and the converted structured data in separate streams isolates faults between log collection and log cleaning. It also improves the scalability and reliability of the system, lowers the cost of system upgrades, improves the monitoring performance of the monitoring system, and isolates faults between systems (such as the log system and the monitoring alarm system), thereby improving the operation and maintenance quality of the log system. In addition, the structured data is pre-aggregated according to the monitoring policy configuration information, and index data containing log keywords and log indexes is reported to the monitoring alarm system for alarm detection. This meets enterprises' monitoring and alarming requirements for log keywords and log indexes and helps operation and maintenance personnel monitor systems and services and detect faults early. It also pre-processes large volumes of log data to improve the detection performance of the monitoring system, and it merges aggregation calculation tasks to reduce the number of ES aggregation query requests and the performance load on the ES cluster.
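Routing an alarm event to the channels selected by a notification rule can be sketched in Python as follows. The channel identifiers, `NotificationRule`, and the `senders` mapping are hypothetical. Real mail, SMS, WeChat or voice gateways would be plugged in as the sender callables.

```python
from dataclasses import dataclass

SUPPORTED_CHANNELS = ("mail", "sms", "wechat", "voice")


@dataclass
class NotificationRule:
    channels: list   # subset of SUPPORTED_CHANNELS
    receivers: list  # user ids or addresses; their meaning depends on the channel


def dispatch(event, rule, senders):
    """Send `event` to every receiver over every channel selected by the rule.

    `senders` maps a channel name to a callable(receiver, event).
    Returns the (channel, receiver) pairs that were notified.
    """
    sent = []
    for channel in rule.channels:
        if channel not in SUPPORTED_CHANNELS:
            raise ValueError(f"unsupported channel: {channel!r}")
        for receiver in rule.receivers:
            senders[channel](receiver, event)
            sent.append((channel, receiver))
    return sent
```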
Fig. 14 is a block diagram showing the construction of a log alarm monitoring apparatus according to an embodiment of the present application. As shown in fig. 14, the apparatus may include:
an acquisition module 210, configured to acquire acquisition configuration information; according to the acquisition configuration information, acquiring log data of a target server, and shunting the log data to a disk for storage through kafka;
a processing module 220, configured to convert the log data into structured data, and shunt the structured data to the Elasticsearch cluster for storage through kafka;
the pre-aggregation module 230 is configured to obtain monitoring policy configuration information on the monitoring alarm system; performing pre-aggregation calculation on the structured data according to the monitoring strategy configuration information, generating index data, and reporting the index data to a monitoring alarm system, wherein the index data comprises log keywords and log indexes;
the monitoring alarm module 240 is configured to perform alarm detection on the index data according to the monitoring policy configuration information through the monitoring alarm system, and output alarm information.
In one embodiment, the acquisition configuration information is obtained by the acquisition module 210 by:
configuring the acquisition item as a target server;
based on the log topic, configuring plug-in configuration information and log filtering information, wherein the plug-in configuration information comprises an acquisition type, an acquisition target, a log path and a log character set, and the log filtering information comprises filtering rules;
Acquiring acquisition configuration information based on the acquisition items, the plug-in configuration information and the log filtering information.
In one embodiment, the acquisition module 210 is specifically configured to:
calling a log collector to collect original log data of a target server according to the collection item and the plug-in configuration information, wherein the log collector is hosted on an agent, and the agent is installed and deployed on the target server;
filtering the original log data according to the log filtering information to obtain log data;
the log data is reported to kafka, and the log data is shunted to a disk for storage through the kafka.
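Decoding raw lines with the configured log character set and applying the filtering rules before reporting can be sketched in Python as follows. The sketch assumes that filtering rules are regular expressions whose matching lines are dropped, and the dataclass fields are illustrative names for the configuration items listed above. Reporting the kept lines to kafka is omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AcquisitionConfig:
    target_server: str
    log_topic: str
    log_path: str
    charset: str = "utf-8"  # log character set from the plug-in configuration
    exclude_patterns: list = field(default_factory=list)  # assumed form of the filtering rules


def filter_raw_lines(raw_lines, cfg):
    """Decode raw log lines with the configured charset and drop lines matching a filter rule."""
    excludes = [re.compile(p) for p in cfg.exclude_patterns]
    kept = []
    for raw in raw_lines:
        line = raw.decode(cfg.charset, errors="replace").rstrip("\r\n")
        if any(p.search(line) for p in excludes):
            continue  # filtered out by a rule
        kept.append(line)
    return kept
```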
In one embodiment, the acquisition module 210 is further configured to:
and managing the life cycle of the log collector by using an Agent command pipeline, a data pipeline and a file pipeline.
In one embodiment, the processing module 220 is specifically configured to:
acquiring a pre-configured cleaning rule;
according to the cleaning rule, cleaning the log data and converting the log data into structured data;
acquiring pre-configured storage configuration information;
the structured data is shunted to the Elasticsearch cluster for storage by kafka according to the storage configuration information.
In one embodiment, the monitoring policy configuration information includes an index field, an aggregation method, and an aggregation period; the pre-aggregation module 230 is specifically configured to:
And periodically performing pre-aggregation calculation on the structured data by taking the index field, the aggregation method and the aggregation period as aggregation dimensions, and generating index data to report to a monitoring alarm system.
In one embodiment, the monitoring policy configuration information further includes a detection algorithm and an alarm triggering condition corresponding to the detection algorithm; the monitoring alarm module 240 is specifically configured to:
detecting, by the monitoring alarm system, whether the index data meets the alarm triggering condition in the aggregation period according to the detection algorithm;
generating an alarm event when the index data is determined to meet the alarm triggering condition;
and outputting alarm information according to the alarm event.
The functions of each module in the log alarm monitoring device in the embodiment of the present application may be referred to the corresponding descriptions in the above method, and will not be described herein.
Fig. 15 shows a block diagram of a computer apparatus according to an embodiment of the present application. As shown in fig. 15, the computer apparatus includes: a memory 310 and a processor 320, the memory 310 storing instructions that are loaded and executed by the processor 320 to implement the log alert monitoring method in the above embodiments. The number of memories 310 and processors 320 may be one or more.
The computer apparatus further includes:
and the communication interface 330 is used for communicating with external equipment and carrying out data interaction transmission.
If the memory 310, the processor 320 and the communication interface 330 are implemented independently, they may be connected to and communicate with each other through a bus. The bus may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 15, but this does not mean that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 310, the processor 320, and the communication interface 330 are integrated on a chip, the memory 310, the processor 320, and the communication interface 330 may communicate with each other through internal interfaces.
The embodiment of the application provides a computer readable storage medium storing a computer program, which when run on a computer, implements the method provided in the embodiment of the application.
The embodiment of the application also provides a chip, which comprises a processor and is used for calling and running the instructions stored in the memory, so that the communication equipment provided with the chip executes the method provided by the embodiment of the application.
The embodiment of the application also provides a chip, which comprises: the input interface, the output interface, the processor and the memory are connected through an internal connection path, the processor is used for executing codes in the memory, and when the codes are executed, the processor is used for executing the method provided by the application embodiment.
It should be appreciated that the processor may be a central processing unit (CPU), but may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor. It is noted that the processor may be a processor supporting the advanced RISC machines (ARM) architecture.
Further, optionally, the memory may include a read-only memory and a random access memory, and may further include a nonvolatile random access memory. The memory may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may include a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory, among others. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of example, and not limitation, many forms of RAM are available, for example static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with the present application are fully or partially produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict each other.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. And the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed in a substantially simultaneous manner or in an opposite order from that shown or discussed, including in accordance with the functions that are involved.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. All or part of the steps of the methods in the above embodiments may be performed by a program instructing the associated hardware; when executed, the program performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules described above, if implemented in the form of software functional modules and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that various changes and substitutions are possible within the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A log alert monitoring method, comprising:
acquiring acquisition configuration information;
according to the acquisition configuration information, acquiring log data of a target server, and shunting the log data to a disk for storage through kafka;
converting the log data into structured data, and shunting the structured data to an Elasticsearch cluster through the kafka for storage;
acquiring monitoring strategy configuration information on a monitoring alarm system;
performing pre-aggregation calculation on the structured data according to the monitoring strategy configuration information to generate index data, and reporting the index data to the monitoring alarm system, wherein the index data comprises log keywords and log indexes;
and carrying out alarm detection on the index data by the monitoring alarm system according to the monitoring strategy configuration information, and outputting alarm information.
2. The method of claim 1, wherein the acquisition configuration information is obtained by:
configuring an acquisition item as the target server;
based on a log topic, configuring plug-in configuration information and log filtering information, wherein the plug-in configuration information comprises an acquisition type, an acquisition target, a log path and a log character set, and the log filtering information comprises filtering rules;
and acquiring the acquisition configuration information based on the acquisition item, the plug-in configuration information and the log filtering information.
3. The method of claim 2, wherein collecting log data of the target server according to the acquisition configuration information and shunting the log data to the disk for storage through kafka comprises:
calling a log collector to collect original log data of the target server according to the collection item and the plug-in configuration information, wherein the log collector is hosted on an agent, and the agent is installed and deployed on the target server;
filtering the original log data according to the log filtering information to obtain the log data;
and reporting the log data to the kafka, and shunting the log data to the disk for storage through the kafka.
4. A method according to claim 3, characterized in that the method further comprises:
and managing the life cycle of the log collector by using an Agent command pipeline, a data pipeline and a file pipeline.
5. The method of claim 1, wherein converting the log data into structured data and shunting the structured data to the Elasticsearch cluster for storage through the kafka comprises:
acquiring a pre-configured cleaning rule;
according to the cleaning rule, cleaning the log data to convert the log data into the structured data;
acquiring pre-configured storage configuration information;
and shunting the structured data to the Elasticsearch cluster for storage through the kafka according to the storage configuration information.
6. The method of any of claims 1-5, wherein the monitoring policy configuration information includes an index field, an aggregation method, and an aggregation period;
performing pre-aggregation calculation on the structured data according to the monitoring policy configuration information, and generating index data to report to the monitoring alarm system comprises the following steps:
and periodically performing pre-aggregation calculation on the structured data by taking the index field, the aggregation method and the aggregation period as aggregation dimensions, and generating the index data to report to the monitoring alarm system.
7. The method of claim 6, wherein the monitoring policy configuration information further comprises a detection algorithm and an alarm triggering condition corresponding to the detection algorithm;
and carrying out alarm detection on the index data by the monitoring alarm system according to the monitoring strategy configuration information and outputting alarm information comprises:
detecting, by the monitoring alarm system, whether the index data meets the alarm triggering condition in the aggregation period according to the detection algorithm;
generating an alarm event when the index data is determined to meet the alarm triggering condition;
and outputting the alarm information according to the alarm event.
8. A log alert monitoring device, comprising:
the acquisition module is used for acquiring acquisition configuration information; according to the acquisition configuration information, acquiring log data of a target server, and shunting the log data to a disk for storage through kafka;
the processing module is used for converting the log data into structured data, and shunting the structured data to an Elasticsearch cluster through the kafka for storage;
the pre-aggregation module is used for acquiring monitoring strategy configuration information on the monitoring alarm system; performing pre-aggregation calculation on the structured data according to the monitoring strategy configuration information to generate index data, and reporting the index data to the monitoring alarm system, wherein the index data comprises log keywords and log indexes;
And the monitoring alarm module is used for carrying out alarm detection on the index data through the monitoring alarm system according to the monitoring strategy configuration information and outputting alarm information.
9. A computer apparatus, comprising: a memory and a processor, the memory storing instructions that are loaded and executed by the processor to implement the method of any one of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on a computer, implements the method according to any of claims 1-7.
CN202311447729.9A 2023-11-02 2023-11-02 Log alarm monitoring method and device and computer storage medium Pending CN117194175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311447729.9A CN117194175A (en) 2023-11-02 2023-11-02 Log alarm monitoring method and device and computer storage medium

Publications (1)

Publication Number Publication Date
CN117194175A true CN117194175A (en) 2023-12-08

Family

ID=88987217

Country Status (1)

Country Link
CN (1) CN117194175A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination