CN117707885A - Multi-cluster monitoring index processing method and system - Google Patents
- Publication number
- CN117707885A (application number CN202311647656.8A)
- Authority
- CN
- China
- Prior art keywords
- rule
- monitoring
- monitoring index
- index
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
Abstract
The application discloses a multi-cluster monitoring index processing method and system. The method comprises: acquiring monitoring index data of a plurality of clusters from a plurality of monitoring alarm systems, wherein each cluster is provided with a plurality of monitoring alarm component instances, and the monitoring alarm component instances in the same cluster provide different categories of monitoring index data; generating a monitoring index rule according to rule description information input by a user and the monitoring index data; checking the monitoring index rule; and, if the monitoring index rule passes the verification, configuring the monitoring index rule to the monitoring alarm component instance.
Description
Technical Field
The present disclosure relates to the field of cluster technologies, and in particular to a multi-cluster monitoring index processing method and system.
Background
In the process of transforming a traditional cloud platform into a container cloud platform, the change of technical architecture, the increase in resource objects, and the growth of system scale all bring great challenges to the monitoring field. Prometheus is an open-source monitoring alarm system that is well suited to cloud-native monitoring. For collection, it relies on the high-concurrency characteristics of the Go language to monitor and collect from resources at large scale; for storage, it relies on a time series database to store massive amounts of historical data.
However, existing Prometheus does not support automatic updating of monitoring index rules. Rules must be updated by manual configuration, so updating is inefficient.
Disclosure of Invention
Therefore, the application discloses the following technical scheme:
A first aspect of the present application provides a multi-cluster monitoring index processing method, including:
acquiring monitoring index data of a plurality of clusters from a plurality of monitoring alarm systems, wherein each cluster is provided with a plurality of monitoring alarm component instances, and the monitoring alarm component instances in the same cluster provide different categories of monitoring index data;
generating a monitoring index rule according to rule description information input by a user and the monitoring index data;
checking the monitoring index rule;
and, if the monitoring index rule passes the verification, configuring the monitoring index rule to the monitoring alarm component instance.
Optionally, before generating the monitoring index rule according to the rule description information input by the user and the monitoring index data, the method further includes:
performing index label standardization processing on the monitoring index data of different categories to obtain standardized monitoring index data;
and the generating of a monitoring index rule according to the rule description information input by the user and the monitoring index data includes:
generating a monitoring index rule according to the rule description information input by the user and the standardized monitoring index data.
Optionally, the checking of the monitoring index rule includes:
performing a rule normalization check and a version consistency check on the monitoring index rule.
Optionally, the method further includes:
detecting the memory usage of the monitoring alarm component instance;
and, when abnormal memory usage is detected, changing the configuration of the abnormal monitoring index rule so as to isolate the abnormal index.
Optionally, the method further includes:
outputting abnormality notification information when abnormal memory usage is detected.
A second aspect of the present application provides a multi-cluster monitoring index processing system, including:
a multi-cluster index aggregation module, used for acquiring monitoring index data of a plurality of clusters from a plurality of monitoring alarm systems, wherein each cluster is provided with a plurality of monitoring alarm component instances, and the monitoring alarm component instances in the same cluster provide different categories of monitoring index data;
an index management module, comprising an index generator, a rule checker and a rule configuration manager, wherein:
the index generator is used for generating a monitoring index rule according to rule description information input by a user and the monitoring index data;
the rule checker is used for checking the monitoring index rule;
and the rule configuration manager is used for configuring the monitoring index rule to the monitoring alarm component instance if the monitoring index rule passes the verification.
Optionally, the multi-cluster index aggregation module is further used for:
performing index label standardization processing on the monitoring index data of different categories to obtain standardized monitoring index data;
and the index generator, when generating a monitoring index rule according to the rule description information input by the user and the monitoring index data, is specifically used for:
generating a monitoring index rule according to the rule description information input by the user and the standardized monitoring index data.
Optionally, the rule checker, when checking the monitoring index rule, is specifically used for:
performing a rule normalization check and a version consistency check on the monitoring index rule.
Optionally, the system further comprises an exception handling module, which comprises a monitor and a handler:
the monitor is used for detecting the memory usage of the monitoring alarm component instance;
and the handler is used for changing the configuration of the abnormal monitoring index rule when abnormal memory usage is detected, so as to isolate the abnormal index.
Optionally, the handler is further used for:
outputting abnormality notification information when abnormal memory usage is detected.
The beneficial effects of this scheme are as follows:
The method and the system can automatically generate monitoring index rules from the monitoring index data of a plurality of clusters and rule description information input by a user, and configure them to the monitoring alarm component instances, so that monitoring index rules are updated efficiently while the monitoring alarm component instances are running.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a multi-cluster monitoring index processing system according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an index management module according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an exception handling module according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a multi-cluster monitoring index processing method according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are described clearly and fully below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without inventive effort fall within the scope of the present disclosure.
First, some terms that may be related to the present application will be described.
A container is provided by an application container engine that lets developers package an application and the software it depends on into a portable container in a unified way. It provides lightweight virtualization and supports seamless deployment across environments.
Kubernetes, abbreviated as K8S, is an open-source system for deploying and managing containers at scale. It provides resource scheduling, application deployment and management, automatic repair, service discovery and load balancing, and elastic scaling, and has become the standard for container orchestration.
A container cloud platform is an enterprise-level cloud platform capable of managing multiple Kubernetes clusters.
Prometheus is an open-source monitoring alarm system that monitors the state of applications and systems by collecting and storing time series data. It has a powerful query language and visualization tools that help diagnose and solve problems quickly.
PromQL is a query language for querying and analyzing Prometheus time series data. It lets users select and aggregate time series data by identifiers, labels, functions, and other conditions, and supports SQL-like operations such as selection, filtering, aggregation, and ordering.
PrometheusRules is a monitoring-related custom resource object used to manage custom monitoring index rules on Prometheus.
Zabbix is an open-source enterprise-level monitoring solution used to monitor, track, and analyze the performance and availability of networks, servers, and applications in real time. It provides a wide range of monitoring functions, including data collection, data storage, visualization, and alarms.
A TSDB (time series database) is a database system dedicated to storing and processing time series data. Time series data is data collected and recorded in time order, commonly used to represent time-varying metrics and events.
The Go language, also known as Golang, is an open-source programming language. Its design goal is to provide a simple, efficient, and reliable programming language suitable for large-scale software development. It is widely used in network service programming, cloud computing, container technology, big data processing, and other fields.
An API (application program interface) describes the features of a class library or how to use it, and gives applications and developers access to a set of routines provided by a piece of software.
A Webhook is an HTTP callback that performs certain actions under certain conditions. When a K8S resource object is created, a Webhook can be invoked to modify or validate the resource.
A CR is a custom resource object template in K8S. It lets developers define custom K8S resources, which are managed and operated through the master console.
ETCD is a highly available key-value store and one of the important components of K8S. It stores all data of a K8S cluster, including cluster state, configuration information, and metadata.
A Pod is a collection of container instances together with the storage and configuration they depend on.
A workload is a resource object within Kubernetes that represents a collection of container instance replicas.
YAML is a data serialization language commonly used to write configuration files.
With the continuing optimization of resource management, traditional cloud platforms are gradually transforming into container cloud platforms. In a container environment, resources are created and destroyed more frequently than traditional virtual machines, and the data to be monitored is voluminous and complex. A traditional monitoring system often requires monitoring objects to be configured manually or discovered statically, which makes it difficult to adapt to the elasticity and flexibility of containers. A traditional monitoring system also does not support containers natively, so more complex configuration and adjustment are often required to adapt it to a containerized environment. In addition, traditional monitoring systems such as Zabbix use a relational database, which greatly limits data collection performance in a container environment with large data volumes and prevents the time series data of containers from being stored and analyzed efficiently.
Prometheus is an open-source monitoring alarm system implemented in the Go language, and it persists index data through a TSDB (time series database). Compared with a traditional monitoring system, Prometheus can call the native K8S interface to discover monitoring targets automatically, without manual intervention. Prometheus also supports container monitoring natively, without complex configuration. In addition, Prometheus has its own purpose-built, high-performance time series database, which can support data storage at the level of tens of millions of data points per second, and its historical data storage can be extended through a third-party time series database.
However, practical applications of Prometheus have the following problems.
First, Prometheus does not support automatic updating of monitoring index rules; updates must be deployed manually.
Second, Prometheus lacks the ability to verify monitoring index rules. After a rule is deployed manually, its validity cannot be determined automatically, so there is a risk that the rule is invalid after deployment.
Third, in a multi-cluster system, the index versions of different clusters are not managed and checked in a unified way, so versions differ widely between clusters and maintenance is difficult.
Fourth, as the number of monitored objects grows, some monitoring indexes may cause high memory usage, resulting in abnormal operation of Prometheus.
In order to solve the above problems, an embodiment of the present application provides a multi-cluster monitoring index processing system. As shown in FIG. 1, the system may include a multi-cluster index aggregation module, an index management module, and an exception handling module.
As shown in FIG. 1, the multi-cluster index aggregation module is connected to a plurality of K8S clusters. Each K8S cluster deploys a plurality of monitoring alarm component instances, and each monitoring alarm component instance performs a different collection task according to category, so that monitoring index data of different categories is obtained. The multi-cluster index aggregation module aggregates the data collected by the monitoring alarm component instances of each K8S cluster and performs index label standardization on the multi-source, multi-category monitoring index data, so that the general information label fields of different monitoring indexes remain consistent.
The general information label fields include, but are not limited to, cluster name, workload name, and pod name.
The index management module is used for further processing and using the standardized indexes. Its functions include automatic generation of custom indexes, monitoring index rule checking, version management, consistency checking, and unified deployment of rule configuration.
The exception handling module is used for unified monitoring and handling of monitoring indexes, which includes real-time abnormal state monitoring, abnormal monitoring index analysis, abnormal index rule processing, and abnormality notification.
FIG. 2 is an architecture diagram of the index management module. The index management module may include an index generator, a rule checker, a unified version manager, and a rule configuration manager.
The rule checker is used for performing the rule normalization check and the version consistency check.
The unified version manager is used for standard version management and multi-cluster version management.
FIG. 3 is an architecture diagram of the exception handling module, which may include a monitor and a handler.
The monitor is used for operation monitoring and abnormal index analysis. Operation monitoring includes, but is not limited to, detecting the memory utilization of the monitoring alarm component instance.
The handler includes a notification unit and a handling unit for implementing a monitoring index rule configuration function.
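The interaction between the monitor and the handler can be illustrated with a minimal Python sketch. The sketch is not the implementation of the present application: the threshold of 0.9, the `RuleStats` record, and the heuristic of treating the enabled rule with the largest series count as the abnormal rule are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RuleStats:
    """Hypothetical per-rule statistics gathered by the monitor."""
    name: str
    series_count: int   # number of time series the rule produces (assumed signal)
    enabled: bool = True


def handle_memory_anomaly(
    memory_used_bytes: float,
    memory_limit_bytes: float,
    rules: list[RuleStats],
    threshold: float = 0.9,                  # assumed anomaly threshold
    notify: Callable[[str], None] = print,   # stands in for the notification unit
) -> Optional[str]:
    """Disable the suspected rule when memory usage is abnormal.

    Returns the name of the isolated rule, or None if nothing was isolated.
    """
    usage = memory_used_bytes / memory_limit_bytes
    if usage < threshold:
        return None  # memory usage is normal, nothing to do

    active = [rule for rule in rules if rule.enabled]
    if not active:
        notify(f"memory usage {usage:.0%} is abnormal, but no rule is enabled")
        return None

    # Assumed heuristic: the rule with the most series is the likely culprit.
    suspect = max(active, key=lambda rule: rule.series_count)
    suspect.enabled = False  # configuration change that isolates the rule
    notify(f"memory usage {usage:.0%} is abnormal; rule '{suspect.name}' was isolated")
    return suspect.name
```

In a real deployment the configuration change would be applied to the monitoring alarm component instance rather than to an in-memory flag, and the abnormal index analysis could use any signal the monitor collects.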
Based on the above multi-cluster monitoring index processing system, an embodiment of the present application provides a multi-cluster monitoring index processing method. FIG. 4 is a flowchart of the method, which may include the following steps.
S401, acquire monitoring index data of a plurality of clusters from a plurality of monitoring alarm systems, wherein each cluster is provided with a plurality of monitoring alarm component instances, and the monitoring alarm component instances in the same cluster provide different categories of monitoring index data.
Step S401 may be implemented by the multi-cluster index aggregation module (hereinafter referred to as the aggregation module) of the system shown in FIG. 1.
In step S401, the aggregation module may be communicatively coupled to each monitoring alarm component instance (for example, a Prometheus instance) of each K8S cluster. Each monitoring alarm component instance is configured with a plurality of monitoring index rules. The monitoring index rules may be configured by the rule configuration manager described above.
The multiple monitoring alarm component instances (for example, Prometheus instances) of each K8S cluster collect monitoring index data from different targets and different sources according to their respective collection configurations, and then report the collected monitoring index data to the aggregation module.
In some alternative embodiments, after obtaining the monitoring index data, the aggregation module may perform the following step:
performing index label standardization processing on the monitoring index data of different categories to obtain standardized monitoring index data.
Index label standardization refers to converting the format of the general information label fields carried in the monitoring index data into a unified format. A general information label field may be understood as a field that describes some attribute of the corresponding monitoring index data.
As an example, the general information label fields may include, but are not limited to, a cluster name, a workload name, and a pod name, where the cluster name field describes which K8S cluster the monitoring index data comes from.
Monitoring index data from different K8S clusters and different monitoring alarm component instances may use different formats for the general information label fields, and these differences may prevent other modules that use the monitoring index data from identifying it correctly.
For example, in the monitoring index data reported by one monitoring alarm component instance the format of the cluster name field may be "cluster XX", while in the monitoring index data reported by another monitoring alarm component instance the format may be "cluster name-XX". After the aggregation module performs index label standardization, the cluster name fields of the monitoring index data from both instances are converted into a unified format, for example "cluster name XX".
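A minimal Python sketch of index label standardization is given below for illustration. The label key aliases and the unified key names (`cluster_name`, `workload_name`, `pod_name`) are hypothetical; only the "cluster XX" and "cluster name-XX" value formats and the unified "cluster name XX" format come from the example above.

```python
import re

# Hypothetical mapping from source-specific label keys to unified keys.
LABEL_KEY_ALIASES = {
    "cluster": "cluster_name",
    "clusterName": "cluster_name",
    "cluster_name": "cluster_name",
    "workload": "workload_name",
    "workload_name": "workload_name",
    "pod": "pod_name",
    "pod_name": "pod_name",
}

# Matches a leading "cluster", "cluster name", "cluster-name", and so on,
# together with any separator that follows it.
_CLUSTER_PREFIX = re.compile(r"^cluster(?:[ _-]?name)?[ _-]*", re.IGNORECASE)


def normalize_cluster_value(value: str) -> str:
    """Rewrite a supported cluster-name spelling as 'cluster name <id>'."""
    cluster_id = _CLUSTER_PREFIX.sub("", value.strip())
    return f"cluster name {cluster_id}"


def normalize_labels(labels: dict[str, str]) -> dict[str, str]:
    """Return a copy of labels with unified keys and a unified cluster name."""
    normalized = {}
    for key, value in labels.items():
        unified_key = LABEL_KEY_ALIASES.get(key, key)  # unknown keys pass through
        if unified_key == "cluster_name":
            value = normalize_cluster_value(value)
        normalized[unified_key] = value
    return normalized


# Both source formats from the example map to the same unified value.
print(normalize_cluster_value("cluster XX"))       # cluster name XX
print(normalize_cluster_value("cluster name-XX"))  # cluster name XX
```

The prefix pattern is deliberately simple; a production rule set would need to handle cluster identifiers that themselves begin with "cluster" or "name".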
S402, generate a monitoring index rule according to rule description information input by a user and the monitoring index data.
If the aggregation module has performed index label standardization, step S402 may be replaced with:
generating a monitoring index rule according to the rule description information input by the user and the standardized monitoring index data.
Step S402 may be implemented by the index generator of the system shown in FIG. 1.
The monitoring index rule may include the index data to be collected, an index query expression, and other information.
As one example, a monitoring index rule may include:
the index data to be collected is the ratio of the cumulative access amount to the cumulative memory usage per week; the collection method may be that, once every week, the access amount index data of the past week is accumulated to obtain the cumulative access amount of the week, the memory usage index data of the past week is accumulated to obtain the cumulative memory usage of the week, and the former is then divided by the latter to obtain the ratio of the weekly cumulative access amount to the weekly cumulative memory usage.
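The arithmetic of this example rule can be worked through in a few lines of Python. The daily sample values below are invented purely to show the calculation.

```python
def weekly_ratio(access_counts: list[float], memory_usages: list[float]) -> float:
    """Ratio of the weekly cumulative access amount to the weekly cumulative memory usage."""
    cumulative_access = sum(access_counts)
    cumulative_memory = sum(memory_usages)
    if cumulative_memory == 0:
        raise ValueError("cumulative memory usage is zero; the ratio is undefined")
    return cumulative_access / cumulative_memory


# Hypothetical samples for one week, one value per day.
daily_access = [120, 150, 130, 170, 160, 90, 80]          # sums to 900
daily_memory_gib = [2.0, 2.5, 2.0, 3.0, 2.5, 1.5, 1.5]    # sums to 15.0

print(weekly_ratio(daily_access, daily_memory_gib))  # 60.0
```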
The rule description information may include index grouping information, multiple groups of indexes to be aggregated, and a custom index label.
The index grouping information indicates which category the index data to be collected by the monitoring index rule belongs to. The multiple groups of indexes to be aggregated indicate the index data that the monitoring index rule needs to collect. The custom index label indicates the name of the index data to be collected by the monitoring index rule and the general information label fields to be attached when the data is reported.
Taking the foregoing monitoring index rule as an example, the rule description information obtained by the index generator when generating that rule may be:
index grouping information: memory usage category; multiple groups of indexes to be aggregated: access amount and memory usage; custom index label: ratio of the cumulative access amount to the cumulative memory usage per week, with the cluster name and the workload name attached.
In step S402, in response to an instruction to generate a new monitoring index rule, the index generator may display a front-end page, for example a corresponding web page on a computer. The user may then create an index group on the front-end page, designate the multiple groups of indexes to be aggregated, and input a custom index label.
After the user inputs the rule description information, the index generator generates the monitoring index rule corresponding to the rule description information as follows:
the index generator obtains the rule description information input or selected by the user in the front-end page, parses the multiple groups of indexes to be aggregated and the custom index label in the rule description information, and generates the index query expression corresponding to the index based on PromQL query syntax and a query expression template.
In the foregoing example, the index generator parses the multiple groups of indexes to be aggregated and the custom index label in the rule description information and, based on the parsing result, generates the index query expression for the index "ratio of the cumulative access amount to the cumulative memory usage per week".
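One possible way to turn rule description information into a PromQL expression through a template is sketched below in Python. The `RuleDescription` fields, the metric names `access_count` and `memory_usage_bytes`, the label names, and the template itself are assumptions for illustration; the sketch also assumes both metrics are gauges, so that `sum_over_time` over a one-week window is a meaningful accumulation.

```python
from dataclasses import dataclass, field


@dataclass
class RuleDescription:
    """Hypothetical container for the rule description entered on the front-end page."""
    group: str                      # index grouping information (category)
    metrics: list[str]              # indexes to aggregate: numerator, then denominator
    custom_name: str                # custom index name
    attached_labels: list[str] = field(default_factory=list)  # labels kept on the result
    window: str = "1w"              # PromQL duration for the accumulation window


# Assumed query expression template for a "ratio of two accumulated metrics" index.
RATIO_TEMPLATE = (
    "sum by ({labels}) (sum_over_time({numerator}[{window}]))"
    " / "
    "sum by ({labels}) (sum_over_time({denominator}[{window}]))"
)


def build_ratio_expression(desc: RuleDescription) -> str:
    """Fill the ratio template with the metrics and labels from the description."""
    if len(desc.metrics) != 2:
        raise ValueError("a ratio index needs exactly two metrics")
    numerator, denominator = desc.metrics
    return RATIO_TEMPLATE.format(
        labels=", ".join(desc.attached_labels),
        numerator=numerator,
        denominator=denominator,
        window=desc.window,
    )


description = RuleDescription(
    group="memory-usage",
    metrics=["access_count", "memory_usage_bytes"],
    custom_name="weekly_access_per_memory_ratio",
    attached_labels=["cluster_name", "workload_name"],
)
print(build_ratio_expression(description))
```

If the access amount were exposed as a counter rather than a gauge, a template based on `increase(...)` would be the more natural choice.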
After the index query expression is generated, the index generator calls the Prometheus API and uses the index query expression to pre-query the custom index. The query result is presented on the front-end page as a preview for the user to check, and the pre-generated index query expression may be temporarily stored in an array.
Taking the ratio of the cumulative access amount to the cumulative memory usage per week as an example, after the index generator obtains the index query expression of this index, it uses the expression to pre-query the currently collected index data, obtains the corresponding query result, and displays the query result on the front-end page.
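The pre-query step can be sketched against the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`). The function names and the base URL are hypothetical, and the parser below assumes the query returns an instant vector; it is an illustration rather than the index generator's actual code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, expression: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expression})}"


def parse_instant_vector(payload: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each sample looks like {"metric": {...}, "value": [timestamp, "value-as-string"]}.
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in payload["data"]["result"]
    ]


def pre_query(base_url: str, expression: str, timeout: float = 10.0) -> list[tuple[dict, float]]:
    """Run the expression once so the result can be previewed before the rule is created."""
    with urlopen(build_query_url(base_url, expression), timeout=timeout) as response:
        return parse_instant_vector(json.load(response))
```

A caller would pass the address of a Prometheus instance, for example `pre_query("http://prometheus.example:9090", expression)`, where the host name is a placeholder, and render the returned pairs on the preview page.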
After checking the preview, if the query result meets the user's expectation, the user can click confirm on the front-end page. The index generator then responds to the confirmation instruction by combining the index query expression and the custom index name into a monitoring index rule.
If the query result does not meet the user's expectation, the user can click regenerate on the front-end page and input new rule description information. The index generator then repeats the above process with the new rule description information to obtain a new monitoring index rule.
The index generator may generate a YAML file in which the generated monitoring index rule is recorded and output the YAML file to the rule checker, so that the rule checker checks the monitoring index rule recorded in the file.
When generating the YAML file, the index generator may read the template file corresponding to the category indicated in the index grouping information, and then combine the template file with the previously generated monitoring index rule to form the YAML file in which the monitoring index rule is recorded.
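Combining a category template with a generated rule can be sketched as follows. The sketch assumes the rule is stored as a Prometheus Operator `PrometheusRule` resource containing one recording rule; the template text, the category key `memory-usage`, and the function name are hypothetical.

```python
import json
import re

# Prometheus metric names must match this pattern.
METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

# Hypothetical template file contents, keyed by index category.
TEMPLATES = {
    "memory-usage": """\
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {resource_name}
  labels:
    category: {category}
spec:
  groups:
    - name: {category}
      rules:
        - record: {record}
          expr: {expr}
""",
}


def render_rule_file(category: str, resource_name: str, record: str, expr: str) -> str:
    """Fill the category template with one recording rule and return the YAML text."""
    if category not in TEMPLATES:
        raise ValueError(f"no template for category '{category}'")
    if not METRIC_NAME.match(record):
        raise ValueError(f"'{record}' is not a valid metric name")
    # json.dumps yields double-quoted strings, which are also valid YAML scalars,
    # so special characters in the expression cannot break the file structure.
    return TEMPLATES[category].format(
        resource_name=json.dumps(resource_name),
        category=json.dumps(category),
        record=json.dumps(record),
        expr=json.dumps(expr),
    )


print(render_rule_file(
    "memory-usage",
    "weekly-access-per-memory",
    "workload:access_per_memory:ratio_1w",
    "sum(sum_over_time(access_count[1w])) / sum(sum_over_time(memory_usage_bytes[1w]))",
))
```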
S403, checking the monitoring index rule.
If the monitoring index rule passes the verification, step S404 is executed, and if the monitoring index rule does not pass the verification, step S405 is executed.
Step S403 may be implemented by a rule checker of the system shown in fig. 1.
When executing S403, the rule checker may specifically perform a rule normalization check and a version consistency check on the monitoring index rule.
The rule normalization check checks whether the file recording the monitoring index rule conforms to a preset file specification.
Taking the YAML file as an example, the rule checker may check whether the YAML file in which the monitoring index rule is recorded conforms to the YAML file specification; if so, the monitoring index rule is determined to pass the rule normalization check, and if not, it is determined to fail the rule normalization check.
The version consistency check checks whether the generated monitoring index rule duplicates a monitoring index rule already stored in the unified version management library; if so, the monitoring index rule is determined to fail the version consistency check, and if not, it is determined to pass the version consistency check.
The purpose of the version consistency check is to ensure the uniqueness of the rules.
If both checks pass, the monitoring index rule is determined to pass the check; if at least one check fails, the monitoring index rule is determined to fail the check.
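The two checks and their combined verdict can be sketched as follows. This is a simplified illustration: it assumes the rule file has already been parsed into a dictionary, it treats "normalization" as the presence of two assumed required fields (`record` and `expr`), and it treats a rule as a duplicate when its name or expression matches a stored rule. None of these details are fixed by the application.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = ("record", "expr")  # assumed file specification


@dataclass(frozen=True)
class CheckResult:
    passed: bool
    reasons: tuple


def _same(a, b) -> bool:
    # Two values only count as duplicates when the value is actually present.
    return a is not None and a == b


def check_rule(rule: dict, library: list) -> CheckResult:
    reasons = []
    # Rule normalization check: every required field is a non-empty string.
    for field_name in REQUIRED_FIELDS:
        value = rule.get(field_name)
        if not isinstance(value, str) or not value.strip():
            reasons.append(f"normalization: missing or empty '{field_name}'")
    # Version consistency check: the rule must not repeat a stored rule.
    for existing in library:
        if _same(rule.get("record"), existing.get("record")) or _same(
            rule.get("expr"), existing.get("expr")
        ):
            reasons.append("version consistency: duplicates an existing rule")
            break
    # The rule passes only if neither check produced a failure reason.
    return CheckResult(passed=not reasons, reasons=tuple(reasons))
```

Collecting the failure reasons, rather than returning a bare boolean, makes it possible to report why a rule was rejected.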
After the check is passed, the unified version manager can generate a corresponding version identifier for the monitoring index rule that passed the check, and then store the monitoring index rule and its version identifier in the unified version management library.
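One possible way to generate and store a version identifier is sketched below. The content-hash scheme, the `v-` prefix, and the class name `VersionLibrary` are assumptions for illustration; the application does not say how identifiers are derived.

```python
import hashlib
import json


class VersionLibrary:
    """In-memory stand-in for the unified version management library."""

    def __init__(self) -> None:
        self._rules = {}

    def store(self, rule: dict) -> str:
        # Assumed scheme: derive the identifier from the rule content, so
        # identical rules always map to the same identifier.
        canonical = json.dumps(rule, sort_keys=True).encode("utf-8")
        version_id = "v-" + hashlib.sha256(canonical).hexdigest()[:12]
        self._rules[version_id] = dict(rule)
        return version_id

    def get(self, version_id: str) -> dict:
        # Look up a stored rule by its version identifier.
        return self._rules[version_id]
```

A content-derived identifier is only one option; a monotonically increasing version number would serve equally well if ordering between versions matters.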
S404, if the monitoring index rule passes the verification, configuring the monitoring index rule to the monitoring alarm component instance.
Step S404 may be implemented by a rule configuration manager of the system shown in fig. 1.
After the monitoring index rule passes the check, the rule configuration manager can directly configure it into a plurality of monitoring alarm component instances of a plurality of clusters. Which clusters and which monitoring alarm component instances the rule is configured to can be specified by the user or selected by the rule configuration manager according to the operating condition of each cluster.
The configuration request may carry the version identifier of the monitoring index rule to be configured, so that the rule configuration manager can look up the monitoring index rule to be configured by that identifier.
When configuring the monitoring index rule, the rule configuration manager can call the K8S APIs of the multiple clusters according to the stored cluster authentication information, thereby configuring the monitoring index rule and controlling the unified deployment and unified rollback of the rule configuration file across the clusters.
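Unified deployment with unified rollback can be sketched as an all-or-nothing loop over the clusters. `FakeCluster` is a hypothetical in-memory stand-in for a per-cluster K8S API client, and its `apply` and `restore` methods are not real Kubernetes calls; only the control flow is the point here.

```python
from typing import Optional


class FakeCluster:
    """Hypothetical stand-in for one cluster's K8S API client."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.rules = {}

    def apply(self, rule_name: str, body: str) -> None:
        if not self.healthy:
            raise RuntimeError(f"{self.name}: apply failed")
        self.rules[rule_name] = body

    def restore(self, rule_name: str, previous: Optional[str]) -> None:
        # Put back the earlier body, or remove the rule if there was none.
        if previous is None:
            self.rules.pop(rule_name, None)
        else:
            self.rules[rule_name] = previous


def deploy_everywhere(clusters, rule_name: str, body: str) -> bool:
    """Apply the rule to every cluster; undo all changes if any apply fails."""
    done = []  # (cluster, previous body) pairs, kept for rollback
    for cluster in clusters:
        previous = cluster.rules.get(rule_name)
        try:
            cluster.apply(rule_name, body)
        except RuntimeError:
            for applied, old in reversed(done):
                applied.restore(rule_name, old)
            return False
        done.append((cluster, previous))
    return True
```

Recording the previous body before each apply is what allows the rollback to restore an older rule version rather than simply deleting the rule.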
S405, if the monitoring index rule fails to pass the verification, outputting prompt information and obtaining new rule description information.
After S405 is performed, the method returns to S402 to generate a new monitoring index rule according to the new rule description information.
In S405, the prompt information may be output by the index generator and may instruct the user to input new rule description information. In addition, the prompt information may include the reason why the monitoring index rule failed the check, for example, that it failed the rule normalization check or the version consistency check.
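The loop formed by S402, S403, and S405 can be sketched as follows. The function name and the `generate` and `check` callables are placeholders supplied by the caller; they stand in for the index generator and the rule checker and are not interfaces defined by the application.

```python
from typing import Callable, Iterable, Optional, Tuple


def generate_until_valid(
    descriptions: Iterable[str],
    generate: Callable[[str], dict],
    check: Callable[[dict], Tuple[bool, str]],
) -> Tuple[Optional[dict], list]:
    """Run S402 then S403 for each description until a rule passes."""
    prompts = []
    for description in descriptions:
        rule = generate(description)  # S402: generate a rule
        ok, reason = check(rule)      # S403: check the rule
        if ok:
            return rule, prompts      # the caller proceeds to S404
        # S405: tell the user why the rule failed and ask for new input.
        prompts.append(f"Rule rejected ({reason}); please enter a new description.")
    return None, prompts
```

Each entry in `descriptions` models one round of user input, so the loop ends either when a rule passes or when the user stops supplying descriptions.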
In some optional embodiments, the multi-cluster monitoring index processing method may further include the following steps:
inquiring and displaying the monitoring index rule according to the inquiring instruction input by the user.
The above step may be implemented by the unified version manager of the system shown in fig. 1. The unified version manager can realize multi-version management of the user-defined monitoring index rules, including the release of standard rule versions and the centralized onboarding, display, and query of the rule versions deployed on each K8S cluster.
Specifically, the unified version manager can receive a query instruction from the user, parse out the keyword carried in the query instruction, use the keyword to query the corresponding monitoring index rule in the unified version management library, and display the query result to the user.
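A keyword query over the stored rules can be sketched as below. The library is modeled as a plain dictionary from version identifier to rule, and matching is a case-insensitive substring test on the assumed `record` and `expr` fields; the actual matching logic is not described in the application.

```python
def query_rules(library: dict, instruction: str) -> list:
    """Return (version id, rule) pairs that match every keyword."""
    # Parse the keywords out of the query instruction.
    keywords = instruction.lower().split()
    hits = []
    for version_id, rule in sorted(library.items()):
        haystack = f"{rule.get('record', '')} {rule.get('expr', '')}".lower()
        # An empty instruction matches nothing rather than everything.
        if keywords and all(k in haystack for k in keywords):
            hits.append((version_id, rule))
    return hits
```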
In some optional embodiments, the multi-cluster monitoring index processing method may further include the following steps:
A1, detecting the memory occupation condition of the monitoring alarm component instance;
A2, when an abnormal memory occupation condition is detected, changing the configuration of the abnormality monitoring index rule so as to isolate the abnormal index.
Step A1 may be implemented by a monitor of the system shown in fig. 1. The memory occupancy may be represented by memory operation data.
The monitor can collect the memory operation data of the monitoring alarm component instance by calling the API, and then monitor the index collection condition and the generation condition of the monitoring index rules in real time according to the collected memory operation data.
Specifically, the memory operation data can reflect the size of the memory space occupied by each item of monitoring index data collected by the monitoring alarm component instance, and the size of the memory space occupied by the index generator when generating monitoring index rules.
Based on the latter, the monitor can estimate, from the size of the occupied memory space, the number of monitoring index rules generated by the index generator in the most recent period, that is, analyze the generation condition of the monitoring index rules.
Based on the former, the monitor can judge, for each item of monitoring index data, whether the memory space used by the monitoring alarm component instance to collect that data exceeds a memory threshold; if so, the memory occupation condition of the monitoring index is considered abnormal, and if not, it is considered normal.
After a monitoring index with an abnormal memory occupation condition is found, the monitor can query the unified version management library for the monitoring index rule to which that index belongs, and notify the handler of the queried monitoring index rule.
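The threshold comparison and the subsequent rule lookup can be sketched as two small functions. The per-index memory figures, the single global threshold, and the substring match between index names and rule expressions are all assumptions; the application only states that usage is compared with a memory threshold and that the affected rule is then looked up.

```python
def find_abnormal_metrics(memory_bytes: dict, threshold_bytes: int) -> list:
    """Return the names of indexes whose memory use exceeds the threshold."""
    return sorted(
        name for name, used in memory_bytes.items() if used > threshold_bytes
    )


def rules_using(metrics: list, library: dict) -> dict:
    """Look up stored rules whose expression mentions an abnormal index."""
    return {
        version_id: rule
        for version_id, rule in library.items()
        if any(metric in rule["expr"] for metric in metrics)
    }
```

Substring matching can over-match when one index name is a prefix of another; a real lookup would more likely parse the expression or keep an explicit index-to-rule mapping.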
Step A2 may be implemented by a handler of the system shown in fig. 1.
Alternatively, the handler may comprise a notification unit and a handling unit, and step A2 may be specifically performed by the handling unit.
In A2, the handling unit may receive the analysis result from the monitor, that is, receive the abnormality monitoring index rule, and then mark the abnormality monitoring index and the environment to which it belongs.
The handling unit may then invoke a rule configuration manager to configure changes to the anomaly monitoring indicator rules to effect isolation handling of the anomaly indicators.
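The isolation step can be sketched as moving a rule out of an environment's active set. What the "configuration change" consists of is not specified in the application; this sketch assumes it means deactivating the rule in the marked environment, and the `Environment` class and `isolate` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical record of the rules configured in one cluster environment."""

    name: str
    active_rules: set = field(default_factory=set)
    isolated_rules: set = field(default_factory=set)


def isolate(env: Environment, rule_id: str) -> bool:
    """Move a rule from the active set to the isolated set.

    Returns True if the configuration changed, False if the rule was not
    active in this environment.
    """
    if rule_id not in env.active_rules:
        return False
    env.active_rules.remove(rule_id)
    env.isolated_rules.add(rule_id)
    return True
```

Keeping the isolated rules in a separate set, rather than deleting them, leaves room to re-enable a rule once the memory problem is resolved.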
Optionally, the method further comprises:
when an abnormal memory occupation condition is detected, outputting abnormality notification information.
The notification unit in the handler is used for receiving the analysis result from the monitor and then notifying the operation and maintenance personnel of the analysis result by outputting the abnormality notification information.
The abnormality notification information may include an abnormality monitoring index, an environment to which the abnormality monitoring index belongs, and an abnormality monitoring index rule.
The abnormality monitoring index rule refers to the monitoring index rule, queried by the monitor in step A1, to which the monitoring index with the abnormal memory occupation condition belongs.
The abnormality monitoring index refers to the monitoring index with the abnormal memory occupation condition detected by the monitor in step A1.
The multi-cluster monitoring index processing method provided by the embodiment has the beneficial effects that:
the method can automatically generate monitoring index rules and configure them to the monitoring alarm component instances by using the monitoring index data of a plurality of clusters and the rule description information input by the user, so that the monitoring index rules can be updated efficiently while the monitoring alarm component instances are running.
The technical scheme designs a unified management system that realizes full life cycle management of multi-cluster monitoring indexes. The multi-cluster index aggregation module performs label standardization on the monitoring indexes collected by the Prometheus plug-ins of the K8S clusters, and the processed index data is further processed and used by the index management module: first, the index generator automatically generates a custom monitoring index rule file from the indexes that the user pre-generates through operations on the front-end page; second, the rule checker performs the rule normalization check and the version consistency check on the generated index rule file, and the unified version manager places it under version control; finally, the rule configuration manager deploys and configures the custom index rule into the multiple cluster environments.
For the monitoring index rules deployed in the multi-cluster environment, the abnormality handling module analyzes and evaluates the memory occupation of the monitoring indexes, screens out and identifies the abnormal indexes that can be trimmed, and finally isolates the abnormal indexes.
The multi-cluster monitoring index processing method provided by the application has the following beneficial effects:
In the first aspect, the monitoring acquisition tasks are split and different monitoring acquisition tasks are issued to different Prometheus instances; index data is standardized by means of label standardization, finally achieving fine-grained, unified management of index acquisition.
In the second aspect, the index generator is implemented on a visual front-end configuration page; it can intelligently identify and present the set of indexes that can be edited and aggregated, and custom indexes are pre-generated and pre-queried through checkbox selection on the front-end page, which greatly reduces the user's learning cost.
In the third aspect, unified version management is realized for custom indexes; besides unified storage and maintenance of standard versions, it supports version onboarding, version query, version verification, and unified deployment of multi-cluster, multi-dimensional monitoring indexes.
In the fourth aspect, unified monitoring of the memory state of the Prometheus plug-ins of the multiple clusters is realized; memory occupation analysis of monitoring indexes is provided, together with locating, automatic notification, and automatic isolation of abnormal monitoring indexes, which improves the robustness of the multi-cluster monitoring system.
In the multi-cluster monitoring index processing system provided in this embodiment, each module may specifically execute the following steps, thereby implementing the multi-cluster monitoring index processing method described above.
The multi-cluster index aggregation module is used for acquiring monitoring index data of a plurality of clusters from a plurality of monitoring alarm systems; each cluster is provided with a plurality of monitoring alarm component instances, and the monitoring alarm component instances in the same cluster provide different categories of monitoring index data;
the index management module comprises an index generator, a rule checker and a rule configuration manager:
the index generator is used for generating a monitoring index rule according to rule description information and monitoring index data input by a user;
the rule checker is used for checking the monitoring index rule;
the rule configuration manager is used for configuring the monitoring index rule to the monitoring alarm component instance if the monitoring index rule passes the verification.
Optionally, the multi-cluster index aggregation module is further configured to:
perform index label standardization processing on the monitoring index data of different categories to obtain standardized monitoring index data;
when generating a monitoring index rule according to the rule description information input by the user and the monitoring index data, the index generator is specifically configured to:
generate a monitoring index rule according to the rule description information input by the user and the standardized monitoring index data.
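Index label standardization can be illustrated with a minimal sketch that renames source-specific label keys to a common vocabulary and attaches a cluster label. The alias table, the sample layout (`name`, `labels`, `value`), and the added `cluster` label are assumptions for illustration and are not taken from the application.

```python
# Hypothetical alias table: source label name -> standard label name.
LABEL_ALIASES = {
    "pod_name": "pod",
    "kubernetes_namespace": "namespace",
    "instance_ip": "instance",
}


def standardize_labels(sample: dict, cluster: str) -> dict:
    """Return a copy of the sample with standard label names."""
    labels = {}
    for key, value in sample["labels"].items():
        # Rename known aliases; leave unknown label names unchanged.
        labels[LABEL_ALIASES.get(key, key)] = value
    # Record which cluster the sample came from.
    labels["cluster"] = cluster
    return {"name": sample["name"], "labels": labels, "value": sample["value"]}
```

With consistent label names, the same rule expression can be evaluated against data from any cluster without per-cluster rewrites.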
Optionally, when the rule checker checks the monitoring index rule, the rule checker is specifically configured to:
and carrying out rule normalization check and version consistency check on the monitoring index rule.
Optionally, the system further comprises an exception handling module comprising a monitor and a handler:
the monitor is used for detecting the memory occupation condition of the monitoring alarm component instance;
the handler is used for changing the configuration of the abnormality monitoring index rule, when an abnormal memory occupation condition is detected, so as to isolate the abnormal index.
Optionally, the handler is further configured to:
and when the memory occupation condition abnormality is detected, outputting abnormality notification information.
The specific working principle and beneficial effects of the system can be referred to the relevant steps and beneficial effects of the multi-cluster monitoring index processing method provided in the foregoing embodiment, and are not repeated.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced.
For convenience of description, the above system or apparatus is described as being functionally divided into various modules or units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
Finally, it is further noted that relational terms such as first, second, third, fourth, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application and are intended to be comprehended within the scope of the present application.
Claims (10)
1. The multi-cluster monitoring index processing method is characterized by comprising the following steps of:
acquiring monitoring index data of a plurality of clusters from a plurality of monitoring alarm systems; each cluster is provided with a plurality of monitoring alarm component instances, and the monitoring alarm component instances in the same cluster provide different categories of monitoring index data;
generating a monitoring index rule according to rule description information input by a user and the monitoring index data;
checking the monitoring index rule;
and if the monitoring index rule passes the verification, configuring the monitoring index rule to the monitoring alarm component instance.
2. The method of claim 1, wherein before generating the monitoring index rule based on the rule description information and the monitoring index data input by the user, further comprising:
performing index label standardization processing on the monitoring index data of different categories to obtain standardized monitoring index data;
the generating a monitoring index rule according to the rule description information input by the user and the monitoring index data comprises the following steps:
and generating a monitoring index rule according to rule description information input by a user and the standardized monitoring index data.
3. The method of claim 1, wherein checking the monitoring index rule comprises:
and carrying out rule normalization check and version consistency check on the monitoring index rule.
4. The method as recited in claim 1, further comprising:
detecting the memory occupation condition of the monitoring alarm component instance;
and when the memory occupation condition abnormality is detected, carrying out configuration change on an abnormality monitoring index rule so as to realize isolation treatment of abnormality indexes.
5. The method as recited in claim 4, further comprising:
and outputting abnormal notification information when the memory occupation condition abnormality is detected.
6. A multi-cluster monitoring index processing system, comprising:
the multi-cluster index aggregation module is used for acquiring monitoring index data of a plurality of clusters from a plurality of monitoring alarm systems; each cluster is provided with a plurality of monitoring alarm component instances, and the monitoring alarm component instances in the same cluster provide different categories of monitoring index data;
the index management module comprises an index generator, a rule checker and a rule configuration manager:
the index generator is used for generating a monitoring index rule according to rule description information input by a user and the monitoring index data;
the rule checker is used for checking the monitoring index rule;
and the rule configuration manager is used for configuring the monitoring index rule to the monitoring alarm component instance if the monitoring index rule passes the verification.
7. The system of claim 6, wherein the multi-cluster index aggregation module is further configured to:
performing index label standardization processing on the monitoring index data of different categories to obtain standardized monitoring index data;
the index generator is specifically configured to, when generating a monitoring index rule according to rule description information and the monitoring index data input by a user:
and generating a monitoring index rule according to rule description information input by a user and the standardized monitoring index data.
8. The system of claim 6, wherein the rule checker is configured to, when checking the monitoring index rule:
and carrying out rule normalization check and version consistency check on the monitoring index rule.
9. The system of claim 6, further comprising an exception handling module comprising a monitor and a handler:
the monitor is used for detecting the memory occupation condition of the monitoring alarm component instance;
and the handler is used for carrying out configuration change on the abnormality monitoring index rule when the memory occupation condition abnormality is detected so as to realize isolation treatment of abnormal indexes.
10. The system of claim 9, wherein the handler is further configured to:
and outputting abnormal notification information when the memory occupation condition abnormality is detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311647656.8A CN117707885A (en) | 2023-12-04 | 2023-12-04 | Multi-cluster monitoring index processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117707885A true CN117707885A (en) | 2024-03-15 |
Family
ID=90148977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311647656.8A Pending CN117707885A (en) | 2023-12-04 | 2023-12-04 | Multi-cluster monitoring index processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117707885A (en) |
2023-12-04: CN application CN202311647656.8A (patent/CN117707885A/en), status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||