CN115658447A - Cluster resource monitoring method, device, equipment and storage medium - Google Patents

Cluster resource monitoring method, device, equipment and storage medium

Info

Publication number
CN115658447A
Authority
CN
China
Prior art keywords
data
audit
cluster
monitoring
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211131290.4A
Other languages
Chinese (zh)
Inventor
冯洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202211131290.4A
Publication of CN115658447A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the field of cloud technology and discloses a method, a device, equipment and a storage medium for monitoring cluster resources, which are used for improving the real-time performance and accuracy of cluster anomaly discovery. The cluster resource monitoring method comprises the following steps: recording each resource operation request of a target cluster, and generating audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request; acquiring, from all the initial audit data, target audit data that conforms to a preset audit strategy, and writing the target audit data into an audit log file of the corresponding node in the target cluster; collecting the audit log files of all nodes in real time, and writing the acquisition results into a data warehouse of the target cluster in real time; acquiring preset monitoring index data through the data warehouse of the target cluster, and performing statistics on the monitoring index data to obtain a statistical result; and performing monitoring and early warning on the audit data of the target cluster based on the statistical result. In addition, the invention also relates to blockchain technology, and the statistical result can be stored in a blockchain node.

Description

Cluster resource monitoring method, device, equipment and storage medium
Technical Field
The present invention relates to the field of log monitoring technologies, and in particular, to a method, an apparatus, a device, and a storage medium for monitoring cluster resources.
Background
With the development of cloud technology, more and more projects and data are run and stored on clusters, so ensuring the security and reliability of the cluster is a precondition for the security of those projects and data.
The prior art can generally monitor the operation of a cluster in real time, usually by performing index analysis on the cluster's logs so as to find anomalies during operation. However, because the log data generated by different nodes in the cluster are complex and diverse and their storage locations vary widely, it is difficult to analyze the overall running state of the cluster accurately and comprehensively, so that cluster anomalies are discovered neither promptly nor accurately.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for monitoring cluster resources, which are used for improving the real-time performance and accuracy of cluster anomaly discovery.
The first aspect of the present invention provides a method for monitoring cluster resources, including:
recording each resource operation request of the target cluster, and generating audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request;
acquiring target audit data which accords with a preset audit strategy in all initial audit data, and writing the target audit data into an audit log file of a corresponding node in the target cluster;
acquiring audit log files of all nodes in real time, and writing acquisition results into a data warehouse of the target cluster in real time;
acquiring preset monitoring index data through a data warehouse of the target cluster, and counting the monitoring index data to obtain a statistical result;
and monitoring and early warning audit data of the target cluster based on the statistical result.
Optionally, in a first implementation manner of the first aspect of the present invention, the recording each resource operation request of the target cluster, and performing audit data generation on each resource operation request to obtain initial audit data corresponding to each resource operation request includes:
recording each resource operation request of a target cluster through an interface service component in the target cluster, wherein the interface service component is used for managing the resource operation request entry of the target cluster;
and generating an audit event for each request stage information of each resource operation request to obtain initial audit data corresponding to each resource operation request, wherein each request stage information is used for indicating request record information of the resource operation request in each request state.
Optionally, in a second implementation manner of the first aspect of the present invention, the obtaining target audit data that meets a preset audit policy from all initial audit data, and writing the target audit data into an audit log file of a corresponding node in the target cluster includes:
screening request source end information and request resource information of initial audit data corresponding to each resource operation request through a preset audit strategy to obtain target audit data;
and performing data exchange format processing on the target audit data to obtain target audit data in a data exchange format, and writing the target audit data in the data exchange format into an audit log file of a corresponding node in the target cluster, wherein the audit log file is stored in a preset storage directory which is mounted in the corresponding node in advance.
Optionally, in a third implementation manner of the first aspect of the present invention, the acquiring audit log files of each node in real time, and writing an acquisition result into a data warehouse of the target cluster in real time includes:
accessing, through a log component pre-mounted on each node in the target cluster, the audit log file in the preset storage directory pre-mounted on that node, and collecting the accessed audit log file line by line in real time to obtain an acquisition result, wherein the log component has access rights to all directories in the corresponding node;
and parsing the acquisition result according to the data exchange format to obtain a parsing result, packaging the parsing result, and sending the packaged parsing result to the data warehouse of the target cluster through a log service.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the obtaining preset monitoring index data through the data warehouse of the target cluster, and performing statistics on the monitoring index data to obtain a statistical result includes:
inquiring monitoring index data corresponding to the monitoring index from a data warehouse of the target cluster through a preset monitoring index;
and counting audit data and audit events of the monitoring index data in a preset period to obtain a statistical result.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the monitoring and early warning of audit data for the target cluster based on the statistical result includes:
and monitoring and early warning the statistical result in real time through a preset early warning threshold value so as to monitor and early warn audit data of the target cluster.
Optionally, in a sixth implementation manner of the first aspect of the present invention, after the monitoring and warning of audit data of the target cluster is performed based on the statistical result, the monitoring method of cluster resources further includes:
receiving a monitoring index query request, and querying monitoring index data corresponding to the monitoring index query request through a data warehouse of the target cluster;
receiving an alarm setting request and creating a monitoring task corresponding to the alarm setting request;
and monitoring and early warning the monitoring index data corresponding to the monitoring index query request through the monitoring task corresponding to the alarm setting request.
A second aspect of the present invention provides a monitoring apparatus for cluster resources, including:
the recording module is used for recording each resource operation request of the target cluster and generating audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request;
the write-in module is used for acquiring target audit data which accords with a preset audit strategy in all the initial audit data and writing the target audit data into an audit log file of a corresponding node in the target cluster;
the acquisition module is used for acquiring the audit log files of all the nodes in real time and writing the acquisition results into the data warehouse of the target cluster in real time;
the statistical module is used for acquiring preset monitoring index data through a data warehouse of the target cluster, and performing statistics on the monitoring index data to obtain a statistical result;
and the monitoring module is used for monitoring and early warning audit data of the target cluster based on the statistical result.
Optionally, in a first implementation manner of the second aspect of the present invention, the recording module is specifically configured to:
recording each resource operation request of a target cluster through an interface service component in the target cluster, wherein the interface service component is used for managing the resource operation request entry of the target cluster;
and generating an audit event for each request stage information of each resource operation request to obtain initial audit data corresponding to each resource operation request, wherein each request stage information is used for indicating request record information of the resource operation request in each request state.
Optionally, in a second implementation manner of the second aspect of the present invention, the writing module is specifically configured to:
screening request source end information and request resource information of initial audit data corresponding to each resource operation request through a preset audit strategy to obtain target audit data;
and performing data exchange format processing on the target audit data to obtain target audit data in a data exchange format, and writing the target audit data in the data exchange format into an audit log file of a corresponding node in the target cluster, wherein the audit log file is stored in a preset storage directory which is mounted in the corresponding node in advance.
Optionally, in a third implementation manner of the second aspect of the present invention, the acquisition module is specifically configured to:
accessing, through a log component pre-mounted on each node in the target cluster, the audit log file in the preset storage directory pre-mounted on that node, and collecting the accessed audit log file line by line in real time to obtain an acquisition result, wherein the log component has access rights to all directories in the corresponding node;
and parsing the acquisition result according to the data exchange format to obtain a parsing result, packaging the parsing result, and sending the packaged parsing result to the data warehouse of the target cluster through a log service.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the statistical module is specifically configured to:
inquiring monitoring index data corresponding to the monitoring index from a data warehouse of the target cluster through a preset monitoring index;
and counting audit data and audit events of the monitoring index data in a preset period to obtain a statistical result.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the monitoring module is specifically configured to:
and monitoring and early warning the statistical result in real time through a preset early warning threshold value so as to monitor and early warn audit data of the target cluster.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the monitoring apparatus for cluster resources further includes:
the query module is used for receiving a monitoring index query request and querying monitoring index data corresponding to the monitoring index query request through a data warehouse of the target cluster;
the creating module is used for receiving an alarm setting request and creating a monitoring task corresponding to the alarm setting request;
and the execution module is used for monitoring and early warning the monitoring index data corresponding to the monitoring index query request through the monitoring task corresponding to the alarm setting request.
A third aspect of the present invention provides a monitoring device for cluster resources, including: a memory and at least one processor, the memory having a computer program stored therein; the at least one processor calls the computer program in the memory to cause the monitoring device of the cluster resource to execute the above-mentioned monitoring method of the cluster resource.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein a computer program which, when run on a computer, causes the computer to perform the above-mentioned method of monitoring cluster resources.
In the technical scheme provided by the invention, each resource operation request of a target cluster is recorded, and audit data is generated for each resource operation request to obtain initial audit data corresponding to each resource operation request; acquiring target audit data which accords with a preset audit strategy in all initial audit data, and writing the target audit data into an audit log file of a corresponding node in the target cluster; acquiring audit log files of all nodes in real time, and writing acquisition results into a data warehouse of the target cluster in real time; acquiring preset monitoring index data through a data warehouse of the target cluster, and counting the monitoring index data to obtain a statistical result; and monitoring and early warning audit data of the target cluster based on the statistical result. In the embodiment of the invention, the audit data corresponding to each request is generated based on a request recording mode on a cluster level, the audit data conforming to the audit strategy is written into the audit log file of the corresponding node, and the audit log file of each node is collected and written into the data warehouse in real time on the cluster level, so that the audit data for cluster anomaly analysis is not dispersed in different storage directories of different nodes in different storage structures but is concentrated in the data warehouse of a target cluster, the monitoring index data of any monitoring index can be directly obtained from the data warehouse of the target cluster, and statistics, analysis, monitoring and early warning are carried out, thereby improving the instantaneity and accuracy of cluster anomaly discovery.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a monitoring method for cluster resources in an embodiment of the present invention;
fig. 2 is a schematic diagram of another embodiment of a monitoring method for cluster resources in the embodiment of the present invention;
fig. 3 is a schematic diagram of an embodiment of a monitoring apparatus for cluster resources in an embodiment of the present invention;
fig. 4 is a schematic diagram of another embodiment of a monitoring apparatus for cluster resources in an embodiment of the present invention;
fig. 5 is a schematic diagram of an embodiment of a monitoring device for cluster resources in the embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, a device, equipment and a storage medium for monitoring cluster resources, which are used for improving the real-time performance and accuracy of cluster anomaly discovery.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be implemented in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It is to be understood that the execution subject of the present invention may be a monitoring apparatus of a cluster resource, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
For convenience of understanding, a specific flow of an embodiment of the present invention is described below, with reference to fig. 1, an embodiment of a method for monitoring cluster resources in an embodiment of the present invention includes:
101. Recording each resource operation request of the target cluster, and generating audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request;
It should be noted that the resource operation requests of the target cluster include resource configuration requests and resource usage requests for any node in the target cluster. For example, changing the internal configuration of a node in the target cluster is a resource configuration request, and accessing the database of a node in the target cluster is a resource usage request; this is not limited herein. It will be appreciated that a resource configuration request is typically sent by a user acting as an administrator of the target cluster, whereas a resource usage request is typically sent by a user of a service function carried by the target cluster, such as a user of a certain service system. In this embodiment, the resource operation requests of both internal and external users of the target cluster are recorded, so that the operation records of internal and external personnel on the cluster can be obtained and the cluster can be monitored more comprehensively.
In this embodiment, the audit data includes record data of a preset audit event. By way of example and not limitation, the audit data includes relevant parameters at the time the audit event occurs, such as the occurrence time, the node on which it occurred, related component information, related system information, source end information, and log information of the preset audit event. The preset audit event may be any event that operates on the target cluster, such as a resource configuration event, a resource request event, a resource download event, or a resource migration event; this is not limited herein. The server generates audit data for each resource operation request to obtain initial audit data, in the form of audit events, corresponding to each resource operation request. Monitoring of the target cluster can therefore be based on operation events of the target cluster rather than on complex and diverse log files, which reduces the difficulty of cluster anomaly analysis and improves the accuracy and real-time performance of cluster anomaly discovery.
It should be noted that recording each resource operation request of the target cluster is a recording operation at the cluster level rather than at the node level. In an embodiment, each resource operation request of the target cluster is recorded by a container orchestration system of the target cluster, where the container orchestration system includes a cluster entry management component. More specifically, each resource operation request is recorded by the cluster entry management component in the container orchestration system of the target cluster. For example, the cluster entry management component may be an interface service component, and it is configured to manage cluster entry data such as a cluster entry API (Application Programming Interface), a cluster resource configuration entry, and a cluster security entry.
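As a minimal sketch of cluster-level recording, the following Python example routes every request through a single entry object and records it there before dispatching. The names `ClusterEntry`, `AuditRecord`, and `demo_backend`, as well as the record fields, are hypothetical and only illustrate the idea; they are not taken from the patent or from any particular container orchestration system.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AuditRecord:
    """Initial audit data for one resource operation request (hypothetical shape)."""
    user: str
    verb: str
    resource: str
    timestamp: float = field(default_factory=time.time)


class ClusterEntry:
    """Stand-in for the cluster entry management component.

    Every request passes through `handle`, so recording happens once at the
    cluster level rather than separately on each node.
    """

    def __init__(self) -> None:
        self.initial_audit_data: List[AuditRecord] = []

    def handle(self, user: str, verb: str, resource: str,
               backend: Callable[[str, str], str]) -> str:
        # Record the request first, then dispatch it to the real handler.
        self.initial_audit_data.append(AuditRecord(user, verb, resource))
        return backend(verb, resource)


def demo_backend(verb: str, resource: str) -> str:
    """Hypothetical handler that stands in for the actual cluster operation."""
    return f"{verb} {resource}: ok"


entry = ClusterEntry()
# A resource configuration request from an administrator.
entry.handle("admin", "update", "nodes/node-1/config", demo_backend)
# A resource usage request from a user of a service system.
entry.handle("app-user", "get", "databases/orders", demo_backend)
```

Because both kinds of request go through the same `handle` call, neither can bypass recording, which is the property the cluster-level approach relies on.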
102. Acquiring target audit data which accords with a preset audit strategy in all initial audit data, and writing the target audit data into an audit log file of a corresponding node in a target cluster;
It should be noted that the preset audit strategy specifies the screening conditions for audit data, for example screening for important audit events such as public network access, command execution, resource deletion, and security dictionary access. The preset audit strategy can be configured manually through an administrator terminal of the target cluster, or it can be associated automatically with the monitoring indexes. After the server screens the initial audit data through the preset audit strategy, it obtains the target audit data that conforms to the preset audit strategy among all the initial audit data. This improves the efficiency of subsequently obtaining monitoring index data and makes monitoring of the cluster more targeted and more efficient.
As an example and not by way of limitation, the preset audit policy may also configure screening of audit data for access to a specific user identity or a specific cluster resource, so as to obtain audit data of a specific user or a specific resource, so as to perform more targeted monitoring on the cluster.
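The screening described above can be sketched as a set of match rules applied to the initial audit data. This is an illustration under stated assumptions: the `AuditPolicy` fields (`verbs`, `users`, `resource_prefix`), the record shape, and the `secrets/` resource prefix are hypothetical and are not defined by the patent.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class AuditRecord:
    user: str
    verb: str
    resource: str


@dataclass(frozen=True)
class AuditPolicy:
    """One screening condition; a field left as None matches anything."""
    verbs: Optional[FrozenSet[str]] = None
    users: Optional[FrozenSet[str]] = None
    resource_prefix: Optional[str] = None

    def matches(self, rec: AuditRecord) -> bool:
        if self.verbs is not None and rec.verb not in self.verbs:
            return False
        if self.users is not None and rec.user not in self.users:
            return False
        if (self.resource_prefix is not None
                and not rec.resource.startswith(self.resource_prefix)):
            return False
        return True


def select_target(records: List[AuditRecord],
                  policies: List[AuditPolicy]) -> List[AuditRecord]:
    """Keep a record if at least one policy matches it."""
    return [r for r in records if any(p.matches(r) for p in policies)]


initial = [
    AuditRecord("alice", "get", "pods/web-1"),
    AuditRecord("bob", "delete", "pods/web-2"),
    AuditRecord("carol", "get", "secrets/db-password"),
]
policies = [
    AuditPolicy(verbs=frozenset({"delete"})),   # resource deletion
    AuditPolicy(resource_prefix="secrets/"),    # security dictionary access
]
target = select_target(initial, policies)
```

Here `target` keeps the deletion by `bob` and the security dictionary access by `carol`, and drops the routine read by `alice`. Restricting a rule to a specific user would only require setting `users`.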
It should be noted that, unlike the conventional technique in which log files are searched on each cluster node for anomaly analysis, the server acquires the target audit data at the cluster level and then writes it directly into the audit log file of the corresponding node in the target cluster, so that the audit data is aggregated at the cluster level and then distributed to the corresponding nodes.
In this embodiment, to prevent the storage location of the audit log file within a node from changing, which would make the audit data harder to collect, the audit log file is stored in a preset storage directory that is mounted on the corresponding node in advance. The server writes the target audit data into the audit log file of the corresponding node in the target cluster through that preset storage directory, which provides the data basis for the subsequent real-time collection of the audit log files of all nodes and improves the real-time performance of the collection.
103. Acquiring audit log files of all nodes in real time, and writing acquisition results into a data warehouse of a target cluster in real time;
In this embodiment, since the target audit data is written into the audit log file of the corresponding node in the target cluster in real time, the audit log files of all nodes need to be aggregated in real time. In an embodiment, the server collects the audit log files of all nodes in real time through the log component in the container orchestration system of the target cluster; that is, the step of collecting the audit log files of all nodes in real time is also an operation at the cluster level. The log component is pre-mounted under the root directory of the container orchestration system and has access rights to all directories of the nodes, and therefore to the storage directory that holds the audit log files. The audit data of each node can thus be obtained in real time through the log component at the cluster level, which improves the real-time performance of cluster monitoring.
It should be noted that the server collects the audit log files of each node in real time and, after obtaining the acquisition result, writes the acquisition result into the data warehouse of the target cluster in real time; specifically, the server writes the acquisition result into the data table of the corresponding node in the data warehouse. To this end, before step 103 the log component of the target cluster creates a corresponding data table in the data warehouse for each node. Data warehouse technology improves the efficiency of analyzing and processing audit data, and thereby the efficiency of cluster anomaly monitoring.
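The per-node table layout can be illustrated with a small sketch. It assumes that collected log lines are JSON objects and uses an in-memory SQLite database purely as a stand-in for the data warehouse; the table naming scheme, the column set, and the function names are all hypothetical.

```python
import json
import sqlite3
from typing import Iterable


def table_for(node: str) -> str:
    """Derive a safe table name for a node (hypothetical naming scheme)."""
    safe = "".join(c if c.isalnum() else "_" for c in node)
    return f"audit_{safe}"


def create_node_table(db: sqlite3.Connection, node: str) -> None:
    """Create the node's data table ahead of collection."""
    db.execute(
        f'CREATE TABLE IF NOT EXISTS "{table_for(node)}" '
        "(ts REAL, user TEXT, verb TEXT, resource TEXT)"
    )


def ingest(db: sqlite3.Connection, node: str, lines: Iterable[str]) -> int:
    """Parse collected log lines and append them to the node's table."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the collected output
        rec = json.loads(line)
        rows.append((rec["ts"], rec["user"], rec["verb"], rec["resource"]))
    db.executemany(f'INSERT INTO "{table_for(node)}" VALUES (?, ?, ?, ?)', rows)
    db.commit()
    return len(rows)


db = sqlite3.connect(":memory:")  # stand-in for the cluster's data warehouse
create_node_table(db, "node-1")
collected = [
    '{"ts": 1.0, "user": "bob", "verb": "delete", "resource": "pods/web-2"}',
    "",
    '{"ts": 2.0, "user": "carol", "verb": "get", "resource": "secrets/db-password"}',
]
written = ingest(db, "node-1", collected)
```

Creating the table before ingestion mirrors the ordering in the text, where the data table for each node exists before step 103 starts writing to it.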
104. Acquiring preset monitoring index data through a data warehouse of a target cluster, and counting the monitoring index data to obtain a statistical result;
It can be understood that a data warehouse can connect and analyze service data from heterogeneous sources, so performing the data analysis of monitoring indexes through the data warehouse of the target cluster improves analysis efficiency and reduces analysis difficulty. Specifically, the server obtains preset monitoring index data from the acquisition results stored in the data warehouse of the target cluster, where the preset monitoring index data is the collected data of a preset monitoring index. For example, the preset monitoring index may be a request duration, a request frequency, an audit event, or the like, and the collected data of the preset monitoring index is the data in the acquisition result that corresponds to that index. A request duration in the acquisition result, such as 1 second or 200 milliseconds, indicates the time taken to complete a request; a request frequency may be, for example, the request frequency of a security dictionary or of a high-level resource configuration in the acquisition result. The specific details are not limited herein.
In this embodiment, since the monitoring index data is data in a preset period, such as within 1 hour, within 20 minutes, within 1 day, and the like, in order to analyze, monitor and early warn the data in the preset period, after the server acquires the monitoring index data, the server performs data statistics on the monitoring index data to obtain a statistical result, for example, if the preset monitoring index is a request duration, the request duration in the preset period is counted to obtain a total request duration in the preset period, which is the statistical result.
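The request-duration example above amounts to summing the samples that fall inside one period. A minimal sketch follows, assuming each sample is a `(timestamp, duration)` pair in seconds and that the period is a half-open interval; the function name and the sample values are hypothetical.

```python
from typing import Iterable, Tuple


def total_request_duration(samples: Iterable[Tuple[float, float]],
                           period_start: float,
                           period_seconds: float) -> float:
    """Sum request durations whose timestamp lies in [start, start + period)."""
    period_end = period_start + period_seconds
    return sum(d for ts, d in samples if period_start <= ts < period_end)


# (timestamp in seconds, request duration in seconds)
samples = [(0.0, 1.0), (600.0, 0.2), (3599.0, 0.5), (3600.0, 9.0)]

# Preset period of 1 hour starting at t = 0; the last sample falls outside it.
total = total_request_duration(samples, period_start=0.0, period_seconds=3600.0)
```

Keeping the period end exclusive avoids counting a sample in two adjacent periods when statistics are computed back to back.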
105. And monitoring and early warning audit data of the target cluster based on the statistical result.
It can be understood that, since the statistical result is obtained from the audit data, the server can monitor and give early warning on the audit data of the target cluster based on the statistical result. For example, when the statistical result is the total request duration in a preset period, the server can compute the average request duration in that period and set an average duration threshold for monitoring it. When the average request duration is greater than the average duration threshold, the server sends a warning, so that the warning receiving terminal obtains the warning message in real time and can take further countermeasures.
As another example, if the statistical result includes statistical data of a target audit event, the server can monitor and give early warning on that target audit event. Suppose the target audit event is an access to a security dictionary: if the statistical result shows that the target audit event has occurred 1 time, the server sends a warning message to the warning receiving terminal indicating that the security dictionary has been accessed 1 time. The warning receiving terminal can then trace and check the access event, which improves the security of sensitive data in the cluster.
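Both warning rules described above can be sketched as small functions that return a message when a rule fires and `None` otherwise. The function names, the message wording, and the numeric values are hypothetical, and delivery to a warning receiving terminal is left out.

```python
from typing import List, Optional


def average_duration_alert(total_seconds: float, request_count: int,
                           threshold_seconds: float) -> Optional[str]:
    """Warn when the average request duration exceeds the threshold."""
    if request_count == 0:
        return None  # no requests in the period, nothing to average
    average = total_seconds / request_count
    if average > threshold_seconds:
        return (f"average request duration {average:.3f}s "
                f"exceeds {threshold_seconds:.3f}s")
    return None


def event_count_alert(event_name: str, occurrences: int) -> Optional[str]:
    """Warn as soon as a target audit event has occurred at least once."""
    if occurrences >= 1:
        return f"{event_name} occurred {occurrences} time(s)"
    return None


alerts: List[str] = [
    a for a in (
        # 1.7s over 3 requests is about 0.567s on average: above 0.5s, below 1.0s.
        average_duration_alert(total_seconds=1.7, request_count=3, threshold_seconds=0.5),
        average_duration_alert(total_seconds=1.7, request_count=3, threshold_seconds=1.0),
        event_count_alert("security dictionary access", 1),
    ) if a is not None
]
```

Only the first and third checks produce a message here; the second stays silent because the average is below its threshold.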
Further, the server stores the statistical result in a blockchain database, which is not limited herein.
In the embodiment of the invention, the audit data corresponding to each request is generated based on a request recording mode on a cluster level, the audit data conforming to the audit strategy is written into the audit log file of the corresponding node, and the audit log file of each node is collected and written into the data warehouse in real time on the cluster level, so that the audit data for cluster anomaly analysis is not dispersed in different storage directories of different nodes in different storage structures but is concentrated in the data warehouse of a target cluster, the monitoring index data of any monitoring index can be directly obtained from the data warehouse of the target cluster, and statistics, analysis, monitoring and early warning are carried out, thereby improving the instantaneity and accuracy of cluster anomaly discovery.
Referring to fig. 2, another embodiment of the monitoring method for cluster resources in the embodiment of the present invention includes:
201. Recording each resource operation request of the target cluster through an interface service component in the target cluster, wherein the interface service component is used for managing the resource operation request entry of the target cluster;
It should be noted that, unlike traditional node-level log collection, recording at the cluster level makes it easier to collect cluster resource operation records: the log files of each node do not need to be gathered, so the records are obtained more efficiently, and the omission of cluster resource operation records and the late discovery of cluster anomalies caused by nodes storing log files in different locations and in different ways are avoided. In this embodiment, each resource operation request of the target cluster is recorded by the interface service component in the target cluster, that is, at the cluster level. In one embodiment, a container orchestration system is installed in the target cluster, and each resource operation request of the target cluster is recorded by the interface service component in the container orchestration system. Because the interface service component manages the resource operation request entry of the target cluster, which every resource operation request must pass through, recording at the cluster level does not miss any resource operation request, whether a resource configuration request or a resource usage request.
202. Generating an audit event for each piece of request stage information of each resource operation request to obtain initial audit data corresponding to each resource operation request, wherein each piece of request stage information is used for indicating the request record information of the resource operation request in each request state;
It should be noted that each resource operation request passes through multiple request stages, such as a request-received stage, a response-complete stage and a response-incomplete stage, each corresponding to a different state of the request. Based on a preset audit event, the server generates an audit event for each piece of request stage information of each resource operation request to obtain the initial audit data corresponding to that request. For example, assuming that the preset audit event is an access event on a security dictionary, the server packages the request record information of each request stage of a resource operation request that accesses the security dictionary together with the audit event to obtain the initial audit data for that request. In other words, the initial audit data contains the request record information of the resource operation request in each request state and the corresponding audit event information. The request record information may include the time, the request source information, the requested resource information and other related record parameters in each request state, which is not limited herein.
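The per-stage packaging described above can be sketched as follows. This is a minimal illustration only, not the patent's implementation: the stage names, the `AuditEvent` fields and the `build_initial_audit_data` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List

# Hypothetical names for the three stages mentioned in the text.
STAGES = ("RequestReceived", "ResponseComplete", "ResponseIncomplete")


@dataclass
class AuditEvent:
    request_id: str
    stage: str        # request state this record describes
    timestamp: float  # time at which the request entered the stage
    source: str       # request source information
    resource: str     # requested resource information
    audit_event: str  # preset audit event the record is packaged with


def build_initial_audit_data(request_id: str, source: str, resource: str,
                             audit_event: str,
                             stage_times: Dict[str, float]) -> List[dict]:
    """Create one audit record per stage the request actually reached."""
    records = []
    for stage in STAGES:
        if stage in stage_times:
            event = AuditEvent(request_id, stage, stage_times[stage],
                               source, resource, audit_event)
            records.append(asdict(event))
    return records


# Example: a request that was received and then answered successfully.
initial = build_initial_audit_data(
    "req-1", "user-a", "security-dictionary/db", "security-dictionary-access",
    {"RequestReceived": 100.0, "ResponseComplete": 100.2},
)
```

Keeping one record per stage, rather than one record per request, preserves the state history that later statistics may need.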
203. Acquiring target audit data which accords with a preset audit strategy in all initial audit data, and writing the target audit data into an audit log file of a corresponding node in a target cluster;
Specifically, step 203 includes: screening the request source information and the requested resource information of the initial audit data corresponding to each resource operation request through a preset audit policy to obtain target audit data; and converting the target audit data into a data exchange format, and writing the target audit data in the data exchange format into the audit log file of the corresponding node in the target cluster, wherein the audit log file is stored in a preset storage directory that is mounted in advance on the corresponding node.
In this embodiment, in order to monitor request sources and requested resources, the server screens the request source information and the requested resource information in the initial audit data corresponding to each resource operation request against a preset audit policy to obtain the target audit data that meets the policy. The server then converts the target audit data into a standardized data exchange format, which may be XML or JSON and is not specifically limited herein. Next, the server writes the target audit data in the data exchange format into the audit log file under the preset storage directory that is mounted in advance on the corresponding node in the target cluster, so that the target audit data is synchronized to that node.
In one embodiment, the server determines which node the target audit data belongs to through the request resource information in the target audit data, that is, the request resource node corresponding to the resource operation request is the node corresponding to the target audit data, and the server writes the target audit data into the audit log file of the node corresponding to the target audit data.
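As a rough sketch of step 203, the following filters records against a policy and appends the survivors as JSON lines to a per-node log file. The policy shape (`sources` and `resources` allow-lists), the `node` field, the `audit.log` file name and both function names are assumptions made for illustration; the source does not specify them.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def matches_policy(record: dict, policy: Dict[str, List[str]]) -> bool:
    """Keep a record only if its source and resource are both audited."""
    return (record["source"] in policy["sources"]
            and record["resource"] in policy["resources"])


def write_target_audit_data(records: Iterable[dict],
                            policy: Dict[str, List[str]],
                            log_root: Path) -> int:
    """Append matching records as JSON lines to each node's audit log."""
    written = 0
    for record in records:
        if not matches_policy(record, policy):
            continue
        # Assumption: the requested resource information names its node.
        node_dir = log_root / record["node"]
        node_dir.mkdir(parents=True, exist_ok=True)
        with open(node_dir / "audit.log", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
        written += 1
    return written


policy = {"sources": ["user-a"], "resources": ["secrets"]}
records = [
    {"source": "user-a", "resource": "secrets", "node": "node-1"},
    {"source": "user-b", "resource": "secrets", "node": "node-1"},
    {"source": "user-a", "resource": "pods", "node": "node-2"},
]
log_root = Path(tempfile.mkdtemp())  # stands in for the preset directory
written = write_target_audit_data(records, policy, log_root)
```

Writing one JSON object per line keeps the file easy to collect line by line in the next step.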
204. Acquiring audit log files of all nodes in real time, and writing acquisition results into a data warehouse of a target cluster in real time;
Specifically, step 204 includes: accessing the audit log file in the preset storage directory pre-mounted on each node through a log component pre-mounted on each node in the target cluster, and collecting the accessed audit log file line by line in real time to obtain a collection result, wherein the log component has access rights to all directories on the corresponding node; and parsing the collection result according to the data exchange format to obtain a parsing result, packaging the parsing result, and sending it to the data warehouse of the target cluster through a log service.
In this embodiment, the server accesses the audit log file in the preset storage directory pre-mounted on each node through the log component pre-mounted on each node in the target cluster. Because the log component on each node has access rights to all directories of that node, the audit log file can be accessed quickly through the log component, and the accessed audit log file is collected line by line in real time to obtain a collection result, which includes all data in the audit log file. The server then parses the collection result according to the data exchange format to obtain a parsing result that conforms to the storage format of the data warehouse, packages the parsing result and sends it to the data warehouse of the target cluster, so that the data warehouse stores the parsing result directly in the data table of the corresponding node for subsequent cluster anomaly analysis, monitoring and early warning.
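Line-by-line collection of this kind might look like the sketch below, which assumes JSON-lines audit logs and uses an in-memory dictionary as a stand-in for the data warehouse and the log service. The function names and the byte-offset bookkeeping are illustrative assumptions, not details taken from the source.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def collect_new_lines(log_path: Path, offset: int) -> Tuple[List[dict], int]:
    """Read complete lines added after `offset` and parse each as JSON."""
    parsed = []
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial line: wait until the writer finishes it
            offset += len(raw)
            if raw.strip():
                parsed.append(json.loads(raw))
    return parsed, offset


def send_to_warehouse(warehouse: Dict[str, List[dict]], node: str,
                      parsed: List[dict]) -> None:
    """Stand-in for the log service: append rows to the node's table."""
    warehouse.setdefault(node, []).extend(parsed)


log_path = Path(tempfile.mkdtemp()) / "audit.log"
# The second record is only half written at the time of the first pass.
log_path.write_bytes(b'{"source": "user-a"}\n{"source": "us')
warehouse: Dict[str, List[dict]] = {}

rows, offset = collect_new_lines(log_path, 0)
send_to_warehouse(warehouse, "node-1", rows)

with open(log_path, "ab") as fh:
    fh.write(b'er-b"}\n')  # the writer completes the second record
rows, offset = collect_new_lines(log_path, offset)
send_to_warehouse(warehouse, "node-1", rows)
```

Tracking a byte offset lets each pass pick up only new data, and skipping an unterminated final line avoids parsing a record that is still being written.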
205. Acquiring preset monitoring index data through a data warehouse of a target cluster, and counting the monitoring index data to obtain a statistical result;
Specifically, step 205 includes: querying the data warehouse of the target cluster, through a preset monitoring index, for the monitoring index data corresponding to the monitoring index; and performing statistics on the audit data and audit events in the monitoring index data within a preset period to obtain a statistical result.
In this embodiment, in order to compile statistics on and analyze the audit data within a preset period, the server queries the data warehouse of the target cluster, using a preset monitoring index, for the monitoring index data corresponding to that index, where the monitoring index data indicates the target audit data corresponding to the monitoring index. The server then performs statistics on the audit data and audit events in the monitoring index data within the preset period to obtain statistical results for the audit data and the audit events, which are used for subsequent analysis, monitoring and early warning.
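To make the query-then-count step concrete, the sketch below uses an in-memory SQLite database as a stand-in for the data warehouse. The `audit` table, its columns, the choice of "requested resource" as the monitoring index and the `count_by_source` helper are all assumptions for illustration; the source does not name a warehouse product or schema.

```python
import sqlite3
from typing import Dict


def count_by_source(conn: sqlite3.Connection, resource: str,
                    start: float, end: float) -> Dict[str, int]:
    """Count audit records per request source within [start, end)."""
    cursor = conn.execute(
        "SELECT source, COUNT(*) FROM audit "
        "WHERE resource = ? AND ts >= ? AND ts < ? "
        "GROUP BY source ORDER BY source",
        (resource, start, end),
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (ts REAL, source TEXT, resource TEXT)")
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?)",
    [
        (10.0, "user-a", "secrets"),
        (20.0, "user-a", "secrets"),
        (30.0, "user-b", "secrets"),
        (40.0, "user-a", "pods"),     # belongs to a different index
        (99.0, "user-b", "secrets"),  # falls outside the preset period
    ],
)
stats = count_by_source(conn, "secrets", start=0.0, end=60.0)
```

Because all nodes' audit data sits in one store, a single query covers the whole cluster for the chosen period.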
206. Monitoring and issuing early warnings on the audit data of the target cluster based on the statistical result.
Specifically, step 206 includes: monitoring the statistical result in real time against a preset early warning threshold and issuing early warnings, so as to monitor and issue early warnings on the audit data of the target cluster.
In this embodiment, the server monitors the audit data, the audit events and the monitoring index data in the statistical result in real time against the preset early warning threshold. When the data exceeds the early warning threshold, the server sends an early warning message to the early warning receiving terminal so that the receiving terminal can take further measures.
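The threshold comparison can be illustrated with a small sketch. The metric names, the message text and the `notify` callback (standing in for delivery to the early warning receiving terminal) are hypothetical, and "exceeds" is assumed here to mean strictly greater than the threshold.

```python
from typing import Callable, Dict, List


def check_thresholds(stats: Dict[str, int], thresholds: Dict[str, int],
                     notify: Callable[[str], None]) -> List[str]:
    """Notify once for every statistic that exceeds its threshold."""
    exceeded = []
    for name, value in sorted(stats.items()):
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            notify(f"early warning: {name}={value} exceeds threshold {limit}")
            exceeded.append(name)
    return exceeded


sent: List[str] = []  # collects messages instead of contacting a terminal
exceeded = check_thresholds(
    {"secret_reads": 12, "pod_deletes": 1},
    {"secret_reads": 10, "pod_deletes": 5},
    notify=sent.append,
)
```

Passing the delivery mechanism in as a callback keeps the comparison logic independent of how the receiving terminal is reached.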
Further, after step 206, the method further includes: receiving a monitoring index query request, and querying monitoring index data corresponding to the monitoring index query request through a data warehouse of a target cluster; receiving an alarm setting request and creating a monitoring task corresponding to the alarm setting request; and monitoring and early warning the monitoring index data corresponding to the monitoring index query request through the monitoring task corresponding to the alarm setting request.
In this embodiment, the terminal may also query monitoring index data and set an early warning threshold to create a new data monitoring task. Specifically, the server receives a monitoring index query request sent by the terminal and queries the data warehouse of the target cluster for the monitoring index data corresponding to the request, so that the terminal user can obtain data related to the monitoring index quickly and flexibly. The server then receives an alarm setting request sent by the terminal, creates a monitoring task corresponding to the alarm setting request, and uses that task to monitor and issue early warnings on the monitoring index data corresponding to the monitoring index query request. In this way, monitoring indexes and alarm thresholds can be set flexibly according to the monitoring logic of the cluster resources, and data monitoring tasks that conform to that logic can be created, achieving flexible monitoring of cluster resources.
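One way to picture the query request, the alarm setting request and the resulting monitoring task is the sketch below. `MonitoringTask`, `MonitorService` and their methods are hypothetical names, and the warehouse is reduced to a dictionary of already aggregated values purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MonitoringTask:
    metric: str     # monitoring index the task watches
    threshold: int  # alarm threshold from the alarm setting request


@dataclass
class MonitorService:
    query_warehouse: Callable[[str], int]
    tasks: List[MonitoringTask] = field(default_factory=list)

    def handle_query(self, metric: str) -> int:
        """Serve a monitoring index query request."""
        return self.query_warehouse(metric)

    def handle_alarm_setting(self, metric: str, threshold: int) -> MonitoringTask:
        """Create a monitoring task for an alarm setting request."""
        task = MonitoringTask(metric, threshold)
        self.tasks.append(task)
        return task

    def run_tasks(self) -> List[str]:
        """Evaluate every task once and return the warnings raised."""
        alerts = []
        for task in self.tasks:
            value = self.handle_query(task.metric)
            if value > task.threshold:
                alerts.append(f"{task.metric}={value} exceeds {task.threshold}")
        return alerts


warehouse: Dict[str, int] = {"secret_reads": 12, "pod_deletes": 1}
service = MonitorService(query_warehouse=lambda metric: warehouse.get(metric, 0))
service.handle_alarm_setting("secret_reads", 10)
service.handle_alarm_setting("pod_deletes", 5)
alerts = service.run_tasks()
```

In a real deployment `run_tasks` would be invoked on a schedule; here it is called once so the behaviour is easy to follow.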
In this embodiment of the invention, every request for cluster resources is recorded at the cluster level through the interface service component, audit data is generated for each recorded request, the audit data that conforms to the audit policy is written to the audit log file of the corresponding node, and the audit log files of all nodes are collected and written to the data warehouse in real time. As a result, the audit data used for cluster anomaly analysis is no longer scattered across different storage directories and storage structures on different nodes but is concentrated in the data warehouse of the target cluster. The monitoring index data for any monitoring index can be obtained directly from that data warehouse for statistics, analysis, monitoring and early warning, which improves the timeliness and accuracy of cluster anomaly discovery.
The method for monitoring cluster resources in the embodiment of the present invention has been described above; the monitoring apparatus for cluster resources in the embodiment of the present invention is described below. Referring to fig. 3, one embodiment of the monitoring apparatus for cluster resources in the embodiment of the present invention includes:
a recording module 301, configured to record each resource operation request of the target cluster, and generate audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request;
a write-in module 302, configured to obtain target audit data that meets a preset audit policy in all initial audit data, and write the target audit data into an audit log file of a corresponding node in the target cluster;
the acquisition module 303 is configured to acquire the audit log files of each node in real time, and write the acquisition results into the data warehouse of the target cluster in real time;
the statistical module 304 is configured to obtain preset monitoring index data through the data warehouse of the target cluster, and perform statistics on the monitoring index data to obtain a statistical result;
and a monitoring module 305, configured to monitor and issue early warnings on the audit data of the target cluster based on the statistical result. Further, the statistical result may be stored in a blockchain database, which is not limited herein.
In this embodiment of the invention, audit data is generated for every request by recording requests at the cluster level, the audit data that conforms to the audit policy is written to the audit log file of the corresponding node, and the audit log files of all nodes are collected and written to the data warehouse in real time. As a result, the audit data used for cluster anomaly analysis is no longer scattered across different storage directories and storage structures on different nodes but is concentrated in the data warehouse of the target cluster. The monitoring index data for any monitoring index can be obtained directly from that data warehouse for statistics, analysis, monitoring and early warning, which improves the timeliness and accuracy of cluster anomaly discovery.
Referring to fig. 4, another embodiment of the monitoring apparatus for cluster resources according to the embodiment of the present invention includes:
a recording module 301, configured to record each resource operation request of the target cluster, and generate audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request;
a write-in module 302, configured to obtain target audit data that meets a preset audit policy in all initial audit data, and write the target audit data into an audit log file of a corresponding node in the target cluster;
the acquisition module 303 is configured to acquire the audit log files of each node in real time, and write the acquisition results into the data warehouse of the target cluster in real time;
a statistics module 304, configured to obtain preset monitoring index data through a data warehouse of the target cluster, and perform statistics on the monitoring index data to obtain a statistical result;
and the monitoring module 305 is configured to monitor and early warn audit data of the target cluster based on the statistical result.
Optionally, the recording module 301 is specifically configured to:
recording each resource operation request of the target cluster through an interface service component in the target cluster, wherein the interface service component is used for managing the resource operation request entry point of the target cluster;
and generating an audit event for each request stage information of each resource operation request to obtain initial audit data corresponding to each resource operation request, wherein each request stage information is used for indicating request record information of the resource operation request in each request state.
Optionally, the writing module 302 is specifically configured to:
screening the request source information and the requested resource information of the initial audit data corresponding to each resource operation request through a preset audit policy to obtain target audit data;
and performing data exchange format processing on the target audit data to obtain target audit data in a data exchange format, and writing the target audit data in the data exchange format into an audit log file of a corresponding node in the target cluster, wherein the audit log file is stored in a preset storage directory which is mounted in the corresponding node in advance.
Optionally, the acquisition module 303 is specifically configured to:
accessing the audit log file in the preset storage directory pre-mounted on each node through a log component pre-mounted on each node in the target cluster, and collecting the accessed audit log file line by line in real time to obtain a collection result, wherein the log component has access rights to all directories on the corresponding node;
and parsing the collection result according to the data exchange format to obtain a parsing result, packaging the parsing result, and sending it to the data warehouse of the target cluster through a log service.
Optionally, the statistical module 304 is specifically configured to:
inquiring monitoring index data corresponding to the monitoring index from a data warehouse of the target cluster through a preset monitoring index;
and carrying out statistics on audit data and audit events on the monitoring index data in a preset period to obtain a statistical result.
Optionally, the monitoring module 305 is specifically configured to:
and monitoring and early warning the statistical result in real time through a preset early warning threshold value so as to monitor and early warn audit data of the target cluster.
Optionally, the monitoring apparatus for cluster resources further includes:
the query module 306 is configured to receive a monitoring index query request, and query monitoring index data corresponding to the monitoring index query request through a data warehouse of the target cluster;
a creating module 307, configured to receive an alarm setting request, and create a monitoring task corresponding to the alarm setting request;
and the execution module 308 is configured to monitor and early warn the monitoring index data corresponding to the monitoring index query request through the monitoring task corresponding to the alarm setting request.
In this embodiment of the invention, every request for cluster resources is recorded at the cluster level through the interface service component, audit data is generated for each recorded request, the audit data that conforms to the audit policy is written to the audit log file of the corresponding node, and the audit log files of all nodes are collected and written to the data warehouse in real time. As a result, the audit data used for cluster anomaly analysis is no longer scattered across different storage directories and storage structures on different nodes but is concentrated in the data warehouse of the target cluster. The monitoring index data for any monitoring index can be obtained directly from that data warehouse for statistics, analysis, monitoring and early warning, which improves the timeliness and accuracy of cluster anomaly discovery.
Fig. 3 and fig. 4 above describe the monitoring apparatus for cluster resources in the embodiment of the present invention in detail from the perspective of modular functional entities; the monitoring device for cluster resources in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a monitoring device for cluster resources according to an embodiment of the present invention. The monitoring device 500 for cluster resources may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPUs) 510, a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may provide transient or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of computer program operations for the monitoring device 500 for cluster resources. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the monitoring device 500 for cluster resources, the series of computer program operations stored in the storage medium 530.
The monitoring device 500 for cluster resources may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art will appreciate that the structure shown in fig. 5 does not limit the monitoring device for cluster resources, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.
The present invention also provides a computer device, which includes a memory and a processor, where the memory stores a computer-readable computer program, and when the computer-readable computer program is executed by the processor, the processor is caused to execute the steps of the method for monitoring cluster resources in the foregoing embodiments.
The present invention also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium, and which stores a computer program that, when run on a computer, causes the computer to perform the steps of the method for monitoring cluster resources.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several computer programs that enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for monitoring cluster resources is characterized in that the method for monitoring cluster resources comprises the following steps:
recording each resource operation request of the target cluster, and generating audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request;
obtaining target audit data which accords with a preset audit strategy in all initial audit data, and writing the target audit data into an audit log file of a corresponding node in a target cluster;
acquiring audit log files of all nodes in real time, and writing acquisition results into a data warehouse of the target cluster in real time;
acquiring preset monitoring index data through a data warehouse of the target cluster, and counting the monitoring index data to obtain a statistical result;
and monitoring and early warning audit data of the target cluster based on the statistical result.
2. The method for monitoring cluster resources of claim 1, wherein the recording each resource operation request of the target cluster, and generating audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request comprises:
recording each resource operation request of the target cluster through an interface service component in the target cluster, wherein the interface service component is used for managing the resource operation request entry point of the target cluster;
and generating an audit event for each request stage information of each resource operation request to obtain initial audit data corresponding to each resource operation request, wherein each request stage information is used for indicating request record information of the resource operation request in each request state.
3. The method for monitoring cluster resources of claim 1, wherein the obtaining of target audit data that meets a preset audit policy among all initial audit data and writing the target audit data into an audit log file of a corresponding node in the target cluster comprises:
screening the request source information and the requested resource information of the initial audit data corresponding to each resource operation request through a preset audit policy to obtain target audit data;
and performing data exchange format processing on the target audit data to obtain target audit data in a data exchange format, and writing the target audit data in the data exchange format into an audit log file of a corresponding node in the target cluster, wherein the audit log file is stored in a preset storage directory which is mounted in the corresponding node in advance.
4. The method for monitoring cluster resources of claim 1, wherein the collecting audit log files of each node in real time and writing the collected results into the data warehouse of the target cluster in real time comprises:
performing audit log file access on a preset storage directory pre-mounted on each node through a log component pre-mounted on each node in the target cluster, and performing real-time line-by-line acquisition on the accessed audit log files to obtain an acquisition result, wherein the log component has access rights of all directories in the corresponding node;
and parsing the acquisition result according to the data exchange format to obtain a parsing result, packaging the parsing result, and sending it to the data warehouse of the target cluster through the log service.
5. The method for monitoring cluster resources of claim 1, wherein the obtaining preset monitoring index data through the data warehouse of the target cluster, and performing statistics on the monitoring index data to obtain a statistical result comprises:
inquiring monitoring index data corresponding to the monitoring index from a data warehouse of the target cluster through a preset monitoring index;
and carrying out statistics on audit data and audit events on the monitoring index data in a preset period to obtain a statistical result.
6. The method for monitoring cluster resources of claim 1, wherein the monitoring and warning of audit data of the target cluster based on the statistical result comprises:
and monitoring and early warning the statistical result in real time through a preset early warning threshold value so as to monitor and early warn audit data of the target cluster.
7. The method for monitoring cluster resources of any one of claims 1-6, wherein after the monitoring and pre-warning of audit data of the target cluster based on the statistical result, the method for monitoring cluster resources further comprises:
receiving a monitoring index query request, and querying monitoring index data corresponding to the monitoring index query request through a data warehouse of the target cluster;
receiving an alarm setting request and creating a monitoring task corresponding to the alarm setting request;
and monitoring and early warning the monitoring index data corresponding to the monitoring index query request through the monitoring task corresponding to the alarm setting request.
8. A monitoring apparatus for cluster resources, the monitoring apparatus for cluster resources comprising:
the recording module is used for recording each resource operation request of the target cluster and generating audit data for each resource operation request to obtain initial audit data corresponding to each resource operation request;
the write-in module is used for acquiring target audit data which accords with a preset audit strategy in all the initial audit data and writing the target audit data into an audit log file of a corresponding node in the target cluster;
the acquisition module is used for acquiring the audit log files of all the nodes in real time and writing the acquisition results into the data warehouse of the target cluster in real time;
the statistical module is used for acquiring preset monitoring index data through a data warehouse of the target cluster, and performing statistics on the monitoring index data to obtain a statistical result;
and the monitoring module is used for monitoring and early warning audit data of the target cluster based on the statistical result.
9. A monitoring device for a cluster resource, the monitoring device for the cluster resource comprising: a memory and at least one processor, the memory having stored therein a computer program;
the at least one processor invokes the computer program in the memory to cause the monitoring device of the cluster resource to perform the monitoring method of the cluster resource according to any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for monitoring cluster resources according to any one of claims 1 to 7.
CN202211131290.4A 2022-09-16 2022-09-16 Cluster resource monitoring method, device, equipment and storage medium Pending CN115658447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211131290.4A CN115658447A (en) 2022-09-16 2022-09-16 Cluster resource monitoring method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211131290.4A CN115658447A (en) 2022-09-16 2022-09-16 Cluster resource monitoring method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115658447A true CN115658447A (en) 2023-01-31

Family

ID=84983678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211131290.4A Pending CN115658447A (en) 2022-09-16 2022-09-16 Cluster resource monitoring method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115658447A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination