CN116955079A - Distributed monitoring method, device, equipment and computer readable storage medium - Google Patents

Distributed monitoring method, device, equipment and computer readable storage medium

Info

Publication number
CN116955079A
Authority
CN
China
Prior art keywords
query
instance
monitored
monitoring data
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310932049.XA
Other languages
Chinese (zh)
Inventor
刘恩远
向超胜
刘舒鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Unicom Cloud Data Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Unicom Cloud Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd, Unicom Digital Technology Co Ltd, Unicom Cloud Data Co Ltd
Priority to CN202310932049.XA
Publication of CN116955079A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Selective Calling Equipment (AREA)

Abstract

The application provides a distributed monitoring method, device, equipment and computer readable storage medium. The method comprises: in response to a query operation of a user, determining, through a query module and according to the query condition input by the user in the query operation, the identification of the object to be monitored and the corresponding Prometheus; acquiring, from a database, the monitoring data of the object to be monitored that is stored in the Prometheus corresponding to the object to be monitored; and, according to the query condition, if it is determined that the monitoring data does not need to be aggregated, sending the monitoring data of the object to be monitored to the user side used by the user, or, if it is determined that the monitoring data needs to be aggregated, aggregating the monitoring data of the object to be monitored and sending the aggregated monitoring data to the user side. The method enables unified monitoring management of various kinds of instances, thereby meeting users' personalized requirements, while reducing the bandwidth wasted by centralized queries across multiple instances.

Description

Distributed monitoring method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a distributed monitoring method, device, apparatus, and computer readable storage medium.
Background
As cloud resource pools increase year by year, it is necessary for management personnel to grasp the situation of the clusters in time and monitor the clusters in real time. Cluster monitoring is generally classified into single cluster monitoring for local area networks and multi-cluster distributed monitoring for wide area networks. Among them, distributed monitoring applications are becoming more and more widespread.
At present, distributed monitoring methods generally manage and analyze a single cluster environment. They cannot uniformly manage various kinds of instances, and centralized querying of distributed monitoring data wastes bandwidth.
Therefore, existing distributed monitoring methods cannot perform unified monitoring management of various kinds of instances, and centralized querying of distributed monitoring data causes bandwidth loss.
Disclosure of Invention
The application provides a distributed monitoring method, device, equipment and computer readable storage medium, which enable unified monitoring management of various kinds of instances, thereby meeting users' personalized requirements, while reducing the bandwidth wasted by centralized queries across multiple instances.
In a first aspect, the present application provides a distributed monitoring method applied to a distributed monitoring device. The distributed monitoring device is configured in a server or an electronic device, and a monitoring and alerting system, Prometheus, is deployed for each cloud resource pool, where each Prometheus is used to acquire and store the monitoring data of an instance in the cloud resource pool based on the registration information of that instance; the distributed monitoring device is used to monitor the instances in each cloud resource pool through the respective Prometheus; the method comprises the following steps:
in response to a query operation of a user, determining, through a query module in the distributed monitoring device and according to the query condition input by the user in the query operation, the identification of an object to be monitored and the corresponding Prometheus; the object to be monitored comprises at least one instance;
acquiring, from a database, the monitoring data of the object to be monitored that is stored in the Prometheus corresponding to the object to be monitored;
according to the query condition, if it is determined that the monitoring data does not need to be aggregated, sending the monitoring data of the object to be monitored to the user side used by the user; and if it is determined that the monitoring data needs to be aggregated, aggregating the monitoring data of the object to be monitored and sending the aggregated monitoring data to the user side.
In one possible design, the database stores the correspondence between each cloud resource pool and its Prometheus; determining the identification of the object to be monitored and the corresponding Prometheus according to the query condition input by the user in the query operation comprises:
determining a label of the object to be monitored according to the query condition input by the user in the query operation;
determining, according to the label of the object to be monitored, the cloud resource pool to which the object to be monitored belongs;
and determining, through the correspondence and according to the cloud resource pool to which the object to be monitored belongs, the Prometheus used to acquire the monitoring data of the object to be monitored.
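The three-step lookup above (label, then cloud resource pool, then Prometheus) can be sketched as two dictionary lookups. This is only an illustration: the `PrometheusEndpoint` type, the two mapping tables, the pool names and the URLs are hypothetical stand-ins for the correspondence that the text says is stored in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrometheusEndpoint:
    pool_id: str   # cloud resource pool served by this Prometheus
    base_url: str  # hypothetical address of that Prometheus


# Hypothetical stand-ins for the correspondence stored in the database.
LABEL_TO_POOL = {
    "kafka-x-cluster": "pool-a",
    "mysql-y-cluster": "pool-b",
}
POOL_TO_PROMETHEUS = {
    "pool-a": PrometheusEndpoint("pool-a", "http://prometheus.pool-a.example:9090"),
    "pool-b": PrometheusEndpoint("pool-b", "http://prometheus.pool-b.example:9090"),
}


def resolve_prometheus(label: str) -> PrometheusEndpoint:
    """Map a label to its cloud resource pool, then to that pool's Prometheus."""
    pool_id = LABEL_TO_POOL[label]      # label -> cloud resource pool
    return POOL_TO_PROMETHEUS[pool_id]  # pool  -> Prometheus
```

An unknown label raises `KeyError` in this sketch; how the real system reports an unresolvable query condition is not stated in the source.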
In one possible design, the method further comprises:
determining whether the query condition is a query of a single cloud resource pool or a query of a plurality of cloud resource pools;
if the query is a query of a single cloud resource pool, determining that the query operation is a fragmented query, wherein the fragmented query is used for indicating that the queried monitoring data is not required to be aggregated;
if the query is a query of a plurality of cloud resource pools, determining that the query operation is an aggregation query, wherein the aggregation query is used for indicating that the queried monitoring data needs to be aggregated.
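The rule above reduces to counting the distinct cloud resource pools that a query condition touches. A minimal sketch, assuming the pools referenced by the query have already been extracted into a list; `QueryKind` and `classify_query` are hypothetical names, not taken from the application.

```python
from enum import Enum
from typing import Iterable


class QueryKind(Enum):
    FRAGMENTED = "fragmented"    # single pool: no aggregation needed
    AGGREGATION = "aggregation"  # several pools: results must be aggregated


def classify_query(pool_ids: Iterable[str]) -> QueryKind:
    """Classify a query by the number of distinct cloud resource pools it targets."""
    distinct = set(pool_ids)
    if not distinct:
        raise ValueError("query condition must reference at least one cloud resource pool")
    return QueryKind.FRAGMENTED if len(distinct) == 1 else QueryKind.AGGREGATION
```

Duplicate pool identifiers are collapsed first, so a query that names the same pool twice is still treated as a fragmented query.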
In one possible design, the distributed monitoring apparatus includes a registration/configuration module; the method further comprises, prior to responding to a query operation by a user:
responding to the operation of creating the instance by the user, and registering the information of the instance with a registration/configuration module by calling a gateway;
storing registration information of the instance, the registration information including an IP address of the instance, a tag of the instance, and the path of the monitoring interface provided to the corresponding Prometheus; wherein the Prometheus is used for periodically acquiring monitoring data of the instance through the path.
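The registration information described above can be pictured as a small record from which a scrape address is derived. The sketch below is an assumption-laden illustration: the `Registration` and `Registry` classes are hypothetical, and the `port` field (defaulting to 9100) is an assumption, since the text names only the IP address, the tag and the path.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Registration:
    ip: str                 # IP address of the instance
    labels: Dict[str, str]  # tag(s) of the instance
    metrics_path: str       # path of the monitoring interface
    port: int = 9100        # assumption: the source does not name a port

    def scrape_url(self) -> str:
        """Address that a Prometheus would poll periodically."""
        return f"http://{self.ip}:{self.port}{self.metrics_path}"


class Registry:
    """Hypothetical in-memory stand-in for the registration/configuration module."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, Registration] = {}

    def register(self, reg: Registration) -> None:
        self._by_ip[reg.ip] = reg

    def scrape_targets(self) -> List[str]:
        return sorted(r.scrape_url() for r in self._by_ip.values())
```

For example, registering `Registration("10.0.0.5", {"app": "kafka"}, "/metrics")` makes `scrape_targets()` return `["http://10.0.0.5:9100/metrics"]`.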
In one possible design, the registering the information of the instance with the registration/configuration module by calling a gateway in response to the operation of creating the instance by the user includes:
in response to the operation of creating an instance by the user, if the instance is a container instance, sending information of the instance to the gateway through a registration engine; processing the information of the instance through the gateway, and sending the processed information to the registration/configuration module for registration to generate the registration information;
if the instance is a traditional instance, triggering a call to the gateway for registration through an automatic discovery rule configured at the gateway, wherein a traditional instance is an instance running in a virtual machine; processing the information of the instance by calling the gateway, and sending the processed information to the registration/configuration module for registration to generate the registration information.
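The two registration paths above differ only in what triggers the gateway call; both end with the gateway processing the information and handing it to the registration/configuration module. A minimal sketch under stated assumptions: `Gateway`, `InstanceKind` and `on_instance_created` are hypothetical names, the registry is modelled as a plain dictionary, and the "processing" step is assumed to be simple normalization because the source does not say what the gateway does to the information.

```python
from enum import Enum
from typing import Dict


class InstanceKind(Enum):
    CONTAINER = "container"
    TRADITIONAL = "traditional"  # an instance running in a virtual machine


class Gateway:
    def __init__(self, registry: Dict[str, dict]) -> None:
        self.registry = registry  # stand-in for the registration/configuration module

    def submit(self, info: dict) -> dict:
        # Assumed processing step: normalize the fields before registering.
        processed = {
            "ip": info["ip"].strip(),
            "labels": dict(info.get("labels", {})),
            "path": info["path"],
        }
        self.registry[processed["ip"]] = processed
        return processed


def on_instance_created(kind: InstanceKind, info: dict, gateway: Gateway) -> dict:
    """Route a newly created instance to the gateway and record which path was used."""
    if kind is InstanceKind.CONTAINER:
        trigger = "registration-engine"   # container instances are pushed by the engine
    else:
        trigger = "auto-discovery-rule"   # traditional instances are found by a rule
    record = gateway.submit(info)
    record["registered_via"] = trigger
    return record
```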
In one possible design, the method further comprises:
and responding to input operation of a user in an operable interface corresponding to the registration/configuration module, and configuring the registered instance.
In one possible design, the method further comprises:
when it is determined that the monitoring task of the instance has gone offline, deregistering the registration information of the instance through the registration/configuration module.
In a second aspect, the present application provides a distributed monitoring device. The distributed monitoring device is configured in a server or an electronic device, and a monitoring and alerting system, Prometheus, is deployed for each cloud resource pool, where each Prometheus is used to acquire and store the monitoring data of an instance in the cloud resource pool based on the registration information of that instance; the distributed monitoring device is used to monitor the instances in each cloud resource pool through the respective Prometheus; the device comprises:
a query module, configured to, in response to a query operation of a user, determine the identification of an object to be monitored and the corresponding Prometheus according to the query condition input by the user in the query operation; the object to be monitored comprises at least one instance;
the query module is further configured to acquire, from a database, the monitoring data of the object to be monitored that is stored in the Prometheus corresponding to the object to be monitored;
a processing module, configured to, according to the query condition, send the monitoring data of the object to be monitored to the user side used by the user if it is determined that the monitoring data does not need to be aggregated; and, if it is determined that the monitoring data needs to be aggregated, aggregate the monitoring data of the object to be monitored and send the aggregated monitoring data to the user side.
In a third aspect, the present application provides an electronic device comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory, causing the at least one processor to perform the distributed monitoring method as described above in the first aspect and possible designs of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium, in which computer executable instructions are stored, which when executed by a processor, implement the distributed monitoring method according to the first aspect and the possible designs of the first aspect.
The distributed monitoring method, device, equipment and computer readable storage medium provided by the application are applied to a distributed monitoring device. The distributed monitoring device is configured in a server or an electronic device, and a monitoring and alerting system, Prometheus, is deployed for each cloud resource pool, where each Prometheus is used to acquire and store the monitoring data of an instance in the cloud resource pool based on the registration information of that instance; the distributed monitoring device is used to monitor the instances in each cloud resource pool through the respective Prometheus. First, in response to a query operation of a user, the identification of the object to be monitored and the corresponding Prometheus are determined through the query module in the distributed monitoring device according to the query condition input by the user in the query operation; the object to be monitored comprises at least one instance. Next, the monitoring data of the object to be monitored that is stored in the Prometheus corresponding to the object to be monitored is acquired from a database. Then, according to the query condition, if it is determined that the monitoring data does not need to be aggregated, the monitoring data of the object to be monitored is sent to the user side used by the user; if it is determined that the monitoring data needs to be aggregated, the monitoring data of the object to be monitored is aggregated and the aggregated monitoring data is sent to the user side.
By providing a unified query entry, the application supports querying the monitoring data of one or more cloud resource pools and of one or more instances, and thus provides monitoring support for multiple scenarios (multiple instances), multiple environments (multiple cloud resource pools), multiple clusters (instance clusters of hardware or software) and the like. With a Prometheus deployed in each cloud resource pool, when a user issues a query the specific Prometheus is determined from the query condition and the corresponding monitoring data is acquired from it. At the same time, the query condition is analyzed to determine whether the monitoring data needs to be aggregated: if so, the aggregated monitoring data is sent to the user side; if not, the monitoring data is sent to the user side directly. Unified monitoring management of the instances is thereby achieved, users' personalized requirements are met, and the bandwidth wasted by centralized queries across multiple instances is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions of the prior art, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it will be obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
Fig. 1 is a schematic view of a scenario of a distributed monitoring method according to an embodiment of the present application;
Fig. 2 is a schematic view of a scenario of a distributed monitoring method according to another embodiment of the present application;
Fig. 3 is a schematic flow chart of a distributed monitoring method according to an embodiment of the present application;
Fig. 4 is a flowchart of a distributed monitoring method according to another embodiment of the present application;
Fig. 5 is a schematic diagram of a distributed monitoring method according to another embodiment of the present application;
Fig. 6 is a flowchart of a distributed monitoring method according to another embodiment of the present application;
Fig. 7 is a schematic structural diagram of a distributed monitoring device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The cloud resource monitoring system is one of the main components of a cloud platform. It provides resource usage monitoring for cloud resource suppliers and users: after logging in, users can check the usage of the resources they have purchased and plan the allocation of those resources accordingly, which reduces purchasing cost and improves resource utilization.
At present, distributed monitoring methods generally manage and analyze a single cluster environment. They cannot uniformly manage various kinds of instances, and centralized querying of distributed monitoring data wastes bandwidth. Therefore, existing distributed monitoring methods cannot perform unified monitoring management of various kinds of instances, and centralized querying of distributed monitoring data causes bandwidth loss.
To solve these problems, the technical idea of the application is as follows. By providing a unified query entry, the monitoring data of one or more cloud resource pools and of one or more instances can be queried, and monitoring support is provided for multiple scenarios (multiple instances), multiple environments (multiple cloud resource pools), multiple clusters (instance clusters of hardware or software) and the like. With a Prometheus deployed in each cloud resource pool, when a user issues a query the specific Prometheus is determined from the query condition and the corresponding monitoring data is acquired from it. At the same time, the query condition is analyzed to determine whether the monitoring data needs to be aggregated: if so, the aggregated monitoring data is sent to the user side; if not, the monitoring data is sent to the user side directly. This enables unified monitoring management of the instances, meets users' personalized requirements, and reduces the bandwidth wasted by centralized queries across multiple instances.
Referring to Fig. 1, which is a schematic view of a scenario of a distributed monitoring method according to an embodiment of the present application, the scenario may include a user side, Prometheus, and a distributed monitoring device. The distributed monitoring device is configured in a server or an electronic device, and a monitoring and alerting system, Prometheus, is deployed for each cloud resource pool, where each Prometheus is used to acquire and store the monitoring data of an instance in the cloud resource pool based on the registration information of that instance. The distributed monitoring device comprises a gateway, a registration/configuration module, a query module and the like; the gateway is communicatively connected to the registration/configuration module, the registration/configuration module is communicatively connected to Prometheus, and the query module communicates with Prometheus. The distributed monitoring device monitors the instances in each cloud resource pool through the respective Prometheus and displays the monitoring data of a monitored object through a monitoring interface. The monitored object may be all instances, or one or several instances, of one or more cloud resource pools, which is not specifically limited here.
The user side provides the monitoring entry: the user submits PromQL to query the relevant data, and monitoring alerts or displays are produced from that data. The unified query entry parses the PromQL statement and performs a fragmented query or an aggregation query. The Prometheus data sources are acquired, that is, each Prometheus extracts the monitoring information (such as monitoring data) of the relevant instances using the information in the registry. The registration and configuration center (comprising a registration center and a configuration center, referred to herein as the registration/configuration module, and providing registration, deregistration and configuration functions) provides registration and deregistration capabilities for instances through a client and manages the instances. A specific instance to be monitored can register and deregister by calling the interface of the registration and configuration center, or an automatic discovery rule can be configured so that the instance is registered with the registration center through a webhook.
Referring to Fig. 2, the query module provides a unified query interface. The user inputs query conditions on an operation interface, and the unified query interface is called to find the specific Prometheus that stores the monitoring data of the instance corresponding to the query conditions. Each Prometheus may obtain the interface path of the data of each instance from the registration and configuration center (the registration/configuration module) and periodically acquire the corresponding monitoring information through that path.
The prior art does not analyze complex multi-cluster scenarios such as VMs, bare metal and containers as a whole; it analyzes a single cluster environment, such as a k8s cluster, and does not manage the various kinds of instances uniformly. In the present application, container-based instances and traditional instances are registered in a unified registry and managed uniformly.
In addition, in the prior art, querying distributed monitoring data either relies on centralized storage, which consumes bandwidth, or uses fragmented queries, which cannot aggregate the data. The present application can aggregate data without storing it centrally, which reduces bandwidth waste. Moreover, current distributed monitoring schemes do not address the management of individual instances or the management and monitoring of the instance lifecycle, nor the processes of active registration and deregistration of instances.
Therefore, the application adopts a Prometheus-based distributed monitoring approach that can provide monitoring support for multiple scenarios, multiple environments, multiple clusters and the like. A unified Prometheus registration and configuration center is provided; the Prometheus of each environment obtains instance information from this center in order to scrape data, and platform operations staff can manage the monitored instances visually through pages. Instances are registered and deregistered through the webhook of each instance. The unified PromQL query entry processes the Prometheus data of each environment uniformly and supports PromQL-style queries: it can quickly query the Prometheus data (including the monitoring data held by that Prometheus) of the corresponding environment through a sharding key, and it can also aggregate and query the monitoring data of all environments. An active-registration client interface is provided, through which relevant instances can be discovered automatically and actively registered with the registry by means of corresponding rule configuration.
The technical solution of the application is described in detail below through specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
Referring to Fig. 3, which is a schematic flow chart of a distributed monitoring method according to an embodiment of the present application, the method is applied to a distributed monitoring device. The distributed monitoring device is configured in a server or an electronic device, and a monitoring and alerting system, Prometheus, is deployed for each cloud resource pool, where each Prometheus is used to acquire and store the monitoring data of an instance in the cloud resource pool based on the registration information of that instance; the distributed monitoring device is used to monitor the instances in each cloud resource pool through the respective Prometheus. The method comprises the following steps:
S301, in response to a query operation of a user, determining, through a query module in the distributed monitoring device and according to the query condition input by the user in the query operation, the identification of an object to be monitored and the corresponding Prometheus; the object to be monitored includes at least one instance.
In this embodiment, the data query layer (referred to herein as the query module) exposes a unified query interface through PromQL (the query language built into Prometheus). When a user enters a query condition (for example, the kafka of a certain cluster in a certain region, optionally with the IP of that kafka), the incoming PromQL is parsed to determine the identification of the object to be monitored, that is, to locate the specific kafka instance. Each Prometheus is recorded in a database when it is created so that subsequent queries can find it, and the correspondence between instances and Prometheus is also stored in the database. The specific Prometheus can therefore be located from the identification of the object to be monitored.
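Determining the object to be monitored from the incoming PromQL amounts to reading the label matchers of the selector. The following is a deliberately simplified sketch, not a PromQL parser: it only picks up equality matchers of the form `name="value"` inside `{...}`, ignores `!=`, `=~` and `!~`, and does not handle a `}` inside a quoted value. The metric name and the label names `env`, `cluster` and `ip` used in the example are assumptions.

```python
import re
from typing import Dict

_SELECTOR = re.compile(r"\{([^}]*)\}")            # text between { and }
_EQ_MATCHER = re.compile(r'(\w+)\s*=\s*"([^"]*)"')  # name="value" only


def extract_labels(promql: str) -> Dict[str, str]:
    """Collect the equality label matchers found in a PromQL expression."""
    labels: Dict[str, str] = {}
    for body in _SELECTOR.findall(promql):
        for name, value in _EQ_MATCHER.findall(body):
            labels[name] = value
    return labels
```

For the hypothetical query `kafka_cpu_usage{env="pool-a", cluster="x", ip="10.0.0.5"}`, the function returns the three labels, from which the environment and the specific instance can then be looked up.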
S302, acquiring, from a database, the monitoring data of the object to be monitored that is stored in the Prometheus corresponding to the object to be monitored.
In this embodiment, the monitoring data of each instance is stored in the corresponding Prometheus, for example with the default retention of 15 days. One record of monitoring data for a monitored instance has, for example, the following format: collection item: cpu; tag: {kafka x cluster}; collection time: 0711-14:26:11; value: 90%.
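The example record above has four fields, which can be modelled as a small typed structure. This is a sketch only: the `Sample` class, the dictionary keys and the choice to store the value as a percentage number are assumptions made for illustration, not the storage format Prometheus itself uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    item: str             # collection item, e.g. "cpu"
    tag: str              # tag identifying the monitored instance
    collected_at: str     # collection time, kept as the raw string
    value_percent: float  # value, with the trailing "%" removed


def parse_record(record: dict) -> Sample:
    """Turn one raw record (hypothetical key names) into a typed sample."""
    return Sample(
        item=record["item"],
        tag=record["tag"],
        collected_at=record["time"],
        value_percent=float(record["value"].rstrip("%")),
    )
```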
S303, according to the query condition, if it is determined that the monitoring data does not need to be aggregated, sending the monitoring data of the object to be monitored to the user side used by the user; and if it is determined that the monitoring data needs to be aggregated, aggregating the monitoring data of the object to be monitored and sending the aggregated monitoring data to the user side.
In this embodiment, the query condition is analyzed to determine whether the monitoring data needs to be aggregated. If so, the aggregated monitoring data is sent to the user side; if not, the monitoring data is sent to the user side directly.
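Step S303 can be sketched as a dispatcher that either passes a single pool's result through or merges the results of several pools. The source does not define the aggregation operation, so this sketch assumes the simplest form, concatenating the per-pool results and tagging each series with its pool. The `run_query` function and the injected `fetch` callable are hypothetical; in a real deployment `fetch` would query the Prometheus of the given pool.

```python
from typing import Callable, Dict, Iterable, List

Series = Dict[str, object]
Fetch = Callable[[str, str], List[Series]]  # (pool_id, promql) -> series list


def run_query(promql: str, pool_ids: Iterable[str], fetch: Fetch) -> List[Series]:
    """Return one pool's data unchanged, or merge the data of several pools."""
    pools = sorted(set(pool_ids))
    if len(pools) == 1:
        # Fragmented query: no aggregation, forward the result as-is.
        return fetch(pools[0], promql)
    # Aggregation query (assumed semantics): merge and tag by source pool.
    merged: List[Series] = []
    for pool in pools:
        for series in fetch(pool, promql):
            merged.append({**series, "pool": pool})
    return merged
```

Because each pool is queried where its data lives and only the results are merged, no monitoring data has to be copied into a central store beforehand, which is the bandwidth saving the description refers to.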
In the distributed monitoring method provided by this embodiment, first, in response to a query operation of a user, the identification of the object to be monitored and the corresponding Prometheus are determined through the query module in the distributed monitoring device according to the query condition input by the user in the query operation; the object to be monitored comprises at least one instance. Next, the monitoring data of the object to be monitored that is stored in the Prometheus corresponding to the object to be monitored is acquired from a database. Then, according to the query condition, if it is determined that the monitoring data does not need to be aggregated, the monitoring data of the object to be monitored is sent to the user side used by the user; if it is determined that the monitoring data needs to be aggregated, the monitoring data of the object to be monitored is aggregated and the aggregated monitoring data is sent to the user side. By providing a unified query entry, the application supports querying the monitoring data of one or more cloud resource pools and of one or more instances, and thus provides monitoring support for multiple scenarios (multiple instances), multiple environments (multiple cloud resource pools), multiple clusters (instance clusters of hardware or software) and the like. With a Prometheus deployed in each cloud resource pool, when a user issues a query the specific Prometheus is determined from the query condition and the corresponding monitoring data is acquired from it. At the same time, the query condition is analyzed to determine whether the monitoring data needs to be aggregated: if so, the aggregated monitoring data is sent to the user side; if not, the monitoring data is sent to the user side directly.
Unified monitoring management of the instances is thereby achieved, users' personalized requirements are met, and the bandwidth wasted by centralized queries across multiple instances is reduced.
In one possible design, the database stores the correspondence between each cloud resource pool and its corresponding Prometheus; determining the identification of the object to be monitored and the corresponding Prometheus according to the query condition input by the user in the query operation comprises:
determining a label of the object to be monitored according to the query condition input by the user in the query operation;
according to the label of the object to be monitored, determining a cloud resource pool to which the object to be monitored belongs;
and determining, through the correspondence, the Prometheus used to acquire the monitoring data of the object to be monitored according to the cloud resource pool to which the object to be monitored belongs.
In this embodiment, each environment (here, a cloud resource pool) is numbered, and the related Prometheus resource information is bound to that number and stored in a database. Referring to fig. 4, the specific environment ID (here, the tag of the object to be monitored) is first parsed from the PromQL query. One Prometheus is deployed in each cloud resource pool, and the instances form the cloud resource pool, so when an instance is created, the correspondence between the instance (or the cloud resource pool) and the Prometheus can be stored in the registration and configuration center. The information of the corresponding Prometheus can then be looked up by the environment ID, and the specific monitoring information obtained from it.
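By way of illustration only, the lookup from environment ID to Prometheus described above can be sketched as follows. The label name `env`, the pool identifiers, and the URLs are assumptions made for the example; the embodiment does not name them, and a real implementation would read the correspondence from the database rather than from an in-memory dictionary.

```python
import re
from typing import Optional

# Hypothetical stand-in for the database table that binds each numbered
# environment (cloud resource pool) to its Prometheus.
PROMETHEUS_BY_ENV = {
    "pool-x": "http://prometheus-x.example:9090",
    "pool-y": "http://prometheus-y.example:9090",
}

# Assumed name of the label that carries the environment ID.
ENV_LABEL = "env"

# Matches an equality matcher such as env="pool-x" inside a PromQL selector.
_ENV_MATCHER = re.compile(r"\b" + ENV_LABEL + r'\s*=\s*"([^"]*)"')


def extract_env_id(promql: str) -> Optional[str]:
    """Return the environment ID named in the query, or None if absent."""
    match = _ENV_MATCHER.search(promql)
    return match.group(1) if match else None


def resolve_prometheus(promql: str) -> Optional[str]:
    """Map the query to the Prometheus bound to its environment, if any."""
    env_id = extract_env_id(promql)
    if env_id is None:
        return None
    return PROMETHEUS_BY_ENV.get(env_id)
```

A regular expression is used here only to keep the sketch short; it recognizes plain equality matchers and is not a PromQL parser.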
In one possible design, the method further comprises:
determining whether the query condition is a query of a single cloud resource pool or a query of a plurality of cloud resource pools;
if the query is a query of a single cloud resource pool, determining that the query operation is a fragmented query, wherein the fragmented query is used for indicating that the queried monitoring data is not required to be aggregated;
if the query is a query of a plurality of cloud resource pools, determining that the query operation is an aggregation query, wherein the aggregation query is used for indicating that the queried monitoring data needs to be aggregated.
In this embodiment, as shown in fig. 4, the specific environment ID is parsed to determine whether the query spans multiple environments. If it does not (that is, it is a query of a single cloud resource pool, i.e., a fragmented query), the request is forwarded to the specific Prometheus, which queries the monitoring data, and the result is returned to the user. If it does span multiple environments (that is, multiple cloud resource pools), all related Prometheus instances are queried, and the data is aggregated and returned to the user. Specifically, the Prometheus to be queried is located according to the query condition, and the query condition is then passed to that Prometheus, which stores the monitoring data. After the query completes, whether aggregation is needed is judged from what was returned: if aggregation is needed, the data is aggregated and then given to the user; if not, the data is given to the user directly.
Illustratively, it is determined whether the query condition contains a region name. If the region name is contained, the query operation is determined to be a fragmented query, which indicates that the monitoring data does not need to be aggregated; if the region name is not contained, the query operation is determined to be an aggregation query, which indicates that the monitoring data needs to be aggregated. For example, the determination checks whether a shard key is present (here, whether the query condition contains a region name); if there is no region name, the query is treated as an aggregation query by default.
In the prior art, aggregation collects the data centrally: the Prometheus data of each cloud pool (here, cloud resource pool) must be gathered in one place, so the data has to be transmitted, and that transmission consumes a large amount of bandwidth.
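The shard-key rule above can be sketched as a small classifier. This is a minimal illustration under stated assumptions: the label name `region` and the names `QueryKind` and `classify_query` are hypothetical, and only a plain equality matcher is treated as a shard key.

```python
import re
from enum import Enum


class QueryKind(Enum):
    FRAGMENTED = "fragmented"  # single pool, result returned without aggregation
    AGGREGATE = "aggregate"    # several pools, results aggregated before returning


# Assumed name of the label that acts as the shard key (the region name).
REGION_LABEL = "region"

# A non-empty equality matcher such as region="north" counts as a shard key.
_REGION_MATCHER = re.compile(r"\b" + REGION_LABEL + r'\s*=\s*"[^"]+"')


def classify_query(promql: str) -> QueryKind:
    """Fragmented if the query names a region; aggregate by default otherwise."""
    if _REGION_MATCHER.search(promql):
        return QueryKind.FRAGMENTED
    return QueryKind.AGGREGATE
```

For instance, `classify_query('cpu_usage{region="north"}')` yields `QueryKind.FRAGMENTED`, while a bare `cpu_usage` falls back to `QueryKind.AGGREGATE`.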
In one possible design, the distributed monitoring apparatus includes a registration/configuration module; the method further comprises, prior to responding to a query operation by a user:
responding to the operation of creating the instance by the user, and registering the information of the instance with a registration/configuration module by calling a gateway;
storing registration information of the instance, the registration information including an IP address of the instance, a tag of the instance, and a path that provides a monitoring interface to the corresponding Prometheus; wherein the Prometheus is used for periodically acquiring monitoring data of the instance through the path.
In one possible design, the method further comprises:
and when determining that the monitoring task of the instance is offline, deregistering the registration information of the instance through the registration/configuration module.
In this embodiment, as shown in fig. 5, traditional instances and container instances are the specific monitored objects. When a service starts, the gateway (i.e., the registry gateway, which provides the functions of adding, deleting, and querying registry entries and of authentication) must be called to register the instance information with the registry; when the service goes offline, an interface must be called to deregister it.
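The registration and deregistration life cycle described above can be illustrated with a minimal in-memory sketch. The class and method names are hypothetical, and the gateway, authentication, and persistent storage are deliberately left out; only the three registered fields named in this embodiment (IP address, tag, monitoring path) are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Registration:
    ip: str                       # IP address of the instance
    metrics_path: str             # path exposing the monitoring interface
    tags: Dict[str, str] = field(default_factory=dict)  # tags of the instance


class Registry:
    """Hypothetical in-memory stand-in for the registration/configuration module."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Registration] = {}

    def register(self, instance_id: str, registration: Registration) -> None:
        # Called when the service starts; a repeated call overwrites the entry.
        self._by_id[instance_id] = registration

    def deregister(self, instance_id: str) -> bool:
        # Called when the monitoring task goes offline.
        # Returns False if the instance was not registered.
        return self._by_id.pop(instance_id, None) is not None

    def instances_with_tag(self, key: str, value: str) -> List[str]:
        # IDs of registered instances carrying the given tag, in sorted order.
        return sorted(
            instance_id
            for instance_id, reg in self._by_id.items()
            if reg.tags.get(key) == value
        )
```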
In one possible design, the registering the information of the instance with the registration/configuration module by calling a gateway in response to the operation of creating the instance by the user includes:
Responding to the operation of creating an instance by a user, and if the instance is a container instance, sending information of the instance to the gateway through a registration engine; processing the information of the instance through the gateway, and sending the processed information to a registration/configuration module for registration to generate registration information;
triggering and calling the gateway to register through an automatic discovery rule configured at the gateway if the instance is a traditional instance, wherein the traditional instance is used for representing the instance running in a virtual machine; and processing the information of the instance by calling the gateway, and sending the processed information to a registration/configuration module for registration to generate registration information.
In this embodiment, as shown in fig. 5, when a traditional instance (e.g., traditional instance 1 in the x cluster and traditional instance 2 in the y cluster) starts its PaaS application, it actively calls the gateway to register, while a PaaS application in container form (here, a container instance, such as container instance 1 in the x cluster and container instance 2 in the y cluster) registers through the registration engine CRD (i.e., a custom k8s resource type used to automatically register containerized PaaS products). On this basis, the registry is responsible for storing the information of registered instances and for providing configuration functions. The Prometheus data source of each environment (Prometheus being an open-source monitoring and alerting system based on a time series database) scrapes the monitoring information of its own instances by querying the registry, and each Prometheus is recorded in the database when it is created so that subsequent queries can find it. The data query layer parses the incoming PromQL through a unified PromQL query interface and performs either a fragmented query (a query against the Prometheus of a specific environment, such as the Prometheus corresponding to the x cluster, i.e., Prometheus-x, or the Prometheus corresponding to the y cluster, i.e., Prometheus-y) or an aggregate query (a query against the Prometheus of every related environment, such as Prometheus-x and Prometheus-y, whose results are aggregated and then fed back to the user). The data therefore does not need to be collected centrally in advance; only the PromQL is forwarded downward and the returned data is aggregated in one place, which saves bandwidth.
Specifically, with reference to fig. 6, the monitoring data is obtained as follows: the shard key is parsed from the PromQL; the corresponding Prometheus information, such as the Prometheus tag, is obtained; and the monitoring information (here, monitoring data) is queried from the corresponding Prometheus. If several Prometheus instances are involved, the monitoring data of each is aggregated and the aggregated data is returned to the user side. If a single Prometheus is involved, its monitoring data is returned to the user side.
Specifically, the environment ID (here, the tag of the object to be monitored) is parsed from the PromQL. Because one Prometheus is deployed in each cloud resource pool and the instances form the cloud resource pool, the correspondence between the instance (or the cloud resource pool) and the Prometheus can be stored in the registration and configuration center when an instance is created; the information of the corresponding Prometheus can then be looked up by the environment ID, and the specific monitoring information obtained.
When the registration and configuration center acts as a registry, an interface can register only after passing authentication; the CRD likewise requires authentication to be configured. When the registration information of the configuration center and the global configuration of Prometheus are managed, the UI performs authorization management based on role-based access control (RBAC): some groups can only view, while others can also manage.
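One possible way for a Prometheus to learn its scrape targets from the registry is through Prometheus's HTTP-based service discovery, which expects a JSON list of target groups, each with a `targets` list and a `labels` map. The embodiment does not state which mechanism is used, so the following is only a sketch under that assumption; the record layout and the `port` field are likewise hypothetical, since the registration information described here names only an IP address, a tag, and a path.

```python
import json
from typing import Dict, List


def to_sd_targets(registrations: List[dict]) -> List[dict]:
    """Convert registry records into HTTP service discovery target groups.

    Each record is assumed to look like:
    {"ip": "10.0.0.5", "port": 9100, "labels": {...}, "metrics_path": "/metrics"}
    """
    groups = []
    for reg in registrations:
        labels: Dict[str, str] = dict(reg["labels"])  # copy, do not mutate input
        # __metrics_path__ is the label Prometheus consults for the scrape path.
        labels["__metrics_path__"] = reg["metrics_path"]
        groups.append(
            {"targets": [f'{reg["ip"]}:{reg["port"]}'], "labels": labels}
        )
    return groups


def render_sd_response(registrations: List[dict]) -> str:
    """Serialize the target groups as the JSON body of a discovery response."""
    return json.dumps(to_sd_targets(registrations))
```

With this shape, each environment's Prometheus would poll the registry endpoint and pick up newly registered or deregistered instances without a configuration reload.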
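The flow of fig. 6, in which only the query is forwarded and the results are merged afterwards, can be sketched as below. The function names and the `env` label added to each sample are assumptions for illustration. The fetch step is passed in as a callable so that the sketch stays runnable without a network; in practice it would call each Prometheus's query API, and the sample layout (`metric` plus `value`) mirrors the shape of an instant-query result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Sample = dict
# fetch(base_url, promql) -> list of samples returned by one Prometheus
Fetch = Callable[[str, str], List[Sample]]


def query_pools(promql: str, targets: Dict[str, str], fetch: Fetch) -> List[Sample]:
    """Forward the query to every selected Prometheus and merge the results.

    `targets` maps an environment ID to the base URL of its Prometheus.
    With one target this is a fragmented query; with several, an aggregate one.
    """
    if not targets:
        return []
    merged: List[Sample] = []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        # Fan out: one request per Prometheus, issued concurrently.
        futures = {
            env: pool.submit(fetch, url, promql) for env, url in targets.items()
        }
        # Merge in the order the targets were given, tagging each sample
        # with the environment it came from.
        for env, future in futures.items():
            for sample in future.result():
                metric = dict(sample["metric"], env=env)
                merged.append({"metric": metric, "value": sample["value"]})
    return merged
```

Only the query string travels to each pool and only the matching samples travel back, which is the bandwidth saving this embodiment attributes to the approach.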
In one possible design, the method further comprises:
and responding to input operation of a user in an operable interface corresponding to the registration/configuration module, and configuring the registered instance.
In this embodiment, as shown in fig. 5, the monitoring instance management may manage and configure registered instances by means of pages, such as modifying tag information, manually adding and deleting instance information, and the like.
Once instance information has been registered, the key question is how to find the specific Prometheus information from the PromQL. To solve this problem, this embodiment extends the unified query interface so that each environment is numbered and the related Prometheus resource information is bound to it and stored in the database, as shown in fig. 4.
The application adopts a Prometheus-based distributed monitoring approach and can provide monitoring support for multiple scenarios, multiple environments, multiple clusters, and the like. It provides a unified Prometheus registration and configuration center: the Prometheus of each environment obtains instance information from the center in order to scrape data, and platform operations staff can manage the related monitoring instances visually through web pages. Related instances are registered and deregistered through the webhook of each instance. The unified PromQL query entry handles the Prometheus data of every environment in a uniform way and supports PromQL-style queries; it can quickly query the Prometheus data (including the monitoring data held by that Prometheus) of the corresponding environment through the shard key, and can also run aggregate queries over the monitoring data of every environment. An active-registration client interface is also provided, through which related instances can be discovered automatically and actively registered with the registry by configuring the corresponding rules.
Therefore, unified monitoring management of the instances is realized, the personalized requirements of users are met, and the bandwidth waste caused by centralized queries over a plurality of instances is reduced.
In order to implement the distributed monitoring method, this embodiment provides a distributed monitoring device. The distributed monitoring device is configured in a server or an electronic device; a monitoring and alerting system, Prometheus, is deployed for each cloud resource pool, and the Prometheus is used to obtain and store monitoring data of an instance in the cloud resource pool based on the registration information of the instance; the distributed monitoring device is used for monitoring the instances in each cloud resource pool through each Prometheus. Referring to fig. 7, fig. 7 is a schematic structural diagram of a distributed monitoring device according to an embodiment of the present application; the distributed monitoring device includes:
the query module 701 is configured to respond to a query operation of a user and to determine an identifier of an object to be monitored and a corresponding Prometheus according to a query condition input by the user in the query operation; the object to be monitored comprises at least one instance;
the query module 701 is further configured to obtain, from a database, monitoring data of the object to be monitored stored in the Prometheus corresponding to the object to be monitored;
The processing module 702 is configured to send the monitoring data of the object to be monitored to a user end used by a user if it is determined that the monitoring data does not need to be aggregated according to the query condition; and if the monitoring data is determined to be needed to be aggregated, aggregating the monitoring data of the object to be monitored, and sending the aggregated monitoring data to the user side.
In this embodiment, the query module 701 and the processing module 702 are provided. The query module responds to a query operation of a user and determines an identifier of an object to be monitored and a corresponding Prometheus according to the query condition input by the user in the query operation; the object to be monitored comprises at least one instance. The monitoring data of the object to be monitored, stored in the Prometheus corresponding to that object, is obtained from a database. According to the query condition, if it is determined that the monitoring data does not need to be aggregated, the monitoring data of the object to be monitored is sent to the user side used by the user; if it is determined that the monitoring data needs to be aggregated, the monitoring data of the object to be monitored is aggregated and the aggregated monitoring data is sent to the user side. By providing a unified query entry, the application supports querying the monitoring data of one or more cloud resource pools and one or more instances, and thus provides monitoring support for multiple scenarios (multiple instances), multiple environments (multiple cloud resource pools), multiple clusters (instance clusters of hardware or software), and the like. Because a Prometheus is deployed in each cloud resource pool, when a user issues a query, the specific Prometheus is determined from the query condition and the corresponding monitoring data is acquired from it. At the same time, the query condition is analyzed to determine whether the monitoring data needs to be aggregated: if so, the aggregated monitoring data is sent to the user side; if not, the monitoring data is sent to the user side directly.
Therefore, unified monitoring management of the instances is realized, the personalized requirements of users are met, and the bandwidth waste caused by centralized queries over a plurality of instances is reduced.
The device provided in this embodiment may be used to implement the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In one possible design, the database stores the correspondence between each cloud resource pool and its corresponding Prometheus; the query module is specifically configured to:
determining a label of the object to be monitored according to the query condition input by the user in the query operation;
according to the label of the object to be monitored, determining a cloud resource pool to which the object to be monitored belongs;
and determining, through the correspondence, the Prometheus used to acquire the monitoring data of the object to be monitored according to the cloud resource pool to which the object to be monitored belongs.
In one possible design, the query module is further configured to:
determining whether the query condition is a query of a single cloud resource pool or a query of a plurality of cloud resource pools;
if the query is a query of a single cloud resource pool, determining that the query operation is a fragmented query, wherein the fragmented query is used for indicating that the queried monitoring data is not required to be aggregated;
if the query is a query of a plurality of cloud resource pools, determining that the query operation is an aggregation query, wherein the aggregation query is used for indicating that the queried monitoring data needs to be aggregated.
In one possible design, the distributed monitoring apparatus includes a registration/configuration module; the processing module is further configured to, prior to responding to a query operation by a user:
responding to the operation of creating the instance by the user, and registering the information of the instance with a registration/configuration module by calling a gateway;
storing registration information of the instance, the registration information including an IP address of the instance, a tag of the instance, and a path that provides a monitoring interface to the corresponding Prometheus; wherein the Prometheus is used for periodically acquiring monitoring data of the instance through the path.
In one possible design, the processing module is further specifically configured to:
responding to the operation of creating an instance by a user, and if the instance is a container instance, sending information of the instance to the gateway through a registration engine; processing the information of the instance through the gateway, and sending the processed information to a registration/configuration module for registration to generate registration information;
triggering and calling the gateway to register through an automatic discovery rule configured at the gateway if the instance is a traditional instance, wherein the traditional instance is used for representing the instance running in a virtual machine; and processing the information of the instance by calling the gateway, and sending the processed information to a registration/configuration module for registration to generate registration information.
In one possible design, the processing module is further configured to:
and responding to input operation of a user in an operable interface corresponding to the registration/configuration module, and configuring the registered instance.
In one possible design, the processing module is further configured to:
and when determining that the monitoring task of the instance is off line, canceling the registration information of the instance through the registration/configuration module.
In order to implement the distributed monitoring method, this embodiment provides an electronic device. Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 80 of this embodiment includes at least one processor 801 and a memory 802, wherein the memory 802 is used for storing computer-executable instructions, and the at least one processor 801 is used for executing the computer-executable instructions stored in the memory to perform the steps described in the embodiments above. Reference may be made in particular to the relevant description of the method embodiments described above.
The embodiment of the application also provides a computer readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the distributed monitoring method is implemented.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms. In addition, each functional module in the embodiments of the present application may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit. The units formed by the modules can be realized in a form of hardware or a form of hardware and software functional units.
The integrated modules, when implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods according to the embodiments of the application. It should be understood that the above processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
The memory may comprise a high-speed RAM memory and may further comprise a non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or to one type of bus. The storage medium may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). It is also possible that the processor and the storage medium reside as discrete components in an electronic device or a master device.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. A distributed monitoring method, characterized in that the method is applied to a distributed monitoring device, wherein the distributed monitoring device is configured in a server or electronic equipment, a monitoring and alerting system Prometheus is deployed for each cloud resource pool, and the Prometheus is used for acquiring and storing monitoring data of an instance based on registration information of the instance in the cloud resource pool; the distributed monitoring device is used for monitoring the instances in each cloud resource pool through each Prometheus; the method comprises the following steps:
responding to a query operation of a user, and determining, through a query module in the distributed monitoring device, an identification of an object to be monitored and a corresponding Prometheus according to a query condition input by the user in the query operation; the object to be monitored comprises at least one instance;
acquiring, from a database, monitoring data of the object to be monitored stored in the Prometheus corresponding to the object to be monitored;
according to the query conditions, if the fact that the monitoring data do not need to be aggregated is determined, the monitoring data of the object to be monitored are sent to a user side used by a user; and if the monitoring data is determined to be needed to be aggregated, aggregating the monitoring data of the object to be monitored, and sending the aggregated monitoring data to the user side.
2. The method according to claim 1, wherein the database stores the correspondence between each cloud resource pool and its corresponding Prometheus; the determining the identification of the object to be monitored and the corresponding Prometheus according to the query condition input by the user in the query operation comprises:
determining a label of the object to be monitored according to the query condition input by the user in the query operation;
according to the label of the object to be monitored, determining a cloud resource pool to which the object to be monitored belongs;
and determining, through the correspondence, the Prometheus used to acquire the monitoring data of the object to be monitored according to the cloud resource pool to which the object to be monitored belongs.
3. The method according to claim 1, wherein the method further comprises:
determining whether the query condition is a query of a single cloud resource pool or a query of a plurality of cloud resource pools;
if the query is a query of a single cloud resource pool, determining that the query operation is a fragmented query, wherein the fragmented query is used for indicating that the queried monitoring data is not required to be aggregated;
if the query is a query of a plurality of cloud resource pools, determining that the query operation is an aggregation query, wherein the aggregation query is used for indicating that the queried monitoring data needs to be aggregated.
4. A method according to any one of claims 1-3, wherein the distributed monitoring device comprises a registration/configuration module; the method further comprises, prior to responding to a query operation by a user:
responding to the operation of creating the instance by the user, and registering the information of the instance with a registration/configuration module by calling a gateway;
storing registration information of the instance, the registration information including an IP address of the instance, a tag of the instance, and a path that provides a monitoring interface to the corresponding Prometheus; wherein the Prometheus is used for periodically acquiring monitoring data of the instance through the path.
5. The method of claim 4, wherein the registering information of the instance with the registration/configuration module by invoking a gateway in response to the operation of the user to create the instance comprises:
responding to the operation of creating an instance by a user, and if the instance is a container instance, sending information of the instance to the gateway through a registration engine; processing the information of the instance through the gateway, and sending the processed information to a registration/configuration module for registration to generate registration information;
triggering and calling the gateway to register through an automatic discovery rule configured at the gateway if the instance is a traditional instance, wherein the traditional instance is used for representing the instance running in a virtual machine; and processing the information of the instance by calling the gateway, and sending the processed information to a registration/configuration module for registration to generate registration information.
6. The method according to claim 4, wherein the method further comprises:
and responding to input operation of a user in an operable interface corresponding to the registration/configuration module, and configuring the registered instance.
7. The method according to claim 4, wherein the method further comprises:
and when determining that the monitoring task of the instance is off line, canceling the registration information of the instance through the registration/configuration module.
8. A distributed monitoring device, characterized in that the device is configured in a server or electronic equipment, wherein a monitoring and alerting system Prometheus is deployed for each cloud resource pool, and the Prometheus is used for acquiring and storing monitoring data of an instance based on registration information of the instance in the cloud resource pool; the distributed monitoring device is used for monitoring the instances in each cloud resource pool through each Prometheus; the device comprises:
the query module is used for responding to the query operation of the user, and determining the identification of the object to be monitored and the corresponding Prometheus according to the query condition input by the user in the query operation; the object to be monitored comprises at least one instance;
the query module is further configured to obtain, from a database, monitoring data of the object to be monitored stored in the Prometheus corresponding to the object to be monitored;
the processing module is used for sending the monitoring data of the object to be monitored to a user side used by a user if the monitoring data are determined not to be aggregated according to the query condition; and if the monitoring data is determined to be needed to be aggregated, aggregating the monitoring data of the object to be monitored, and sending the aggregated monitoring data to the user side.
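By way of non-limiting illustration, the cooperation of the query module and the processing module might be sketched as follows. The shape of the query condition (`instance_ids`, `aggregate`), the in-memory `database` keyed by (Prometheus identifier, instance identifier), the supported aggregations and all function names are assumptions made for this sketch only; the application does not specify them, and no real Prometheus API is called.

```python
from statistics import mean


def resolve_targets(condition: dict, registrations: dict) -> list:
    """Query module, step 1: map each requested instance identifier to the
    Prometheus deployed for its cloud resource pool (hypothetical lookup)."""
    return [(iid, registrations[iid]) for iid in condition["instance_ids"]]


def fetch_monitoring_data(targets: list, database: dict) -> dict:
    """Query module, step 2: read the stored samples for every target."""
    return {iid: database.get((prom, iid), []) for iid, prom in targets}


def handle_query(condition: dict, registrations: dict, database: dict) -> dict:
    """Processing module: return raw data, or aggregate it first when the
    query condition asks for aggregation."""
    data = fetch_monitoring_data(resolve_targets(condition, registrations), database)
    how = condition.get("aggregate")
    if how is None:
        return data  # no aggregation requested: send the data as-is
    values = [v for samples in data.values() for v in samples]
    if not values:
        return {how: None}  # nothing to aggregate
    if how == "sum":
        return {"sum": sum(values)}
    if how == "avg":
        return {"avg": mean(values)}
    raise ValueError(f"unsupported aggregation: {how!r}")


# Hypothetical registrations and stored samples spanning two resource pools.
registrations = {"a": "prom-east", "b": "prom-west"}
database = {("prom-east", "a"): [1.0, 3.0], ("prom-west", "b"): [5.0]}

raw = handle_query({"instance_ids": ["a", "b"]}, registrations, database)
total = handle_query({"instance_ids": ["a", "b"], "aggregate": "sum"}, registrations, database)
```

Here `raw` keeps the samples separated per instance, while `total` combines samples held by two different Prometheus deployments into a single value, which is the cross-pool case the aggregation branch is meant to serve.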
9. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the distributed monitoring method of any one of claims 1-7.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor implement the distributed monitoring method of any of claims 1-7.
CN202310932049.XA 2023-07-26 2023-07-26 Distributed monitoring method, device, equipment and computer readable storage medium Pending CN116955079A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310932049.XA CN116955079A (en) 2023-07-26 2023-07-26 Distributed monitoring method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116955079A true CN116955079A (en) 2023-10-27

Family

ID=88454370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310932049.XA Pending CN116955079A (en) 2023-07-26 2023-07-26 Distributed monitoring method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116955079A (en)

Similar Documents

Publication Publication Date Title
CN108737325B (en) Multi-tenant data isolation method, device and system
CN111913818B (en) Method for determining dependency relationship between services and related device
CN112925647B (en) Cloud edge cooperative system, cluster resource control method and device
CN111198976B (en) Cloud asset association analysis system, method, electronic equipment and medium
CN113760641A (en) Service monitoring method, device, computer system and computer readable storage medium
CN110278101B (en) Resource management method and equipment
CN103370695A (en) Database update notification method
CN112187509A (en) Multi-architecture cloud platform execution log management method, system, terminal and storage medium
CN110018932B (en) Method and device for monitoring container magnetic disk
EP3011456B1 (en) Sorted event monitoring by context partition
CN108696559B (en) Stream processing method and device
CN111651235A (en) Virtual machine set task management method and device
CN116151631A (en) Service decision processing system, service decision processing method and device
CN109995571B (en) Method and device for matching server configuration and VNF application
CN114090268B (en) Container management method and container management system
CN116955079A (en) Distributed monitoring method, device, equipment and computer readable storage medium
CN115514618A (en) Alarm event processing method and device, electronic equipment and medium
CN113779021B (en) Data processing method, device, computer system and readable storage medium
WO2018200167A1 (en) Managing asynchronous analytics operation based on communication exchange
CN115202973A (en) Application running state determining method and device, electronic equipment and medium
CN112596974A (en) Full link monitoring method, device, equipment and storage medium
CN114116908A (en) Data management method and device and electronic equipment
CN113760315A (en) Method and device for testing system
CN111782428A (en) Data calling system and method
CN115757041B (en) Method for collecting dynamically configurable multi-cluster logs and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination