CN116010201A - Monitoring method, management equipment and computing system
Publication number: CN116010201A
Application number: CN202211697581.XA
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- User Interface Of Digital Computer (AREA)
- Testing And Monitoring For Control Systems (AREA)
Abstract
Description
Technical field
The present application relates to the field of cloud computing, and in particular to a monitoring method, a management device, and a computing system.
Background
Stable operation of a business system is an important prerequisite for providing business services to users. At present, a business system is maintained and managed by monitoring its indicators. In cloud-native scenarios, a cluster of multiple computing devices usually provides the business services. The resources of each computing device in the cluster are partitioned using container technology, so that processes running in different containers are relatively independent of one another. This approach facilitates flexible service deployment, but it also makes monitoring the indicators of the system considerably more difficult.
Summary of the invention
Embodiments of the present application provide a monitoring method, a management device, and a computing system for monitoring the indicators of computing devices in a cluster.
To achieve the above objective, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a monitoring method is provided, applied to a management device, where the management device is configured to monitor a computing cluster on which cloud-native services are deployed, and the computing cluster includes multiple computing devices. The method includes: displaying a first interface, where the first interface contains selection entries for candidate objects corresponding to monitoring objects of multiple granularities; receiving a first operation on the selection entries for the candidate objects in the first interface; in response to the first operation, determining at least one target monitoring object among the candidate objects, where the target monitoring object includes any one or more of the following: a target computing device in the computing cluster, a target program in the target computing device, and a target variable in the target program; sending a monitoring request to the computing device corresponding to the at least one target monitoring object, where the monitoring request includes an identifier of the target monitoring object and the target parameter to be monitored, and the target parameter indicates a parameter of the target monitoring object that is related to the cloud-native service; and receiving target parameter information returned by the computing device corresponding to the target monitoring object in response to the monitoring request.
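The monitoring request of the first aspect can be pictured as a small record that carries the identifier of each target monitoring object and the target parameters to be monitored. The Python sketch below is illustrative only: `Target`, `MonitoringRequest`, `build_requests`, the field names, and the choice to send one request per computing device are all assumptions, not details taken from the application.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A selected target monitoring object (hypothetical structure)."""
    device: str      # computing device that hosts the object
    object_id: str   # identifier of the device, program, or variable


@dataclass(frozen=True)
class MonitoringRequest:
    """One monitoring request addressed to one computing device."""
    device: str
    object_ids: tuple   # identifiers of the target monitoring objects
    parameters: tuple   # target parameters to be monitored


def build_requests(targets, parameters):
    """Group the selected targets by device and build one request per device."""
    by_device = defaultdict(list)
    for target in targets:
        by_device[target.device].append(target.object_id)
    return [
        MonitoringRequest(device, tuple(ids), tuple(parameters))
        for device, ids in sorted(by_device.items())
    ]


# Example selection: two objects on node-1 and one on node-2 (invented names).
requests = build_requests(
    [
        Target("node-1", "pod-a"),
        Target("node-2", "pod-c/proc-7"),
        Target("node-1", "pod-b"),
    ],
    ["cpu_usage", "memory_usage"],
)
```

Grouping by device keeps the number of messages equal to the number of computing devices involved, which is one plausible reading of "sending a monitoring request to the computing device corresponding to the target monitoring object".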
Current cluster resources span multiple computing devices, and container technology partitions those resources at a finer granularity, which makes monitoring the indicators of cluster resources difficult. Existing approaches usually monitor resources on a single computing device; they cannot flexibly change the monitored indicator items or quickly locate a monitoring target within the overall resources at different granularities, so monitoring efficiency is low. With the method provided by the embodiments of the present application, the management device can monitor the indicators of the cluster, and the user can apply different monitoring granularities to the cluster resources and obtain the corresponding data. This improves the user's monitoring of the cluster's resources as a whole and, in turn, the stability of the cloud-native services.
In a possible implementation, the monitoring objects of multiple granularities include computing devices, programs, and variables, where a program includes at least one of the following: a pod, a container, and a process. From coarsest to finest, the granularities are computing device, pod, container, process, and variable.
This possible implementation specifies the granularity levels that the management device can monitor, which helps the user obtain indicator information for monitoring objects of different granularities through the management device and improves the monitoring effect.
In a possible implementation, before the first interface is displayed, the method further includes: storing an association relationship between the monitoring objects of multiple granularities. Determining at least one target monitoring object among the candidate objects in response to the first operation then includes: in response to the first operation, determining at least one target monitoring object among the candidate objects according to the association relationship, proceeding from coarse granularity to fine granularity.
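One way to picture the stored association relationship is a parent-to-children mapping that is walked from coarse to fine as the user makes selections. The sketch below assumes such a mapping; `ASSOCIATIONS`, `candidates`, and every object name are hypothetical and serve only to illustrate the narrowing step.

```python
# Hypothetical stored association: each object maps to its children at the
# next finer granularity (device -> pod -> container -> process -> variable).
ASSOCIATIONS = {
    "node-1": ["pod-a", "pod-b"],
    "pod-a": ["container-1"],
    "pod-b": ["container-2", "container-3"],
    "container-1": ["process-10"],
    "container-2": [],
    "container-3": ["process-30", "process-31"],
    "process-10": ["var-x"],
}


def candidates(selected_path):
    """Return the candidate objects one level below the last selection.

    `selected_path` is the chain of selections made so far, coarse to fine.
    An empty path returns the top-level computing devices.
    """
    if not selected_path:
        children = {c for kids in ASSOCIATIONS.values() for c in kids}
        return sorted(k for k in ASSOCIATIONS if k not in children)
    # Each selection must be associated with the one before it.
    for parent, child in zip(selected_path, selected_path[1:]):
        if child not in ASSOCIATIONS.get(parent, []):
            raise ValueError(f"{child!r} is not associated with {parent!r}")
    return list(ASSOCIATIONS.get(selected_path[-1], []))
```

For instance, `candidates(["node-1", "pod-b"])` lists only the containers of `pod-b`, so each selection shrinks the set of objects shown at the next level.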
This possible implementation provides a way to select a target monitoring object from monitoring objects of multiple granularities, which helps the user locate the target monitoring object quickly and makes determining the monitoring object more efficient.
In a possible implementation, before the monitoring request is sent to the computing device corresponding to the at least one target monitoring object, the method further includes: displaying a second interface, where the second interface includes operation options for operating parameters; receiving a second operation on the operation options for the operating parameters in the second interface; and, in response to the second operation, determining the target parameter among the operating parameters.
In this possible implementation, the target parameter is determined through the user's operation on the second interface, which improves the implementability of the solution.
In a possible implementation, the second interface further includes a modification option, and the method further includes: receiving a third operation on the modification option for the operating parameters in the second interface; and, in response to the third operation, sending a modification request to the target computing device, where the modification request includes the modified operating parameters with which the target monitoring object runs the cloud-native service.
This possible implementation lets the user modify the operating parameters and thereby flexibly adjust the indicator items that the system monitors for a monitoring object, which improves monitoring efficiency.
In a possible implementation, the method further includes: converting the target parameter information into information recognizable by a visualization program, where the visualization program is used to convert that information into a visualized image; and displaying a third interface, where the third interface includes the visualized image.
This possible implementation presents the target parameter information to the user more intuitively, which improves the user experience.
In a possible implementation, converting the target parameter information into information recognizable by the visualization program includes: mapping the target parameter information to readable labels and values; and obtaining the information recognizable by the visualization program from the labels and values.
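The mapping of target parameter information to labels and values can be sketched as a flattening step followed by a packaging step. The code below is a minimal illustration under stated assumptions: the nested-dictionary shape of the raw information, the dotted label format, and the `{"labels": ..., "values": ...}` output are all invented here and are not specified by the application.

```python
def to_labels_and_values(target_id, raw):
    """Flatten raw target parameter information into (label, value) pairs.

    Nested dictionaries become dotted labels; only numeric values are kept.
    """
    pairs = []

    def walk(prefix, node):
        for key, value in node.items():
            label = f"{prefix}.{key}"
            if isinstance(value, dict):
                walk(label, value)
            elif isinstance(value, (int, float)) and not isinstance(value, bool):
                pairs.append((label, float(value)))

    walk(target_id, raw)
    return pairs


def to_visualization_input(pairs):
    """Package the pairs in a shape a charting program could read (assumed)."""
    return {
        "labels": [label for label, _ in pairs],
        "values": [value for _, value in pairs],
    }


# Invented sample of raw information returned for one container.
raw_info = {
    "cpu": {"usage_percent": 12.5},
    "memory": {"rss_bytes": 2048},
    "state": "running",
}
series = to_visualization_input(to_labels_and_values("container-1", raw_info))
```

Non-numeric fields such as `state` are dropped in this sketch; a real converter would need a rule for them.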
This possible implementation describes how the target parameter information is processed when it is converted into a visualized image, which improves the implementability of the solution.
In a possible implementation, the method further includes: storing the target parameter information.
This possible implementation makes it possible to keep statistics on historical operating parameters, which facilitates further analysis and processing of the data.
In a second aspect, a management device is provided, including a processor, a display, and a communication interface, where the processor is connected to the communication interface and to the display. The display is configured to display a first interface, where the first interface contains selection entries for candidate objects corresponding to monitoring objects of multiple granularities. The communication interface is configured to receive a first operation on the selection entries for the candidate objects in the first interface. The processor is configured to determine, in response to the first operation, at least one target monitoring object among the candidate objects, where the target monitoring object includes any one or more of the following: a target computing device in the computing cluster, a target program in the target computing device, and a target variable in the target program. The communication interface is further configured to send a monitoring request to the computing device corresponding to the at least one target monitoring object, where the monitoring request includes an identifier of the target monitoring object and the target parameter to be monitored, and the target parameter indicates a parameter of the target monitoring object that is related to the cloud-native service. The communication interface is further configured to receive target parameter information returned by the computing device corresponding to the target monitoring object in response to the monitoring request.
In a third aspect, a management device is provided, including functional units for performing any one of the methods provided in the first aspect, where the actions performed by each functional unit are implemented by hardware or by hardware executing corresponding software. For example, the management device may include a processing unit, a display unit, and a communication unit. The display unit is configured to display a first interface, where the first interface contains selection entries for candidate objects corresponding to monitoring objects of multiple granularities. The communication unit is configured to receive a first operation on the selection entries for the candidate objects in the first interface. The processing unit is configured to determine, in response to the first operation, at least one target monitoring object among the candidate objects, where the target monitoring object includes any one or more of the following: a target computing device in the computing cluster, a target program in the target computing device, and a target variable in the target program. The communication unit is further configured to send a monitoring request to the computing device corresponding to the at least one target monitoring object, where the monitoring request includes an identifier of the target monitoring object and the target parameter to be monitored, and the target parameter indicates a parameter of the target monitoring object that is related to the cloud-native service. The communication unit is further configured to receive target parameter information returned by the computing device corresponding to the target monitoring object in response to the monitoring request.
In a fourth aspect, a management device is provided, including a processor and a memory. The processor is electrically connected to the memory, the memory is configured to store program instructions, and the processor is configured to execute the program instructions so that the management device performs any one of the methods provided in the first aspect.
In a fifth aspect, a computing system is provided, including multiple computing devices and the management device described in any of the second to fourth aspects. The computing devices are configured to run cloud-native services. The management device is configured to monitor information related to the cloud-native services running on the computing devices, such as resource usage information of computing devices, pods, containers, processes, and variables.
In a sixth aspect, a chip is provided, including a processor and an interface circuit. The interface circuit is configured to receive code instructions and transmit them to the processor, and the processor is configured to run the code instructions to perform any one of the methods provided in the first aspect.
In a seventh aspect, a computer-readable storage medium is provided, including computer-executable instructions that, when run on a computer, cause the computer to perform any one of the methods provided in the first aspect.
In an eighth aspect, a computer program product is provided, including computer-executable instructions that, when run on a computer, cause the computer to perform any one of the methods provided in the first aspect.
For the technical effects of any implementation of the second to eighth aspects, refer to the technical effects of the corresponding implementation of the first aspect; they are not repeated here.
Brief description of the drawings
FIG. 1 is a schematic diagram of a system that implements indicator monitoring based on monitoring containers;
FIG. 2 is a schematic diagram of a system that implements indicator monitoring based on an indicator observation tool;
FIG. 3 is a schematic diagram of a monitoring scenario provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the composition of a communication device provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of a monitoring method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a first interface for determining a target monitoring object provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a first interface for determining a target indicator provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a second interface for determining a target indicator provided by an embodiment of the present application;
FIG. 9 is a schematic flowchart of a monitoring method provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a mapping relationship of raw data provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a management device provided by an embodiment of the present application.
Detailed description
In the description of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. The term "and/or" merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. In addition, "at least one" means one or more, and "multiple" means two or more. Words such as "first" and "second" do not limit quantity or order of execution, nor do they require that the items so labeled be different.
It should be noted that, in the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the present application should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner.
At present, the cloud-native architecture is a mainstream architecture in cloud computing. It is characterized by microservices, continuous integration, containerization, and development and operations (DevOps), and it helps services to be deployed flexibly on resource environments of different granularities. For example, in a cloud-native architecture, business services are usually carried by a cluster of multiple computing devices. The resources of the computing devices are partitioned using container technology into multiple independent runtime environments, each of which carries one or more processes. Specifically, container technology isolates processes from one another by using the control groups (cgroup) and namespace mechanisms of the Linux kernel. A cgroup represents the resources of a container, such as central processing unit (CPU) resources and memory. The namespace mechanism distinguishes identical variable names in different runtime environments. In this way, each container has its own computing resources for running processes, and the namespace mechanism avoids conflicts caused by identical variables in different containers.
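On Linux, the cgroup membership of a process is listed in `/proc/<pid>/cgroup`, one line per hierarchy in the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below parses text in that format to show how a process can be tied back to the resource group of its container. The sample text and the `kubepods/...` paths are invented for illustration, and `parse_cgroup_lines` is a hypothetical helper rather than part of the described method.

```python
def parse_cgroup_lines(text):
    """Parse lines of the form 'hierarchy-ID:controller-list:cgroup-path'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path may itself contain ':', so split at most twice.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy_id": int(hierarchy_id),
            # cgroup v2 entries have an empty controller list.
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


# Invented sample: one cgroup v1 line and one cgroup v2 line.
sample = (
    "4:cpu,cpuacct:/kubepods/pod-a/container-1\n"
    "0::/kubepods/pod-a/container-1\n"
)
entries = parse_cgroup_lines(sample)
```

On a real Linux host the same function could be applied to the contents of `/proc/self/cgroup`.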
It can be understood that the computing devices in the embodiments of the present application include, but are not limited to, electronic devices with computing capability such as servers and computers.
At present, cluster resources are mainly managed with the open-source Kubernetes platform (hereinafter k8s), which orchestrates and schedules containers. A pod is the smallest deployable unit in k8s, and containers run inside pods. A pod may include one or more containers and may also be called a container group. The cluster resources managed by k8s can therefore be divided at different granularities: computing device level, pod level, container level, and process level.
While business services run in a business system, monitoring the indicators of the system is key to keeping it stable. Indicator items reflect the operation of the business system along different dimensions. For example, indicator items include operating parameters such as CPU usage, memory usage, and running time. On a Linux system, process management tools can monitor these indicator items and describe the state of system processes. For example, the top tool shows dynamic information about system processes, updating their state as they run. As another example, the ps tool gives a static snapshot of processes, that is, the state of the processes at the moment of the query.
However, because the cluster resources that carry cloud-native services use container technology to partition the resources of each computing device into independent runtime environments, the process inspection tools commonly used on such systems cannot obtain running status at container granularity. To address this, one monitoring approach has been proposed: a monitoring container is added to a pod to run alongside the business containers and monitor each of them. As shown in FIG. 1, a pod that contains a monitoring container can monitor the indicators of each business container in that pod. With this approach, monitoring containers can be deployed flexibly according to the user's monitoring needs for business containers. For example, in FIG. 1, monitoring container 1 is deployed in pod1 for business container 1 and business container 2. In pod2, monitoring container 2 is deployed for business container 3. In pod3, no monitoring container is deployed to monitor business container 3, so the approach does not constrain how monitoring containers are deployed in a pod. Multiple monitoring containers may also be deployed in one pod, each monitoring different business containers in that pod.
The above approach is the monitoring strategy adopted by the Sensu monitoring tool when used with k8s. Specifically, Sensu runs the monitoring container as a k8s sidecar and collects the required indicator items through communication between the monitoring container and the business containers in the same pod.
In the above approach, the monitoring container runs with user-mode resources and cannot access kernel resources to obtain process-granularity indicators. The smallest unit that Sensu can monitor is therefore the container. In addition, the user must attach a monitoring container to the pod in which each monitored object runs, so each pod spends some of its resources on one or more monitoring containers deployed according to user needs, and monitoring efficiency is low.
Another monitoring approach has also been proposed: programs in an indicator observation tool set (inspektor-gadget) are invoked to access kernel resources and obtain the running status of processes inside containers. Specifically, the tool set accesses kernel resources through the extended Berkeley Packet Filter (eBPF) framework of the Linux kernel. It can monitor more indicator items, for example disk input/output (I/O) traffic, network connection latency, and the files opened by a process. As shown in FIG. 2, the user invokes a monitoring instruction for a given indicator from the tool set; the instruction accesses kernel resources through the eBPF framework, obtains the corresponding monitoring information, and returns it to the user.
In this approach, the indicator observation tool set is stored on every computing device and monitors indicator items on demand. In cloud-native scenarios, however, where cluster resources span multiple computing devices, the instructions cannot be invoked quickly according to user needs.
To address this, the present application proposes a monitoring method applied to a management device. The management device communicates with multiple computing devices on which cloud-native services are deployed. It responds to the user's indicator monitoring requests at different granularities, locates the corresponding computing device, and obtains the data for the requested indicator items to return to the user. This makes it possible to obtain the running status of the computing devices in the cluster flexibly, and unified management of the computing devices improves the efficiency of indicator monitoring.
FIG. 3 is a schematic diagram of an indicator monitoring scenario provided by the present application. It includes a management device that manages multiple computing devices in a cluster (computing device 1, computing device 2, ..., computing device n, where n is an integer greater than 1). Each computing device includes one or more pods, each pod includes one or more containers, and the processes in the containers run cloud-native services. Each computing device also includes a set of observer programs; for each indicator item, the corresponding observer program is invoked to obtain the corresponding data. For example, a computing device reads data from the kernel through observer programs attached at trace points or instrumentation points in the kernel, such as kprobe and tracepoint. With the monitoring method provided by the embodiments of the present application, the management device sends an indicator monitoring request to a target computing device among the multiple computing devices. The target computing device invokes the relevant monitoring instruction according to the target parameter named in the request, obtains the target parameter information, and returns it to the management device, which then presents it to the user. In cloud-native scenarios, this enables indicator monitoring of multiple computing devices in the cluster at different monitoring granularities and improves the efficiency of monitoring the cluster's resources as a whole.
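On the computing device side, handling a monitoring request amounts to looking up the observer program for each requested target parameter and collecting the results. The sketch below shows that dispatch step only. It assumes a dictionary-based request format, and the two observers are simple stand-ins built on Python's `os` module; real observer programs would read kernel data, for example through eBPF. `OBSERVERS` and `handle_monitoring_request` are hypothetical names.

```python
import os

# Hypothetical observer program set: target parameter name -> callable.
# These stand-ins use os calls in place of kernel-level observer programs.
OBSERVERS = {
    "pid": lambda target_id: os.getpid(),
    "cpu_count": lambda target_id: os.cpu_count(),
}


def handle_monitoring_request(request):
    """Run the observer for each requested parameter and collect the results."""
    info, unsupported = {}, []
    for parameter in request["parameters"]:
        observer = OBSERVERS.get(parameter)
        if observer is None:
            # No observer program is registered for this parameter.
            unsupported.append(parameter)
        else:
            info[parameter] = observer(request["target_id"])
    return {
        "target_id": request["target_id"],
        "info": info,
        "unsupported": unsupported,
    }


response = handle_monitoring_request(
    {"target_id": "container-1", "parameters": ["pid", "disk_io"]}
)
```

Reporting unsupported parameters separately, instead of failing the whole request, is a design choice of this sketch and is not stated in the application.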
本申请实施例提供的监控方法所应用的管理设备可以为独立的通信设备执行本申请提供的方法。本申请对于通信设备的具体形态不进行限制,可以为终端或服务器。其中,终端具体可以是手机、增强现实(augmented reality,AR)设备、虚拟现实(virtualreality,VR)设备、平板电脑、笔记本电脑、超级移动个人计算机(ultra-mobile personalcomputer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)等。服务器可以是一个物理或逻辑服务器。The management device to which the monitoring method provided in the embodiment of the present application is applied may execute the method provided in the present application as an independent communication device. The present application does not limit the specific form of the communication device, which may be a terminal or a server. Wherein, the terminal specifically may be a mobile phone, an augmented reality (augmented reality, AR) device, a virtual reality (virtual reality, VR) device, a tablet computer, a notebook computer, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook, a personal digital Assistant (personal digital assistant, PDA), etc. A server can be a physical or logical server.
在硬件实现上,上述通信设备可以通过如图4所示的通信设备实现相应的功能。如图4所示,为本申请实施例提供的一种通信设备40的硬件结构示意图。In terms of hardware implementation, the aforementioned communication device may implement corresponding functions through the communication device shown in FIG. 4 . As shown in FIG. 4 , it is a schematic diagram of a hardware structure of a
图4所示的通信设备40可以包括:处理器401、存储器402、通信接口403、总线404以及显示器405。处理器401、存储器402、通信接口403以及显示器405之间可以通过总线404连接。The
处理器401是通信设备40的控制中心,可以是一个通用CPU,也可以是其他通用处理器等。其中,通用处理器可以是微处理器或者是任何常规的处理器等。The
作为一个示例,处理器401可以包括一个或多个CPU,例如图4中所示的CPU 0和CPU1。As an example, the
存储器402可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electricallyerasable programmable read-only memory,EEPROM)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。The
一种可能的实现方式中,存储器402可以独立于处理器401存在。存储器402可以通过总线404与处理器401相连接,用于存储数据、指令或者程序代码。处理器401调用并执行存储器402中存储的指令或程序代码时,能够实现本申请实施例提供的监控方法。In a possible implementation manner, the
另一种可能的实现方式中,存储器402也可以和处理器401集成在一起。In another possible implementation manner, the
通信接口403,用于通信设备40与其他设备通过通信网络连接,通信网络可以是以太网,无线接入网(radio access network,RAN),无线局域网(wireless local areanetworks,WLAN)等。通信接口403可以包括用于接收数据的接收单元,以及用于发送数据的发送单元。The
总线404,可以是工业标准体系结构(industry standard architecture,ISA)总线、外部设备互连(peripheral component interconnect,PCI)总线或扩展工业标准体系结构(extended industry standard architecture,EISA)总线等。该总线可以分为地址总线、数据总线、控制总线等。为便于表示,图4中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。The bus 404 may be an industry standard architecture (industry standard architecture, ISA) bus, a peripheral component interconnect (PCI) bus, or an extended industry standard architecture (extended industry standard architecture, EISA) bus, etc. The bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 4 , but it does not mean that there is only one bus or one type of bus.
显示器405,用于显示图像,视频等。显示器包括显示屏、显示面板等。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dotlight emitting diodes,QLED)等。在一些实施例中,通信设备40可以包括1个或N个显示屏,N为大于1的正整数。本申请实施例中,显示器可以显示监控的目标参数信息,以及接收用户针对监控对象、监控参数的选择操作或输入操作。The
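It should be noted that the structure shown in FIG. 4 does not limit the communication device 40. In addition to the components shown in FIG. 4, the communication device 40 may include more or fewer components than shown, combine certain components, or use a different arrangement of components.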
FIG. 5 is a schematic flowchart of a monitoring method provided in an embodiment of the present application. The method is applied to a management device that monitors multiple computing devices on which cloud-native services are deployed, and includes steps S501 to S505.
S501. Display a first interface.
The first interface contains selection entries for the candidate objects that correspond to monitoring objects of multiple granularities.
The monitoring objects of multiple granularities include computing devices, programs, and variables. A program includes at least one of the following: a pod, a container, or a process. From largest to smallest, the granularities are computing device, pod, container, process, and variable.
It can be understood that the current business system includes multiple computing devices that carry cloud-native services. In the embodiments provided in the present application, monitoring services are provided according to how the computing resources in the business system are divided. Specifically, the monitoring granularities include the computing device level, the pod level, the container level, the process level, and the variable level. The target monitoring object is a specific monitoring object under one of these monitoring granularities.
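As a minimal sketch only, the ordering of monitoring granularities described above could be modeled as an ordered enumeration. The names `Granularity` and `is_finer` are hypothetical and do not appear in the present application.

```python
from enum import IntEnum


class Granularity(IntEnum):
    """Monitoring granularities, ordered from largest to smallest."""
    COMPUTING_DEVICE = 0
    POD = 1
    CONTAINER = 2
    PROCESS = 3
    VARIABLE = 4


def is_finer(a: Granularity, b: Granularity) -> bool:
    """Return True if granularity `a` is smaller (finer) than `b`."""
    return a > b
```

An ordered type of this kind makes it straightforward to compare two selections and decide which one is the finer-grained monitoring object.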
Exemplarily, FIG. 6 is a schematic diagram of a first interface provided in an embodiment of the present application. The first interface includes selection entries for the candidate objects that correspond to monitoring objects of different granularities. The first interface shown in FIG. 6 contains identifiers indicating each monitoring granularity, such as computing device, pod (container group), container, and process. Correspondingly, each monitoring granularity includes one or more candidate objects.
S502. Receive a first operation on the selection entry of a candidate object in the first interface.
The first operation is used to determine the target monitoring object through the selection entry of the candidate objects.
Exemplarily, in FIG. 6, each monitoring granularity corresponds to one selection entry for candidate objects, through which the user selects among the one or more candidate objects under that monitoring granularity. For example, clicking the entry in FIG. 6 displays the candidate object list corresponding to that monitoring granularity, and the candidate object list consists of the identifiers of one or more candidate objects. For example, the candidate object list corresponding to the computing device includes computing device a, computing device b, and computing device c. Similarly, the candidate object list corresponding to the pod may include pod a, pod b, and pod c.
S503. In response to the first operation, determine the target monitoring object among the candidate objects, where the target monitoring object includes any one or more of the following: a target computing device in the computing cluster, a target program in the target computing device, and a target variable in the target program.
Exemplarily, when the target monitoring object is a target computing device in the computing cluster, the indicator monitoring request is used to perform indicator monitoring at computing-device-level granularity. The target indicator obtained by the management device reflects how the target computing device is running while it carries the cloud-native service, for example data such as the CPU usage and memory usage of the overall resources of the target computing device.
Exemplarily, when the target program is a pod, the target monitoring object is a pod in the target computing device among the multiple computing devices, and the indicator monitoring request is used to perform monitoring at pod-level granularity. The target indicator requested by the indicator monitoring request is data such as the CPU usage and memory usage of the pod in the target computing device.
Exemplarily, when the target program is a container or a process, the indicator monitoring request is used to perform monitoring at container-level or process-level granularity, and the requested target indicator reflects how a certain container or a certain process runs in the target computing device.
Exemplarily, the monitoring granularities also include the variable level, in which case the target monitoring object is a certain variable in the target program. A process running in a computing device may contain multiple variables, and the indicator monitoring request may perform monitoring at variable-level granularity. The target indicator requested by the indicator monitoring request is data such as the CPU usage and memory usage of the variable in the computing device.
With reference to steps S501 and S502 above, in a possible implementation, the management device stores the identifiers of the multiple computing devices in the business system and the association relationships between the multiple computing devices and programs. Table 1 below shows an example of the association relationships between computing devices and programs.
Table 1. Example of association relationships between computing devices and programs
In one example, the candidate object list corresponding to the computing device shown in FIG. 6 contains computing device a, computing device b, and computing device c; the candidate object list corresponding to the pod contains pod a1, pod a2, pod b, and pod c; and the candidate object list corresponding to the container contains container a1, container a2, container a3, container b1, container b2, and container c.
In another example, with reference to the example in Table 1, the candidate object list corresponding to the computing device shown in FIG. 6 contains computing device a, computing device b, and computing device c, whereas the candidate object list corresponding to the pod is displayed according to the user's selection in the candidate object list of computing devices, combined with the association relationships between programs and computing devices in Table 1. For example, when the user selects computing device a, the candidate object list corresponding to the pod is determined, based on the association relationship between computing device a and its programs, to include pod a1 and pod a2. Similarly, when the user selects pod a1, the candidate object list corresponding to the container includes container a1 and container a2.
It can be understood that presenting the candidate object identifiers in the first interface in the manner described in the above examples helps make interaction with the user more convenient.
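The cascading candidate lists described in the second example can be sketched as follows. This is an illustrative assumption rather than the implementation of the present application: only the entries for computing device a and pod a1 are stated in the text, the remaining associations are guessed from the identifier names, and the function names are hypothetical.

```python
# Association data. Only "computing device a" -> pod a1/a2 and
# "pod a1" -> container a1/a2 come from the text; the rest is assumed.
DEVICE_TO_PODS = {
    "computing device a": ["pod a1", "pod a2"],
    "computing device b": ["pod b"],
    "computing device c": ["pod c"],
}
POD_TO_CONTAINERS = {
    "pod a1": ["container a1", "container a2"],
    "pod a2": ["container a3"],
    "pod b": ["container b1", "container b2"],
    "pod c": ["container c"],
}


def pod_candidates(selected_device=None):
    """Pod candidate list; unfiltered when no computing device is selected."""
    if selected_device is None:
        return [p for pods in DEVICE_TO_PODS.values() for p in pods]
    return list(DEVICE_TO_PODS.get(selected_device, []))


def container_candidates(selected_pod=None):
    """Container candidate list; unfiltered when no pod is selected."""
    if selected_pod is None:
        return [c for cs in POD_TO_CONTAINERS.values() for c in cs]
    return list(POD_TO_CONTAINERS.get(selected_pod, []))
```

Calling `pod_candidates()` without an argument corresponds to the first example, in which every list is shown in full; passing a selection corresponds to the second example, in which each list is narrowed by the choice made at the coarser level.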
It should be noted that the above description of the first interface elements contained in the first interface is only an example and may be further optimized in practical applications. For example, in the second example above, different first interface elements may be displayed on different interfaces according to the monitoring granularity that the user selects level by level: when the user selects computing device a, the interface jumps to the pod and its corresponding candidate object list, and the user then makes a selection based on the candidate object list in that interface. The present application does not limit this.
It should be noted that, in the first interface shown in FIG. 6, the identifiers of the candidate objects are displayed in the form of drop-down menus. In practical applications they may also be displayed in other forms, which is not limited in the present application. In addition, the first interface may include more or fewer interface elements, for example a process-level candidate object list, so that the user can operate in the first interface to determine the monitoring object. The present application does not limit this.
In another example, the user selects candidate objects level by level according to monitoring granularity, and the selected candidate object with the smallest monitoring granularity is taken as the target monitoring object. As shown in part (a) of FIG. 7, after selecting computing device a among the candidate objects of computing devices on the basis of FIG. 6, the user further selects pod a1 among the pods. In part (b) of FIG. 7, the user selects container a1 from container a1 and container a2, which correspond to pod a1; in this case, container a1 is the identifier of the target monitoring object. If, in the example shown in FIG. 6, the user selects computing device a among the computing devices and makes no further selection at the finer pod granularity, the identifier of the target monitoring object is computing device a.
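The rule that the finest-granularity selection becomes the target monitoring object can be sketched as follows. The function `resolve_target` and the dictionary layout of the user's selection are hypothetical and serve only to illustrate the rule.

```python
# Monitoring levels from largest to smallest granularity.
LEVELS = ["computing device", "pod", "container", "process", "variable"]


def resolve_target(selection):
    """Return (level, identifier) of the finest selection the user made.

    `selection` maps a level name to the chosen identifier; a level that
    is missing or None means the user stopped at the previous level.
    """
    target = None
    for level in LEVELS:  # walk from coarse to fine
        chosen = selection.get(level)
        if chosen is None:
            break
        target = (level, chosen)
    return target
```

With the FIG. 7 example, a selection of computing device a, pod a1, and container a1 resolves to container a1, while a selection of computing device a alone resolves to computing device a.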
It should be noted that the first operation depends on the operable controls in the first interface and may be adjusted flexibly according to the controls deployed in the first interface, which is not limited in the present application.
Optionally, in response to the first operation, the management device determines at least one target monitoring object. The at least one target monitoring object may consist of monitoring objects under different monitoring granularities. Exemplarily, in FIG. 6 or FIG. 7, the user selects one or more computing devices, or one or more programs; alternatively, the user selects one or more computing devices together with one or more programs. That is, the target monitoring objects may be one or more monitoring objects of the same granularity, or multiple monitoring objects of different granularities.
S504. Send a monitoring request to the computing device corresponding to the at least one target monitoring object.
The monitoring request contains the identifier of the target parameter information and the monitored target parameter, where the target parameter indicates a parameter of the target monitoring object that is related to the cloud-native service.
In one example, the target parameter is any one or more of the indicator items described above.
Optionally, before step S504, the management device determines the target parameter through the following steps S11 to S13.
S11. Display a second interface, where the second interface includes operation options for operating parameters.
An operating parameter indicates resource usage information in the business system and can be understood as one of the indicator items described above.
Exemplarily, FIG. 8 is a schematic diagram of a second interface provided in an embodiment of the present application. The second interface shown in FIG. 8 contains the identifiers of multiple operating parameters, such as CPU usage, memory usage, and running time. The second interface also includes the operation options corresponding to each operating parameter, such as on and off. Specifically, when the user selects "on" for the operation option corresponding to a certain operating parameter, this indicates that the information of that operating parameter needs to be acquired.
In FIG. 8, the running time among the operating parameters refers to how long a computing device, or a program in a computing device, has been running since it was started without being shut down in between, for example the running time of a computing device since power-on or the running time of a pod since its creation.
Optionally, the management device may also monitor disk input and output, in which case the operating parameters further include input/output operations per second (IOPS), used to monitor disk performance. The management device may also perform network monitoring, in which case the operating parameters further include the network packet loss rate, network latency, and the like.
In a possible implementation, the one or more operating parameters correspond to the target monitoring object. Exemplarily, when the target monitoring object is a certain process, the operating parameters may include, in addition to one or more of the operating parameters described in the above examples, the number of process handles. Monitoring the number of process handles helps prevent a process from occupying too many resources and degrading overall system performance. This implementation can also be understood as the management device determining the second interface based on the target monitoring object determined through the first interface.
Further, when multiple target monitoring objects are determined in step S503, the second interface corresponding to each target monitoring object may be determined through the correspondence between target monitoring objects and operating parameters. For example, if the user selects computing device a and computing device b in FIG. 6, then in response to the user's selection, the second interface includes an interface of the operating parameters corresponding to computing device a and an interface of the operating parameters corresponding to computing device b.
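The correspondence between target monitoring objects and operating parameters can be sketched as follows. The parameter names follow FIG. 8 and the process-handle example in the text; the function names and the representation of a target as a (level, identifier) pair are assumptions made for illustration.

```python
# Parameters offered for every target (taken from the FIG. 8 example).
BASE_PARAMETERS = ["CPU usage", "memory usage", "running time"]
# Extra parameters per level; only the process-handle case is from the text.
EXTRA_PARAMETERS = {"process": ["process handle count"]}


def parameters_for(level):
    """Operating parameters shown in the second interface for one level."""
    return BASE_PARAMETERS + EXTRA_PARAMETERS.get(level, [])


def second_interfaces(targets):
    """Build one parameter list per target; `targets` is (level, id) pairs."""
    return {identifier: parameters_for(level) for level, identifier in targets}
```

For two selected computing devices, `second_interfaces` returns one parameter list for each device, mirroring the example in which the second interface includes one parameter interface per target monitoring object.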
It can be understood that the number and form of the operating parameters included in the second interface described above are only examples. In practical applications, the second interface may include more or fewer operating parameters, as well as other options corresponding to the operating parameters, which is not limited in the present application.
S12. Receive a second operation on the operation options of one or more operating parameters in the second interface.
Specifically, the second operation is performed according to the specific form of the operation option, for example by clicking, double-clicking, or typing.
Exemplarily, in FIG. 8, the operation options include on and off, and the user performs the second operation according to which indicators need to be observed.
S13. In response to the second operation, determine the target parameter information among the one or more pieces of parameter information.
Exemplarily, in FIG. 8, when the user selects "on" for "CPU usage", the management device determines "CPU usage" as the target parameter.
It can be understood that there may be one or more target parameters, so as to meet the user's monitoring requirements.
The above process of determining the target parameter can also be understood as enabling or disabling the monitoring function of one or more indicator items.
Through steps S11 to S13, the management device determines the target parameter by displaying the second interface to the user and receiving the user's operation on the second interface.
When the target monitoring object is a target computing device, step S504 specifically includes: determining the address of the target computing device according to locally stored computing device information, and sending the monitoring request containing the target parameter to that address. When the target monitoring object is a target program or a target variable, step S504 specifically includes: determining, according to the association relationships between computing devices and programs or variables and based on the target program indicated by the target monitoring object, the computing device corresponding to the target program; and sending the monitoring request to the computing device corresponding to the target program.
In a possible implementation, the management device stores the addresses of the computing devices and determines the address to which the monitoring request is sent based on the identifier of the computing device corresponding to the target monitoring object.
Based on steps S11 to S13, the management device determines the target monitoring object and the monitored target parameter, and sends a monitoring request containing the identifier of the target parameter to the computing device corresponding to the target monitoring object.
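The address resolution in step S504 can be sketched as follows, under stated assumptions: the `MonitoringRequest` fields, the lookup tables, and the function `route` are hypothetical, and the IP address is taken from a range reserved for documentation. No request is actually transmitted in this sketch.

```python
from dataclasses import dataclass


@dataclass
class MonitoringRequest:
    target_id: str        # identifier of the target monitoring object
    parameter_ids: list   # identifiers of the monitored target parameters


# Hypothetical data held locally by the management device.
DEVICE_ADDRESSES = {"computing device a": "192.0.2.10"}
PROGRAM_TO_DEVICE = {
    "pod a1": "computing device a",
    "container a1": "computing device a",
}


def route(target_id, parameter_ids):
    """Return (address, request) for the device that hosts the target."""
    if target_id in DEVICE_ADDRESSES:
        device = target_id                     # target is a computing device
    else:
        device = PROGRAM_TO_DEVICE[target_id]  # target is a program
    return DEVICE_ADDRESSES[device], MonitoringRequest(target_id, parameter_ids)
```

In this sketch a program-level target is first mapped to the computing device that hosts it, and the address of that device is then looked up in the same way as for a device-level target.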
S505. Receive the target parameter information with which the computing device corresponding to the target monitoring object responds to the monitoring request.
The target parameter information reflects the specific data of the target parameter of the target monitoring object in its corresponding computing device.
After the monitoring request is sent in step S504, the computing device corresponding to the target monitoring object receives the monitoring request and acquires the target parameter information based on it.
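With reference to FIG. 3, the management device sends the monitoring request to computing device 1. According to the identifier of the target parameter contained in the monitoring request, computing device 1 calls the corresponding program in the observation program set, acquires the target parameter information, and returns it to the management device. The observation program set includes the observation programs corresponding to the one or more operating parameters displayed in the second interface. An observation program accesses the kernel based on the eBPF framework to acquire the corresponding data.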
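The dispatch from a target parameter identifier to an observation program can be sketched as follows. The observation programs here are placeholders that return no real measurement; an actual observation program would read kernel data through eBPF, which this sketch does not attempt. All names are hypothetical.

```python
def observe_cpu_usage(target_id):
    # Placeholder: a real observation program would read kernel data.
    return {"target": target_id, "parameter": "CPU usage", "value": None}


def observe_memory_usage(target_id):
    # Placeholder: a real observation program would read kernel data.
    return {"target": target_id, "parameter": "memory usage", "value": None}


# Observation program set, keyed by target parameter identifier.
OBSERVATION_PROGRAMS = {
    "CPU usage": observe_cpu_usage,
    "memory usage": observe_memory_usage,
}


def handle_monitoring_request(request):
    """Call the observation program registered for each requested parameter."""
    results = []
    for parameter_id in request["parameter_ids"]:
        program = OBSERVATION_PROGRAMS.get(parameter_id)
        if program is None:
            continue  # no observation program registered for this parameter
        results.append(program(request["target_id"]))
    return results
```

Keeping the observation programs in a table keyed by parameter identifier means a new operating parameter can be supported by registering one more entry, without changing the request-handling logic.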
The current business system contains multiple computing devices, and container technology divides the resources of the business system at a finer granularity, which makes monitoring the indicators of those resources difficult. At present, resources are usually monitored on a single computing device, the monitored indicator items cannot be changed flexibly, and monitoring of the overall resources at different granularities does not allow problems to be located quickly, so monitoring efficiency is low. With the above method provided in the embodiments of the present application, the management device provides an operating platform that allows the user to monitor the indicators in the cluster through simple operations. The user can apply different monitoring granularities to the cluster resources and obtain the corresponding parameter information, which improves the user's monitoring of the business system's resources and in turn improves the stability of cloud-native service operation.
Optionally, after step S505, the method further includes: the management device stores the target parameter information.
The management device stores the target parameter information locally, or in another device or database that is independent of the management device.
It can be understood that the user may need data on a certain target parameter over a period of time rather than instantaneous data, so that the data acquired over that period can be analyzed. For example, the user may obtain the highest CPU usage reached within a preset time period and, based on that value, decide whether to adjust the services carried by the computing device.
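As an illustration of analyzing stored data over a period of time, the following sketch returns the highest value recorded within a preset time window. The sample layout of (timestamp, value) pairs and the function name `peak_in_window` are assumptions.

```python
def peak_in_window(samples, start, end):
    """Return the highest value with start <= timestamp <= end, or None.

    `samples` is an iterable of (timestamp, value) pairs, for example
    stored CPU usage readings of one computing device.
    """
    values = [value for timestamp, value in samples if start <= timestamp <= end]
    return max(values) if values else None
```

The returned peak could then be compared against a threshold chosen by the operator when deciding whether to adjust the services carried by the computing device.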
Optionally, after step S505, the method further includes: the management device displays monitoring data. The monitoring data includes the target parameter information of the target monitoring object returned by the target computing device; alternatively, the monitoring data includes the target parameter information of the target monitoring object returned by the target computing device together with the identifier of the computing device to which the target monitoring object belongs.
In one example, when the target monitoring object is the target computing device and the identifier of the target parameter information is CPU usage, the monitoring data includes the current CPU usage value of the target computing device, and may further include the identifier of the target computing device. When the target monitoring object is a pod and the identifier of the target parameter information is CPU usage, the monitoring data includes the CPU usage value of the pod in the computing device to which it belongs, and may further include the identifier of that computing device.
In another example, when the target monitoring object is a target program such as a container and the identifier of the target parameter information is CPU usage, the monitoring data includes the CPU usage value of the container in the computing device to which it belongs; alternatively, the monitoring data further includes the identifier of the pod to which the container belongs and the identifier of the computing device. Similarly, when the target monitoring object is a process, the monitoring data includes the target parameter information and may further include the identifier of the container to which the process belongs, the identifier of the pod, and the identifier of the computing device.
In this way, the user is provided with the identifiers of the programs or computing devices at other monitoring granularities that are related to the target monitoring object, which makes it easier for the user to view related information and makes monitoring more intelligent.
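Attaching the identifiers of the enclosing container, pod, and computing device to the monitoring data can be sketched as follows. The parent table, the identifier "process 1", and the field names are hypothetical.

```python
# Hypothetical parent relationships: each object maps to what it belongs to.
PARENT = {
    "process 1": "container a1",
    "container a1": "pod a1",
    "pod a1": "computing device a",
}


def ancestors(target_id):
    """Identifiers of everything the target belongs to, nearest first."""
    chain = []
    while target_id in PARENT:
        target_id = PARENT[target_id]
        chain.append(target_id)
    return chain


def monitoring_data(target_id, parameter, value):
    """Combine the target parameter information with the ancestor chain."""
    return {
        "target": target_id,
        "parameter": parameter,
        "value": value,
        "belongs_to": ancestors(target_id),
    }
```

For a process-level target, the `belongs_to` field then lists the container, the pod, and the computing device in that order, which matches the identifiers that the text says may accompany the target parameter information.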
Optionally, the target parameter information received in step S505 is raw data that the computing device to which the target monitoring object belongs acquires from the operating system kernel. After step S505, the method further includes: converting the target parameter information into information recognizable by a visualization program, where the visualization program converts that information into a visualized image; and displaying a third interface containing the visualized image.
It can be understood that, when the target monitoring object is a target program, the corresponding observation program needs to access the kernel to acquire the required data. For example, when a container is created, the kernel records the operating parameters of the container through cgroup information. When the target program is a pod, a container, or a process, the operating parameters can be acquired through cgroup information.
The process of obtaining a visualized image with the visualization program is called visualization processing, which refers to converting data into graphics or images based on image processing technology. Exemplarily, the user obtains the disk IOPS over a period of time, and visualization processing produces a line chart of how the IOPS changes over that period. The line chart helps the user see the trend of IOPS over time at a glance and thereby decide whether to allocate more or fewer resources to the services carried by the computing device.
It can be understood that this approach helps the user obtain more intuitive data through visualized images and improves the user's monitoring efficiency.
Specifically, the management device obtains the information recognizable by the visualization program through the steps shown in FIG. 9, which include steps S901 to S904.
S901. Determine whether the target parameter information is formatted data.
If so, perform step S904 to output the formatted data; if not, proceed to step S902.
Formatted data refers to data that the above visualization program can recognize.
It can be understood that data acquired from the kernel is usually output in the data format specified by the observation program. Accordingly, the data receiver needs to read it according to that specified format in order to convert it into formatted data that the visualization program can recognize. For visualization processing, the visualization program places strict requirements on data structure and string length, and data that meets these requirements is formatted data.
S902. Map the target parameter information to readable labels and values.
When the observation program acquires the target parameter information of the target program from the kernel of the computing device, it exports the information in the kernel according to a preset rule. The preset rule may be a data storage format of labels and values. One label-and-value pair corresponds to one group of data, and each group of data corresponds to one kind of observation information. Accordingly, the management device parses the data according to the preset rule to obtain readable labels and values, which constitute the target parameter information readable by the management device. Exemplarily, as shown in FIG. 10, the raw data includes data group 1, data group 2, and data group 3. After mapping, the pod ID, the container ID, and the PID are obtained. Each of the three labels corresponds to its own value, so that the required data is obtained from the raw data.
In a possible implementation, the observation program needs to start from the data acquired in the kernel and call other programs in the kernel to trace related data. For example, the PID can be acquired directly in the kernel, and the container ID can be traced from the PID through its corresponding cgroup information, for example in the following manner:
cat /proc/$PID/cgroup | awk -F'/' '{print $5}'
In addition, the pod ID can also be traced from the PID through mountinfo, for example in the following manner:
cat /proc/$PID/mountinfo | grep "etc-hosts" | awk -F/ '{print $6}'
In this way, more related information can be traced and presented to the user for reference, which improves the efficiency of monitoring cluster resources.
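To make the field extraction in the first command concrete, the following sketch mimics `awk -F'/' '{print $5}'` on cgroup text in Python. The function name and the sample line are hypothetical; real cgroup paths differ with the container runtime, the cgroup driver, and the cgroup version, so the field that holds the container ID on a given system is an assumption to be verified there.

```python
def slash_field(text, index=5):
    """Return the `index`-th (1-based) '/'-separated field of each line.

    Lines with fewer fields yield an empty string, as awk would print.
    """
    fields = []
    for line in text.splitlines():
        parts = line.split("/")
        fields.append(parts[index - 1] if len(parts) >= index else "")
    return fields


# Hypothetical line of /proc/<PID>/cgroup on a node running pods.
sample = "11:memory:/kubepods/burstable/pod1234/abcdef012345\n"
container_ids = slash_field(sample)
```

With the sample line above, the fifth field is the last path component, which in this assumed layout is the container ID, while the fourth field carries the pod portion of the path.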
S903. Format the labels and values.
Specifically, formatting the labels and values means processing the data according to the data structure and string length specified by the visualization program to obtain formatted data.
S904. Output the formatted data.
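Steps S901 to S904 can be sketched together as follows. The concrete choices here are assumptions made for illustration: formatted data is taken to be a dictionary of short strings, the string-length limit `MAX_LEN` is invented, and the label order follows the FIG. 10 example of pod ID, container ID, and PID.

```python
MAX_LEN = 16  # assumed string-length limit of the visualization program
LABELS = ["pod ID", "container ID", "PID"]  # assumed preset rule (FIG. 10)


def is_formatted(data):
    """S901: treat a dict of strings within the length limit as formatted."""
    return isinstance(data, dict) and all(
        isinstance(k, str) and isinstance(v, str) and len(v) <= MAX_LEN
        for k, v in data.items()
    )


def to_formatted(data):
    """Run S901 to S904 on one piece of target parameter information."""
    if is_formatted(data):
        return data  # S901 yes -> S904: output as-is
    # S902: map raw data groups to readable labels and values.
    labeled = data if isinstance(data, dict) else dict(zip(LABELS, data))
    # S903: enforce the assumed structure and string length, then S904.
    return {str(k): str(v)[:MAX_LEN] for k, v in labeled.items()}
```

Data that already satisfies the assumed format is passed through unchanged, which corresponds to the branch from step S901 directly to step S904.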
It can be understood that, in the above solution, the management device acquires the operating parameters of the corresponding computing device in response to the user's monitoring request, and may further process the acquired information so that a visualized image is obtained through the visualization program. Step S904 above takes as an example the case in which the management device outputs the formatted data to a device that performs the visualization processing. After this step, the method may further include receiving the visualized image and displaying it. For example, the management device obtains a curve of disk I/O traffic over a period of time.
需要说明的是,可视化程序可以直接部署于管理设备中,则步骤S904可以直接输出并显示可视化图像。It should be noted that the visualization program can be directly deployed on the management device, and then step S904 can directly output and display the visualization image.
可以理解的是,通过上述方式,基于从内核中获取的原始数据,经过格式化转换为可视化程序能够读取的格式化数据,从而为实现数据可视化处理提供实现基础。It can be understood that, through the above method, based on the original data obtained from the kernel, it is formatted and converted into formatted data that can be read by a visualization program, thereby providing an implementation basis for realizing data visualization processing.
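The formatting in steps S903 and S904 can be sketched as follows. The record layout (a JSON object with `metric`, `labels`, and `value` keys) and the 32-character label limit are assumptions chosen for illustration; the source only says that the structure and string length are dictated by the visualization program.

```python
import json
from typing import Dict


def format_sample(metric: str, labels: Dict[str, str], value: float,
                  max_label_len: int = 32) -> str:
    """Format one sample as a JSON line (hypothetical layout).

    Label values are truncated to max_label_len characters, standing in for
    the string-length rule imposed by the visualization program.
    """
    trimmed = {key: str(val)[:max_label_len] for key, val in labels.items()}
    record = {"metric": metric, "labels": trimmed, "value": float(value)}
    # sort_keys gives a stable output that is easy to compare and store.
    return json.dumps(record, sort_keys=True)
```

For instance, `format_sample("disk_io_bytes", {"pod": pod_id, "container": container_id}, 4096)` would yield one line per sample that a downstream plotting component could consume.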
可选的,管理设备将格式化数据和/或可视化图像存储于本地,或者存储于独立于业务系统的其他设备或数据库中。Optionally, the management device stores the formatted data and/or visualized images locally, or in other devices or databases independent of the business system.
Optionally, after step S505 above, the method further includes: receiving a first modification request sent by the target computing device, where the first modification request is used to add or delete a computing device or program among the multiple computing devices; and, in response to the first modification request, updating the information about the computing devices in the business system and the association relationship between computing devices and programs.
It can be understood that resources in the cluster may change, and such changes need to be updated in the management device so that the target monitoring object can be located accurately during indicator monitoring.
In a possible implementation, the target computing device includes the identifier of the changed computing device or program in the first modification request and sends the request to the management device. For example, if a new container is added to the target computing device, the identifiers include at least the identifier of the container, the identifier of the pod to which the container belongs, and the identifier of the computing device to which it belongs. If an existing container is deleted from the target computing device, the request may include only the identifier of that container.
It can be understood that the management device maintains the association relationship between target computing devices and target programs, that is, the relationship among resources in the current cluster after they are divided at different granularities. For an addition, the management device can update the association relationship according to where the new program is located. For a deletion, the management device can update the association relationship according to the identifier of the deleted program.
It should be noted that the above takes the modification of a program as an example. In practice, computing devices may also be added or deleted; the management device updates the information about computing devices in the business system in a similar way, which is not described again.
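The update of the association relationship described above can be sketched as follows. The class name, the request fields (`action`, `node_id`, `pod_id`, `container_id`), and the choice to key the table by container identifier are all hypothetical; the source specifies only which identifiers an add or delete request carries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Topology:
    """Association table kept by the management device (illustrative only)."""

    # container ID -> (computing device ID, pod ID)
    containers: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def apply_modification(self, request: dict) -> None:
        """Apply an add or delete request for a container."""
        action = request["action"]
        if action == "add":
            # An add request carries the container, its pod, and its device.
            location = (request["node_id"], request["pod_id"])
            self.containers[request["container_id"]] = location
        elif action == "delete":
            # A delete request only needs the container identifier.
            self.containers.pop(request["container_id"], None)
        else:
            raise ValueError(f"unknown action: {action!r}")

    def containers_in_pod(self, node_id: str, pod_id: str) -> List[str]:
        """List the containers currently associated with a pod on a device."""
        return sorted(cid for cid, loc in self.containers.items()
                      if loc == (node_id, pod_id))
```

Keying by container identifier is what lets a delete request carry the container identifier alone, as the text describes.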
Optionally, the management device updates the first interface elements in the first interface, including updating the candidate object list of the target computing device and/or the candidate object list of the target program.
Optionally, the second interface further includes a modification operation option. After step S505 above, the method further includes: receiving a third operation on the modification operation option for the operating parameters in the second interface; and, in response to the third operation, sending a modification request to the target computing device, where the modification request includes the modified operating parameters of the target monitoring object running the cloud-native service.
It can be understood that, in the example described above, the user indicates the target indicators to be monitored through the operation options corresponding to the operating parameters in the second interface. The second interface further includes a modification operation option for modifying the operating parameters shown in the interface, that is, for adding or deleting observation instructions for target indicators.
In a possible implementation, the modification operation option receives input information from the user. The input information may be an instruction to add or delete any one or more of the at least one instruction item of the target monitoring device. In response to the user's input, the management device sends the modification request to the computing device to which the target monitoring device belongs. The computing device that receives the modification request either generates and runs a new observation program based on the eBPF framework, obtains the corresponding data, and feeds it back to the management device, or deletes the observation program corresponding to the instruction item from the observation instruction set. After completing these steps, the computing device returns response information to the management device, and the management device updates the operating parameters in the second interface based on that response. For example, bpftrace, bcc, or other BPF programs can add or remove trace points (kprobes, tracepoints, and so on) through a classifier, dynamically switching the corresponding plug-ins on or off to obtain the desired information.
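The add/delete handling on the computing device can be sketched as follows. This sketch deliberately does not call bcc or bpftrace: each observation program is stood in for by a plain Python callable, and the class name, request fields, and status strings are hypothetical. It only illustrates the control flow of switching instruction items on and off and returning a response.

```python
from typing import Callable, Dict


class ObservationSet:
    """Registry of observation programs keyed by instruction item (illustrative)."""

    def __init__(self) -> None:
        self._programs: Dict[str, Callable[[], dict]] = {}

    def apply(self, request: dict) -> dict:
        """Handle a modification request and return response information."""
        item = request["item"]
        if request["action"] == "add":
            # In a real system this is where an eBPF program would be
            # generated and attached; here we just register a callable.
            self._programs[item] = request["collector"]
            return {"item": item, "status": "added"}
        if request["action"] == "delete":
            removed = self._programs.pop(item, None) is not None
            return {"item": item, "status": "deleted" if removed else "not_found"}
        raise ValueError(f"unknown action: {request['action']!r}")

    def collect(self) -> dict:
        """Run every registered observation program once."""
        return {name: program() for name, program in self._programs.items()}
```

The response dictionary plays the role of the response information that the management device uses to refresh the second interface.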
Optionally, the management device updates the identifiers of the operating parameters and the corresponding operation options in the second interface.
It should be noted that the operating parameters may be modified only for the computing device to which the selected target monitoring object belongs, or for multiple computing devices in the business system. This application does not limit this.
In this way, the observation programs for the instructions on each computing device can be adjusted flexibly according to user needs, which improves the overall efficiency of monitoring cluster resources.
It should be noted that, in implementing the above solution, there may be one or more target monitoring objects and one or more target parameters, and the target parameters of different target monitoring objects may be the same or different. This is not limited.
The foregoing describes the solutions of the embodiments of this application mainly from the perspective of the method. It can be understood that, to implement the above functions, the management device includes at least one of the hardware structures and software modules corresponding to each function. Those skilled in the art will readily appreciate that, given the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of this application.
In the embodiments of this application, the management device may be divided into functional units according to the above method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in hardware or as a software functional unit. It should be noted that the division into units in the embodiments of this application is schematic and is only a division by logical function; other divisions are possible in actual implementations.
Where functional modules are divided according to their respective functions, FIG. 11 shows a possible schematic structure of the management device in the above embodiments. As shown in FIG. 11, the management device 110 includes a display unit 1101, a communication unit 1102, and a processing unit 1103.
The display unit 1101 is configured to display a first interface, where the first interface contains selection entries for candidate objects corresponding to monitoring objects of multiple granularities.
The communication unit 1102 is configured to receive a first operation on the selection entries for the candidate objects in the first interface.
The processing unit 1103 is configured to determine, in response to the first operation, at least one target monitoring object among the candidate objects, where the target monitoring object includes any one or more of the following: a target computing device in the computing cluster, a target program in the target computing device, and a target variable in the target program.
The communication unit 1102 is further configured to send a monitoring request to the computing device corresponding to the at least one target monitoring object. The monitoring request includes the identifier of the target monitoring object and the target parameters to be monitored, where the target parameters indicate parameters of the target monitoring object that are related to the cloud-native service.
The communication unit 1102 is further configured to receive the target parameter information returned by the computing device corresponding to the target monitoring object in response to the monitoring request.
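The monitoring request and its response, as exchanged by the communication unit, can be sketched as follows. The source states only that the request carries the identifier of the target monitoring object and the monitored target parameters; the field names, the `granularity` field, the JSON encoding, and the response layout are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class MonitoringRequest:
    """Request sent to the computing device (hypothetical field names)."""

    target_id: str                # identifier of the target monitoring object
    granularity: str              # e.g. "device", "pod", "container", "process"
    target_parameters: List[str]  # indicators to monitor, e.g. "disk_io"

    def to_json(self) -> str:
        """Serialize the request for transmission."""
        return json.dumps(asdict(self), sort_keys=True)


def parse_response(payload: str) -> dict:
    """Extract the target parameter information from a JSON response."""
    data = json.loads(payload)
    return {"target_id": data["target_id"], "values": data["values"]}
```

A request for one container might be built as `MonitoringRequest("c1", "container", ["disk_io"])`, and the values returned by `parse_response` would then be passed on for formatting and display.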
In one example, the monitoring objects of multiple granularities include computing devices, programs, and variables, where the programs include at least one of the following: pods, containers, and processes. From coarsest to finest, the granularities are computing device, pod, container, process, and variable.
In one example, the management device 110 further includes a storage unit 1104 configured to store the association relationships between the monitoring objects of multiple granularities. The processing unit 1103 is specifically configured to, in response to the first operation, determine at least one target monitoring object among the candidate objects according to the association relationships, proceeding from the coarsest granularity to the finest.
In one example, the display unit 1101 is further configured to display a second interface, where the second interface includes operation options for the operating parameters; the communication unit 1102 is further configured to receive a second operation on the operation options for the operating parameters in the second interface; and the processing unit 1103 is further configured to determine, in response to the second operation, the target parameters among the operating parameters.
In one example, the second interface further includes a modification operation option. The communication unit 1102 is further configured to receive a third operation on the operation options for the operating parameters in the second interface, and is further configured to send, in response to the third operation, a modification request to the target computing device, where the modification request includes the modified operating parameters of the target monitoring object running the cloud-native service.
In one example, the processing unit 1103 is further configured to convert the target parameter information into information that a visualization program can recognize, where the visualization program converts that information into a visualized image; the display unit 1101 is further configured to display a third interface, where the third interface includes the visualized image.
In one example, the processing unit 1103 is specifically configured to map the target parameter information to readable labels and values, and to obtain the information recognizable by the visualization program from the labels and values.
In one example, the storage unit 1104 is further configured to store the target parameter information.
In one example, the storage unit 1104 is further configured to store computer-executable instructions, and the other units in the management device may perform the corresponding actions according to the computer-executable instructions stored in the storage unit 1104.
For details of the above optional manners, refer to the foregoing method embodiments; they are not repeated here. Likewise, for explanations of any management device 110 provided above and descriptions of its beneficial effects, refer to the corresponding method embodiments above.
An embodiment of this application further provides a management device. With reference to the schematic hardware structure of the communication device shown in FIG. 4 above, the management device includes a processor, a display, and a communication interface, and the processor is connected to the communication interface and to the display. The display is configured to display a first interface, where the first interface contains selection entries for candidate objects corresponding to monitoring objects of multiple granularities. The communication interface is configured to receive a first operation on the selection entries for the candidate objects in the first interface. The processor is configured to determine, in response to the first operation, at least one target monitoring object among the candidate objects, where the target monitoring object includes any one or more of the following: a target computing device in the computing cluster, a target program in the target computing device, and a target variable in the target program. The communication interface is further configured to send a monitoring request to the computing device corresponding to the at least one target monitoring object, where the monitoring request includes the identifier of the target monitoring object and the target parameters to be monitored, and the target parameters indicate parameters of the target monitoring object that are related to the cloud-native service. The communication interface is further configured to receive the target parameter information returned by the computing device corresponding to the target monitoring object in response to the monitoring request.
The management device further includes a memory, which may contain computer program code. The processor is configured to execute the computer program code stored in the memory to implement the method provided in the embodiments of this application.
In implementation, each step of the method provided in this embodiment may be completed by an integrated logic circuit of hardware in the processor of the management device or by instructions in the form of software. The steps of the method disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. When the computer program runs on a computer, it causes the computer to perform the method performed by any management device provided above.
For explanations of the related content and descriptions of the beneficial effects of any computer-readable storage medium provided above, refer to the corresponding embodiments above; they are not repeated here.
An embodiment of this application further provides a chip. The chip integrates a control circuit and one or more ports for implementing the functions of the management device described above. Optionally, for the functions supported by the chip, refer to the description above; they are not repeated here. Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be completed by a program instructing the related hardware. The program may be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a random access memory, or the like. The processing unit or processor may be a central processing unit, a general-purpose processor, an application-specific integrated circuit (ASIC), a microprocessor or digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
An embodiment of this application further provides a computer program product containing instructions. When the instructions run on a computer, they cause the computer to perform any of the methods in the above embodiments. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, an SSD).
It should be noted that the devices provided in the embodiments of this application for storing computer instructions or computer programs, such as, but not limited to, the above memory, computer-readable storage medium, and communication chip, are all non-transitory.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Although this application has been described in connection with various embodiments, those skilled in the art, in practicing the claimed application, can understand and implement other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not mean that these measures cannot be combined to advantage.
Although this application has been described in connection with specific features and embodiments thereof, it is evident that various modifications and combinations can be made without departing from the spirit and scope of this application. Accordingly, this specification and the drawings are merely illustrative of this application as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. Those skilled in the art can make various changes and variations to this application without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is intended to include them as well.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211697581.XA (CN116010201B) | 2022-12-28 | 2022-12-28 | A monitoring method, management device and computing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116010201A | 2023-04-25 |
CN116010201B | 2025-04-29 |