WO2020248507A1 - Container cloud-based system resource monitoring method and related device - Google Patents

Info

Publication number
WO2020248507A1
Authority
WO
WIPO (PCT)
Prior art keywords
framework
container
application
container orchestration
record
Prior art date
Application number
PCT/CN2019/118670
Other languages
French (fr)
Chinese (zh)
Inventor
高峰
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020248507A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0896 Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Definitions

  • The monitoring node can take the form of a functional script composed of commands that access information about the container orchestration framework; by configuring it to request data from the framework at a specific time or at a specific period, the running status of the applications on the framework is obtained.
  • Setting up monitoring nodes makes it possible to meet the monitoring requirements of multiple container orchestration frameworks.

Abstract

The present application relates to the field of system resource monitoring technology. Disclosed are a container cloud-based system resource monitoring method and a related device. Said method comprises: acquiring a deployment condition of container orchestration frameworks and then generating a framework list; acquiring operational state information of applications in each of the container orchestration frameworks; determining, according to the operational state information, that the resource of the container orchestration framework is insufficient and then generating alarm information and pushing same to a capacity expansion executor; acquiring physical machine resource configuration data of the container orchestration framework marked as resource being insufficient and occupation data concerning physical machine resources occupied by each application and then recording same in a corresponding recording node; performing reconfiguration and restart of each application after capacity expansion ends, and acquiring current physical machine resource configuration data of the container orchestration framework; and generating a capacity expansion report. In the present application, the operational state of each application on a container cloud platform is monitored, and pre-warning is issued in a timely manner when system resources are insufficient, so that a capacity expansion demand of a container orchestration framework is responded to quickly, and historical data before and after capacity expansion is retained.

Description

Container cloud-based system resource monitoring method and related device
This application claims priority to Chinese patent application No. 201910515745.4, filed with the Chinese Patent Office on June 14, 2019 and entitled "Container Cloud-based System Resource Monitoring Method and Related Device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of system resource monitoring, and in particular to a container cloud-based system resource monitoring method, apparatus, device, and storage medium.
Background
As distributed computing resources become more widely used, container cloud technology has gained favor with many kinds of users. Internet cloud computing service providers have also built their own products on container cloud technology according to their own characteristics, for example Alibaba Cloud and Tencent Cloud, which integrate container cloud technology into their large product families, or the deeply customized Ping An Padis platform. These products are all distributed platforms based on the application container engine Docker, and they support rapid creation and running of applications, rapid scale-in and scale-out, and self-healing after failures. Using these container cloud platforms relies on a container orchestration framework to allocate and manage resources for the services and applications running on the platform; Docker-based orchestration tools include Docker Swarm, Marathon, Kubernetes, and Nomad. These orchestration tools allocate resources reasonably among services and applications and recover an application or service when it crashes. Common container orchestration frameworks provide a friendly interface and easy-to-use data interfaces such as a REST API for creating and managing applications, and they are convenient to integrate with third-party systems. For example, the Marathon framework allows an application or service to be defined in JSON-formatted text; once defined, the application is submitted and run through the REST API, which greatly lowers the difficulty of use.
In conventional industry solutions, as time passes and the business expands, system resources on a given platform often have to be expanded, without major changes to the original deployment structure, to meet the steadily growing resource demands of the applications or services that support the developing business. For example, on the Padis platform, multiple Marathon frameworks are initially built and deployed according to the existing business types, forming a Marathon cluster; this set of frameworks manages the applications or services running under the different business types. The inventor realized that, as the business continues to grow, the system resources occupied by existing applications often become strained, so that applications run slowly or even crash. Restarting the application does not help at that point; the system resources of the Marathon framework hosting the application must be expanded promptly. However, the prior art usually relies on Google's container monitoring tool cAdvisor to view the physical machine resources used by each application or service running on a container orchestration framework such as Marathon. This approach has the following limitations:
1) Only one physical host can be monitored at a time, which amounts to single-node monitoring and cannot meet the need for multi-node monitoring. Applications running on the same container cloud platform may be distributed across machine resources managed by different container orchestration frameworks, and may therefore run on different physical hosts. Single-node monitoring cannot track the actual resource usage of such applications.
2) Only the real-time status can be viewed; historical data cannot. As a result, no historical data is available to functions that analyze the running trends of applications and services on the container cloud platform.
3) The early-warning capability is weak and lacks telephone or email alerts, so the container orchestration framework cannot warn anyone in time when physical machine resources are insufficient. In actual operation of the container cloud platform, especially when an application is restarted or created, insufficient physical machine resources prevent the application from starting or being created successfully; if this is not handled in time, the business function served by that application is paralyzed.
The industry therefore needs a technical means of multi-node monitoring, historical data viewing and analysis, and fault warning for the resources used by container orchestration frameworks in a container cloud platform, in order to solve the above technical problems.
Summary of the Invention
Embodiments of this application provide a container cloud-based system resource monitoring method, apparatus, device, and storage medium, which monitor resource usage on a container cloud platform and give warning promptly when a problem is found, addressing the technical problem of business paralysis caused by applications that cannot be restarted.
In a first aspect, this application provides a container cloud-based system resource monitoring method, comprising: obtaining the deployment status of container orchestration frameworks under a container cloud platform and then generating a framework list, where the framework list records all container orchestration frameworks deployed under the container cloud platform;
obtaining, in the order recorded in the framework list and at a preset acquisition period, the running status information of each application in each container orchestration framework one by one, and recording the obtained running status information in a preset storage unit, where the storage unit contains framework record nodes for recording the physical machine resource configuration data of each container orchestration framework and application record nodes for recording the running status information and physical machine resource occupancy data of each application, and the running status information identifies the running state of an application in the container orchestration framework where it resides;
when the running status information of any application remains in the waiting state throughout a preset judgment time threshold, marking the container orchestration framework as having insufficient resources, then generating alarm information and pushing it to the executor responsible for the capacity expansion operation, so as to notify the executor to expand the capacity of the container orchestration framework;
obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in that container orchestration framework, and recording the two types of data in the corresponding record nodes;
after receiving a capacity expansion completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of that container orchestration framework and recording it in the framework record node;
summarizing the data recorded in the framework record nodes and the application record nodes and then generating a capacity expansion report.
In a second aspect, in some possible embodiments, this application provides a container cloud-based system resource monitoring apparatus, comprising a list generation module, an application status acquisition module, an alarm information push module, a data recording module, an application restart module, and a capacity expansion report generation module, where:
the list generation module is configured to obtain the deployment status of container orchestration frameworks under the container cloud platform and then generate a framework list;
the application status acquisition module is configured to obtain, in the order recorded in the framework list and at a preset acquisition period, the running status information of each application in each container orchestration framework one by one, and to record the obtained running status information in a preset storage unit;
the alarm information push module is configured to, when the running status information of any application remains in the waiting state throughout a preset judgment time threshold, mark the container orchestration framework as having insufficient resources, generate alarm information, and push it to the executor responsible for the capacity expansion operation;
the data recording module is configured to obtain the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in that container orchestration framework, and to record the two types of data in the corresponding record nodes;
the application restart module is configured to, after receiving the capacity expansion completion signal fed back by the executor, retrieve from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfigure and restart the applications, and obtain the current physical machine resource configuration data of that container orchestration framework and record it in the framework record node;
the capacity expansion report generation module is configured to summarize the data recorded in the framework record nodes and the application record nodes and then generate a capacity expansion report.
Based on the same inventive concept, in some possible embodiments, this application provides a computer device comprising a memory and a processor, where the memory stores computer-readable instructions which, when executed by the processor, implement the steps of the above container cloud-based system resource monitoring method.
Based on the same inventive concept, in some possible embodiments, this application provides a computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the above container cloud-based system resource monitoring method.
This application sets up a monitoring node for each container orchestration framework in the container cloud platform, obtains the real-time status information of applications, and judges from that status information whether a container orchestration framework is short of resources, issuing a capacity expansion warning if so. Using the backup of each application's resource configuration taken before the expansion, applications are restarted promptly after the expansion. This achieves multiple monitoring nodes, retention of historical data, and automatic early warning, which the monitoring means of conventional container cloud platforms cannot provide.
Brief Description of the Drawings
FIG. 1 is the main flowchart of a container cloud-based system resource monitoring method according to an embodiment of this application;
FIG. 2 is a flowchart of generating the framework list in a container cloud-based system resource monitoring method according to an embodiment of this application;
FIG. 3 is a flowchart of monitoring application status in a container cloud-based system resource monitoring method according to an embodiment of this application;
FIG. 4 is a flowchart of judging insufficient resources in a container cloud-based system resource monitoring method according to an embodiment of this application;
FIG. 5 is a flowchart of backing up data before capacity expansion in a container cloud-based system resource monitoring method according to an embodiment of this application;
FIG. 6 is a flowchart of restoring application operation after capacity expansion in a container cloud-based system resource monitoring method according to an embodiment of this application;
FIG. 7 is a functional block diagram of a container cloud-based system resource monitoring apparatus according to an embodiment of this application.
Detailed Description
To help those skilled in the art better understand the solutions of this application, the embodiments of this application are described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of a container cloud-based system resource monitoring method provided by an embodiment of this application. As shown in the figure, the method includes steps S1 to S6:
S1. Obtain the deployment status of container orchestration frameworks under the container cloud platform and then generate a framework list, where the framework list records all container orchestration frameworks deployed under the container cloud platform.
Specifically, multiple container orchestration tools are generally deployed on a container cloud platform by means of container technology, and the functional cluster formed by these tools allocates the corresponding system resources to the services or applications. Access permission to the container cloud platform is obtained in order to connect to the platform's management console, and a data request command is then sent to the console to obtain the deployment status. For example, on the DCOS platform, the interface command "/ping", which returns the service status of the Marathon framework, is used to query whether Marathon is running. All the container orchestration framework information obtained is then collected into a list or a name inventory, ordered by the time at which each framework's information was obtained. Subsequent steps call this list or inventory as the locating and ordering reference for obtaining the running status of applications.
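Step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `FrameworkEntry`, `build_framework_list`, `is_deployed`, and `clock` are assumptions introduced here, and the liveness check (for instance a request to a framework's "/ping" interface) is injected as a callable so the sketch makes no network calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FrameworkEntry:
    name: str             # hypothetical framework identifier
    discovered_at: float  # time at which the framework's information was obtained


def build_framework_list(
    candidates: Iterable[str],
    is_deployed: Callable[[str], bool],
    clock: Callable[[], float],
) -> List[FrameworkEntry]:
    """Collect every deployed framework, ordered by the time it was discovered."""
    entries = []
    for name in candidates:
        # is_deployed stands in for a status query against the framework.
        if is_deployed(name):
            entries.append(FrameworkEntry(name, clock()))
    return sorted(entries, key=lambda e: e.discovered_at)


def demo_framework_list() -> List[FrameworkEntry]:
    """Usage example with made-up framework names and a fake clock."""
    ticks = iter([1.0, 2.0, 3.0])
    deployed = {"marathon-pay", "marathon-loan"}
    return build_framework_list(
        ["marathon-pay", "marathon-down", "marathon-loan"],
        is_deployed=lambda n: n in deployed,
        clock=lambda: next(ticks),
    )
```

Injecting the status check keeps the ordering logic separate from whichever console or API a given platform exposes.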
S2. In the order recorded in the framework list and at a preset acquisition period, obtain the running status information of each application in each container orchestration framework one by one, and record the obtained running status information in a preset storage unit. The storage unit contains framework record nodes for recording the physical machine resource configuration data of each container orchestration framework and application record nodes for recording the running status information and physical machine resource occupancy data of each application. The running status information identifies the running state of an application in the container orchestration framework where it resides.
Specifically, commands of each container orchestration framework are run to obtain information about each application it manages, and this information is stored in record nodes set up specifically for it so that subsequent steps can call the data. For example, calling the Marathon API to send a command to Marathon's management console returns the requested content; sending "/deployments" to the management console returns the current deployment of applications on the Marathon orchestration framework, including each application's current resource occupancy and running status. In addition, the storage space that holds these record nodes also contains record nodes for the physical machine resource configuration data of the container orchestration frameworks. The data in these record nodes can be kept permanently in order of recording time and called by functional units used for analysis. For example, to analyze how an application has been used on the cloud platform over a certain period and thereby infer the development trend of the business behind that application, this retained historical data is needed as the basis for calculation.
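The storage unit and one polling pass of step S2 might look like the sketch below. The class `StorageUnit`, the function `poll_once`, and the field names `id`, `state`, `cpus`, and `mem_mb` are all assumptions made for illustration; the source does not specify a data layout. The per-framework query is injected as `fetch_apps` rather than tied to any real API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


class StorageUnit:
    """In-memory stand-in for the preset storage unit (an assumed layout)."""

    def __init__(self) -> None:
        # Framework record nodes: framework name -> time-ordered config records.
        self.framework_nodes: Dict[str, List[dict]] = defaultdict(list)
        # Application record nodes: (framework, app) -> time-ordered records.
        self.app_nodes: Dict[Tuple[str, str], List[dict]] = defaultdict(list)

    def record_framework(self, framework: str, timestamp: float, config: dict) -> None:
        self.framework_nodes[framework].append({"timestamp": timestamp, **config})

    def record_app(self, framework: str, app: str, timestamp: float,
                   state: str, usage: dict) -> None:
        self.app_nodes[(framework, app)].append(
            {"timestamp": timestamp, "state": state, **usage})

    def history(self, framework: str, app: str, since: float = 0.0) -> List[dict]:
        """Return retained records for trend analysis, oldest first."""
        records = self.app_nodes.get((framework, app), [])
        return [r for r in records if r["timestamp"] >= since]


def poll_once(frameworks: Iterable[str],
              fetch_apps: Callable[[str], List[dict]],
              storage: StorageUnit,
              now: float) -> None:
    """One acquisition cycle: visit frameworks in list order, record every app."""
    for framework in frameworks:
        for app in fetch_apps(framework):
            storage.record_app(framework, app["id"], now, app["state"],
                               {"cpus": app["cpus"], "mem_mb": app["mem_mb"]})
```

Calling `poll_once` on a timer with the preset acquisition period yields the time-ordered history that the analysis use case above relies on.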
S3. When the running status information of any application remains in the waiting state throughout a preset judgment time threshold, mark the container orchestration framework as having insufficient resources, then generate alarm information and push it to the executor responsible for the capacity expansion operation, so as to notify the executor to expand the capacity of the container orchestration framework.
Specifically, a temporarily suspended or waiting state of an application is not necessarily caused by strained resource allocation; such an application restarts successfully by itself after a while. Waiting caused by insufficient resources, however, persists and prevents the application from restarting. A judgment duration therefore needs to be set in advance. If the state of an application remains waiting throughout this duration, the resources allocated to the application can be considered insufficient to support its restart or normal operation, and the container orchestration framework hosting the application can be considered short of resources. Sufficient hardware resources need to be added to the framework; this operation is called capacity expansion. When a container orchestration framework is found to be short of resources, the corresponding alarm information is generated and pushed to the executor responsible for the expansion, such as a third-party maintenance company or the platform operations team. Push methods include email, SMS message, or voice call.
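The judgment in step S3 can be illustrated with a short sketch. It assumes each application's samples are stored as `(timestamp, state)` pairs, oldest first, and that the waiting state is labelled `"waiting"`; the function names and the alert layout are hypothetical. Notification channels (email, SMS, voice call) are represented by injected callables.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

WAITING = "waiting"  # assumed label for the waiting state

Sample = Tuple[float, str]  # (timestamp in seconds, state)


def waiting_duration(samples: List[Sample]) -> Optional[float]:
    """Length of the trailing uninterrupted waiting run, or None if not waiting."""
    if not samples or samples[-1][1] != WAITING:
        return None
    start = samples[-1][0]
    for timestamp, state in reversed(samples):
        if state != WAITING:
            break
        start = timestamp
    return samples[-1][0] - start


def check_framework(framework: str,
                    app_samples: Dict[str, List[Sample]],
                    threshold_s: float,
                    notifiers: Iterable[Callable[[dict], None]]) -> Optional[dict]:
    """Mark the framework as short of resources if any app waited too long."""
    starved = []
    for app, samples in app_samples.items():
        duration = waiting_duration(samples)
        if duration is not None and duration >= threshold_s:
            starved.append(app)
    if not starved:
        return None
    alert = {"framework": framework, "apps": sorted(starved),
             "reason": "insufficient resources"}
    for notify in notifiers:  # e.g. email, SMS, or voice-call senders
        notify(alert)
    return alert
```

Measuring only the trailing run means a brief wait that resolved earlier does not count toward the threshold, which matches the distinction drawn above between temporary and persistent waiting.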
S4. Obtain the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in that container orchestration framework, and record the two types of data in the corresponding record nodes.
Specifically, the corresponding commands are sent through the management console of the container orchestration framework to obtain the physical machine resource configuration data and the physical machine resource occupancy data, and the two types of data are then stored in the corresponding record nodes.
S5. After receiving the capacity expansion completion signal fed back by the executor, retrieve from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfigure and restart the applications, and obtain the current physical machine resource configuration data of that container orchestration framework and record it in the framework record node.
Specifically, the capacity expansion completion signal can be obtained through a dedicated feedback interface, where the executor enters and submits it. The feedback interface can provide an input for which hardware resources were added, so that the executor submits this information; once obtained, it can be recorded in the framework record node as newly added physical machine resource configuration data. In addition, after the expansion, the applications on the expanded container orchestration framework need to be reconfigured and restarted. The configuration is based on the most recent data previously saved in the application record nodes, including configuration data such as memory usage and CPU thread allocation.
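The restore part of step S5 can be sketched as below, assuming application record nodes are kept as a mapping from `(framework, app)` to a list of records with `timestamp`, `cpus`, and `mem_mb` fields. That layout, and the `reconfigure` and `restart` callables standing in for whatever commands the orchestration framework offers, are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Tuple

AppNodes = Dict[Tuple[str, str], List[dict]]


def restore_applications(app_nodes: AppNodes,
                         framework: str,
                         reconfigure: Callable[[str, float, float], None],
                         restart: Callable[[str], None]) -> List[str]:
    """Reapply the most recently recorded configuration, then restart each app."""
    restored = []
    for (fw, app), records in app_nodes.items():
        if fw != framework or not records:
            continue
        # The configuration basis is the latest record saved before expansion.
        latest = max(records, key=lambda r: r["timestamp"])
        reconfigure(app, latest["cpus"], latest["mem_mb"])
        restart(app)
        restored.append(app)
    return sorted(restored)
```

Only applications belonging to the expanded framework are touched, so other frameworks in the list keep running undisturbed.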
S6. Summarize the data recorded in the framework record node and the application record node, and generate an expansion report.
Specifically, after the expansion is complete, in order to provide a reference and data support for subsequent operations, in addition to the backup data retained in the storage unit, the expansion can be summarized into a job report that records the data in the framework record node and the application record node before and after the expansion.
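The report generation in step S6 can be sketched as follows. This is only a minimal illustration: the field names ("recorded_at", "cpus", "mem_mb") and the function name are assumptions introduced here, since the application does not specify the layout of the record nodes or of the report.

```python
import json


def build_expansion_report(framework_records, app_records):
    """Summarize the framework and application record nodes around one expansion.

    framework_records: list of dicts with "recorded_at", "cpus", "mem_mb"
    app_records: dict mapping an application id to its recorded occupancy
    All field names are illustrative assumptions, not taken from the source.
    """
    if len(framework_records) < 2:
        raise ValueError("need one record before and one after the expansion")
    # The two most recent framework records bracket the expansion job.
    ordered = sorted(framework_records, key=lambda r: r["recorded_at"])
    before, after = ordered[-2], ordered[-1]
    added = {key: after[key] - before[key] for key in ("cpus", "mem_mb")}
    return {"before": before, "after": after,
            "added": added, "applications": app_records}


report = build_expansion_report(
    [{"recorded_at": 100, "cpus": 8, "mem_mb": 16384},
     {"recorded_at": 200, "cpus": 12, "mem_mb": 32768}],
    {"/shop/api": {"cpus": 2, "mem_mb": 2048}},
)
print(json.dumps(report["added"]))
```

Keeping both the pre-expansion and post-expansion framework records in the report is what lets later analysis relate hardware growth to the applications that triggered it.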
In this embodiment, each container orchestration framework running on the container cloud platform is monitored and the running status of its applications is obtained in order to determine whether resources are insufficient. An early warning is issued promptly, and application operation is restored once the expansion job is complete. This effectively avoids the business losses that arise in traditional operations from monitoring only a single node and failing to warn in time.
FIG. 2 is a flowchart of generating the framework list in the container cloud-based system resource monitoring method provided by an embodiment of this application. As shown in the figure, step S1, obtaining the deployment status of the container orchestration frameworks under the container cloud platform and generating a framework list that records all container orchestration frameworks deployed under the container cloud platform, includes steps S101 to S104:
S101. Connect to the management console of the container cloud platform.
S102. Send to the management console of the container cloud platform a data request for the status of the container orchestration frameworks running on the container cloud platform.
Specifically, after access permission for the container cloud platform is obtained, connect to the management console and send a data request for the configuration data of the container orchestration frameworks deployed on the cloud platform. The data request contains the command for obtaining that configuration data. The management credentials of the cloud platform include information such as the access address, data port, user name, and password.
S103. After receiving the feedback from the management console, generate the framework list, in which all container orchestration frameworks running on the container cloud platform are recorded in the chronological order of the feedback.
S104. Generate a record sequence number for each container orchestration framework in the framework list according to its recording time. The record sequence number is the identification number of the container orchestration framework within the container cloud platform and is used to distinguish different container orchestration frameworks.
Specifically, after the data returned by the console is received, each container orchestration framework obtained is assigned a sequence number in the order in which it was returned, and the frameworks are compiled into a list, which makes them easy to reference and distinguish in subsequent steps.
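Steps S103 and S104 can be sketched as below. The sketch assumes the console feedback has already been reduced to (framework name, time received) pairs; the class name, field names, and sample framework names are hypothetical and serve only to show the ordering and numbering.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FrameworkEntry:
    record_no: int         # identification number within the container cloud platform
    name: str              # name reported by the console (illustrative field)
    received_at: datetime  # time the console feedback was received


def build_framework_list(responses):
    """responses: iterable of (framework_name, received_at) pairs."""
    # Order by the time the feedback arrived, then number from 1.
    ordered = sorted(responses, key=lambda item: item[1])
    return [FrameworkEntry(record_no=i, name=name, received_at=ts)
            for i, (name, ts) in enumerate(ordered, start=1)]


t0 = datetime(2019, 6, 1, tzinfo=timezone.utc)
frameworks = build_framework_list([
    ("marathon-b", t0 + timedelta(seconds=2)),
    ("marathon-a", t0),
])
for entry in frameworks:
    print(entry.record_no, entry.name)
```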
In this embodiment, all container orchestration frameworks running on the container cloud platform are compiled into a list, which makes them easy to reference in subsequent steps.
FIG. 3 is a flowchart of monitoring application status in the container cloud-based system resource monitoring method provided by an embodiment of this application. As shown in the figure, step S2, obtaining the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and at a preset acquisition period, and recording the acquired running status information in a preset storage unit, includes steps S201 to S204:
S201. Generate a monitoring node for each container orchestration framework in the framework list. The monitoring node connects to the management console of the container orchestration framework at a set period and obtains the running status information of each application running on it.
Specifically, a monitoring node may take the form of a functional script composed of commands that access information about the container orchestration framework. The script is set to send data requests to the framework at a specific time or at a specific interval to obtain the running status of the applications on it. Setting up monitoring nodes in this way makes it possible to meet the monitoring needs of multiple container orchestration frameworks.
S202. According to the record sequence numbers of the container orchestration frameworks in the framework list, generate in the storage unit a corresponding application record node for the applications on each container orchestration framework. The application record node records the running status information, obtained by the monitoring node, of each application running on the container orchestration framework.
Specifically, in order to store data such as the application running status obtained by the monitoring nodes permanently, the data can be recorded in a database or in separate data files. Different record nodes are set up for the container orchestration framework itself and for the applications running on it, and each type of record node records the acquired data in order of recording time.
S203. Through the monitoring node, connect to the management console of the container orchestration framework at the set monitoring period and request the running status information of all applications running on the container orchestration framework.
S204. After receiving the feedback from the management console of the container orchestration framework, record the running status information of the applications in the application record node according to the time at which the feedback was received.
Specifically, setting a monitoring period for a monitoring node is equivalent to scheduling a timed task that runs a functional script with a monitoring role. The script is configured with the connection credentials for the management console of the container orchestration framework and for the record nodes, and has read and write permission. After the feedback data is received, the script uses that permission to write the application's running data, including its running state, into the application record node. For example, the states of applications managed by Marathon include "waiting", "delayed", "suspended", and "running". "Waiting" indicates that an application or service has failed or crashed and needs to be restarted; "delayed" indicates that execution of an application or service has been postponed because resources are exhausted or blocked; "suspended" indicates that an application or service has been temporarily interrupted and is not executing; "running" indicates that the application or service is running normally. An error report indicates that an application or service has stopped. In such cases Marathon generally returns a status word such as "waiting" to indicate that it is currently waiting for the application or service to restart.
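One polling pass of such a monitoring script might look like the following sketch. The four state names come from the description above; everything else (the function name, the in-memory record node, the "unknown" fallback, and the stand-in `fetch_states` callable that replaces the real console request) is an assumption made for illustration.

```python
import time
from collections import defaultdict

# State names as listed in the description; lower-cased here by assumption.
KNOWN_STATES = {"waiting", "delayed", "suspended", "running"}


def poll_once(fetch_states, record_node, now=None):
    """Run one monitoring pass and append the results to the record node.

    fetch_states: callable returning {app_id: state}; it stands in for the
                  request to the framework's management console.
    record_node:  mapping app_id -> list of (timestamp, state) tuples.
    """
    now = time.time() if now is None else now
    for app_id, state in fetch_states().items():
        if state not in KNOWN_STATES:
            state = "unknown"  # keep the sample, but flag unexpected values
        record_node[app_id].append((now, state))
    return record_node


app_record_node = defaultdict(list)
poll_once(lambda: {"/shop/api": "running", "/shop/db": "bogus"},
          app_record_node, now=100.0)
print(dict(app_record_node))
```

A scheduler such as cron or a timer loop would call `poll_once` once per monitoring period; the sketch leaves that choice open.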
In this embodiment, monitoring nodes are set up for the container orchestration frameworks to obtain real-time application running status, and this running data is recorded permanently for later use.
FIG. 4 is a flowchart of determining insufficient resources in the container cloud-based system resource monitoring method provided by an embodiment of this application. As shown in the figure, step S3, marking the container orchestration framework as having insufficient resources when the running status information of any application remains in the waiting state throughout a preset judgment time threshold range, and then generating alarm information and pushing it to the executor who performs the expansion operation, includes steps S301 to S305:
S301. Read the running status information of any application in the application record node.
S302. Determine whether the running status information of the application remains in the waiting state throughout the judgment time threshold range. If so, mark the state of the container orchestration framework as insufficient resources; if not, mark the state of the container orchestration framework as running normally. The judgment time threshold range is a preset length of time.
Specifically, if an application is in the waiting state within the set judgment period, it can be assumed that the application has failed and needs to be rebuilt or restarted by the container orchestration framework; if the application stays in the waiting state throughout, it can be assumed that the application cannot recover on its own. In general, an application running on a container orchestration framework is comparable to an independent software program running in a virtual machine. When such a program is damaged or fails and crashes, the virtual machine system will normally try to restart or wake it. For applications tied to business workloads, however, resource usage changes as the business changes, and without maintenance and optimization their demand for resources generally keeps growing. When this happens, the resources of the container orchestration framework that hosts the application usually need to be reconfigured, that is, its hardware resources need to be expanded, so that more hardware resources are allocated to the framework. The framework can then allocate more resources to the affected application so that it can be recreated or restarted.
S303. Following the steps above, traverse all applications under all container orchestration frameworks in the framework list and mark the state of every container orchestration framework.
Specifically, according to the sequence number of each container orchestration framework in the framework list, the running status data of the corresponding applications is obtained from the application record node one by one, each is checked to determine whether expansion is needed, and the result is recorded.
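The judgment in steps S301 to S303 can be sketched as follows. The sketch interprets "remains in the waiting state throughout the threshold range" as "the current unbroken run of waiting samples has lasted at least the threshold"; that interpretation, the data layout, and the mark strings are assumptions rather than details given in the application.

```python
def waiting_duration(samples, now):
    """Length in seconds of the current unbroken run of "waiting" samples.

    samples: list of (timestamp, state) tuples from an application record node.
    """
    streak_start = None
    for ts, state in sorted(samples):
        if state == "waiting":
            if streak_start is None:
                streak_start = ts
        else:
            streak_start = None  # any other state breaks the run
    return 0.0 if streak_start is None else now - streak_start


def mark_frameworks(app_records, now, threshold_seconds):
    """app_records: {framework_record_no: {app_id: [(ts, state), ...]}}"""
    marks = {}
    for framework_no, apps in app_records.items():
        # One persistently waiting application is enough to mark the framework.
        starved = any(waiting_duration(s, now) >= threshold_seconds
                      for s in apps.values())
        marks[framework_no] = "insufficient_resources" if starved else "normal"
    return marks


records = {
    1: {"/shop/api": [(0, "running"), (10, "waiting"), (20, "waiting")]},
    2: {"/shop/db": [(0, "waiting"), (15, "running"), (20, "waiting")]},
}
marks = mark_frameworks(records, now=70, threshold_seconds=60)
print(marks)
```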
S304. Call the email template and generate an alarm email. The alarm email records the record sequence number of the container orchestration framework marked as having insufficient resources, together with a message indicating that resources are insufficient.
S305. Read the executor's email address from the preset recipient address list and push the alarm email to the executor.
Specifically, based on the results recorded in the steps above, when a container orchestration framework is found to have insufficient resources, a pre-prepared email template is called to generate an alarm email in a specific format that states the problem and where it occurred. The alarm email is then sent to the handler according to the email address information. The handler is usually the executor who performs the expansion operation, but may also be a dispatching department that forwards the email to the department responsible for execution. In other embodiments, the warning can instead be delivered as a voice call with specific alarm content: for example, a warning text is generated from the recorded judgment of insufficient resources, a text-to-speech engine converts it into a warning voice message, and the message is played once the call to the executor is connected. In some embodiments, warning information can also be pushed in real time by binding this application to the executor's mobile app.
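Building the alarm email of steps S304 and S305 could look like the sketch below, which uses Python's standard `string.Template` and `email.message.EmailMessage`. The template wording, subject line, and addresses are hypothetical, and the message is only constructed, not sent; delivery (for example via `smtplib`) depends on the deployment.

```python
from email.message import EmailMessage
from string import Template

# Hypothetical template text; the application does not give the wording.
ALERT_TEMPLATE = Template(
    "Container orchestration framework $record_no is marked as having "
    "insufficient resources.\n"
    "Details: $hint\n"
)


def build_alert(record_no, hint, sender, recipients):
    """Fill the template and address the mail to the preset recipient list."""
    msg = EmailMessage()
    msg["Subject"] = f"[capacity alert] framework {record_no}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(ALERT_TEMPLATE.substitute(record_no=record_no, hint=hint))
    return msg


alert = build_alert(
    record_no="3",
    hint="application /shop/api has been waiting for more than 10 minutes",
    sender="monitor@example.com",
    recipients=["ops@example.com"],
)
print(alert["Subject"])
```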
In this embodiment, the running status of the applications is used to determine whether a container orchestration framework has insufficient resources, and an early-warning mechanism ensures that warnings are sent promptly, which helps expansion needs to be met in time.
FIG. 5 is a flowchart of backing up data before expansion in the container cloud-based system resource monitoring method provided by an embodiment of this application. As shown in the figure, step S4, obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in that framework, and recording the two types of data in the corresponding record nodes, includes steps S401 to S403:
S401. Connect to the management console of the container orchestration framework marked as having insufficient resources.
S402. Send a data request to the management console to obtain the physical machine resource configuration data of the container orchestration framework and the physical machine resource occupancy data of each application running in the container orchestration framework.
Specifically, in preparation for expansion, the pre-expansion operating state of the container orchestration framework that has run short of resources must be recorded and saved in advance, so that it can be used for recovery after the expansion. To this end, the configuration of each application is collected and recorded. The management console of the container orchestration framework is connected, a data request for the application state is sent, and the corresponding data is obtained. The commands used to obtain the data depend on the characteristics of each container orchestration framework. Data can be obtained by application ID: for example, a request to "/v2/apps/{id}" returns the deployment status in the Marathon framework of the application with that id. Data can also be obtained from an application list: for example, a request to "/v2/groups/{id}" returns the status of the application group identified by id.
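As a small illustration of the two request paths mentioned above, the following sketch only builds the request URLs; it sends nothing. The base address is a made-up host, and the handling of the leading slash in the identifier is an assumption about how ids are written, so the framework's own API documentation should be checked before relying on it.

```python
from urllib.parse import quote, urljoin


def marathon_url(base, kind, item_id):
    """Build the URL for an application or group query.

    kind:    "apps" or "groups", matching the two paths named in the text.
    item_id: identifier such as "/shop/api" (leading slash optional).
    """
    if kind not in ("apps", "groups"):
        raise ValueError(f"unsupported resource kind: {kind}")
    # Percent-encode the id but keep "/" so nested ids stay readable.
    path = quote(item_id.strip("/"), safe="/")
    return urljoin(base.rstrip("/") + "/", f"v2/{kind}/{path}")


base = "http://marathon.example:8080"  # hypothetical console address
app_url = marathon_url(base, "apps", "/shop/api")
group_url = marathon_url(base, "groups", "shop")
print(app_url)
print(group_url)
```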
S403. After receiving the feedback from the management console, record the physical machine resource configuration data in the framework record node and the physical machine resource occupancy data in the application record node, according to the time at which the feedback was received.
Specifically, after the console's response is received, the framework record node and the application record node are connected and the two types of returned data are saved.
In this embodiment, recording the application configuration and the physical machine resources of the container orchestration framework before expansion makes it possible to build a complete historical record and provides the data needed to restore the applications after expansion.
FIG. 6 is a flowchart of restoring application operation after expansion in the container cloud-based system resource monitoring method provided by an embodiment of this application. As shown in the figure, step S5, retrieving from the storage unit, after receiving the expansion-complete signal fed back by the executor, the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications, and then obtaining the current physical machine resource configuration data of that framework and recording it in the framework record node, includes steps S501 to S504:
S501. Receive feedback information from the executor that contains the expansion-complete signal.
S502. After connecting to the storage unit, read from the application record node the most recent record, relative to the current time, of the physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources.
S503. Configure the corresponding application according to the physical machine resource occupancy data, and restart the application after the configuration is complete.
Specifically, after the expansion is complete, each application is restarted using the data backed up before the expansion. The expansion-complete signal can be provided by the executor through a preset input interface once the expansion job has finished. When data is extracted from the record node, the time of the record closest to the current extraction time must be determined; the record extracted for that time is the pre-expansion backup data for this job.
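Selecting the most recent pre-expansion record in step S502 can be sketched as below. The record layout ("recorded_at", "cpus", "mem_mb") and both function names are assumptions; the actual reconfiguration and restart calls depend on the orchestration framework and are deliberately left out.

```python
def latest_record(records, before=None):
    """Return the record closest to (and not after) the given time, or None."""
    candidates = [r for r in records
                  if before is None or r["recorded_at"] <= before]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["recorded_at"])


def restart_plan(app_record_node, now):
    """Map each application to the resource settings it should be restarted with."""
    plan = {}
    for app_id, records in app_record_node.items():
        record = latest_record(records, before=now)
        if record is not None:
            plan[app_id] = {"cpus": record["cpus"], "mem_mb": record["mem_mb"]}
    return plan


node = {
    "/shop/api": [
        {"recorded_at": 100, "cpus": 1, "mem_mb": 1024},
        {"recorded_at": 300, "cpus": 2, "mem_mb": 2048},
    ],
    "/shop/db": [],
}
plan = restart_plan(node, now=400)
print(plan)
```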
S504. Connect to the management console of the container orchestration framework, obtain the current physical machine resource configuration data of the container orchestration framework, and record the obtained data in the framework record node according to the acquisition time.
Specifically, once the applications have been restored after expansion, the management console of the container orchestration framework is connected to obtain the current physical machine resource configuration data, and this data is written to the framework record node as a new record. The record immediately preceding it describes the hardware allocation before expansion, that is, before the new hardware resources were allocated to the container orchestration framework. Based on the physical machine resource configuration data in the framework record node, together with how the applications recovered, the relationship between business development and hardware growth trends can be analyzed.
In this embodiment, using the backup of the pre-expansion application configuration allows application operation to be restored quickly once expansion is complete. At the same time, recording how the hardware of the container orchestration framework changed before and after expansion provides the departments responsible for business analysis with a basis for data analysis.
In some of the embodiments, after obtaining the deployment status of the container orchestration frameworks under the container cloud platform and generating the framework list, the method includes:
Connecting to the management console of each container orchestration framework one by one, in the order recorded in the framework list. For a container orchestration framework that connects successfully, a success mark is appended to the record sequence number to produce a new record sequence number. For a container orchestration framework that fails to connect, a failure mark is appended to the record sequence number to produce a new record sequence number.
Specifically, in order to screen the container orchestration frameworks recorded in the framework list and make connection access more accurate, a connection to each container orchestration framework can be verified in advance, one by one in list order. A mark corresponding to the connection result is generated and appended to the framework's record sequence number to produce a new record sequence number, so that subsequent steps can read the current connection state directly from the record sequence number and skip the connection check. Because marks are appended rather than entries removed, the complete framework list is preserved. After the expansion judgment has been carried out for all container orchestration frameworks marked as connected, the frameworks that failed to connect can be revisited and reconnected. This avoids distorted judgments caused by frameworks that happened to be undergoing expansion while the new record sequence numbers were being generated.
In some of the embodiments, before obtaining the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and at a preset acquisition period, and recording the acquired running status information in a preset storage unit, the method includes:
Identifying the record sequence number of the container orchestration framework. When the record sequence number contains a success mark, the operation of reading the running status information of the applications in the container orchestration framework is performed; when the record sequence number contains a failure mark, that operation is not performed.
Specifically, identifying the new record sequence numbers makes it possible to bypass container orchestration frameworks that have problems and to target connections more precisely, which improves the efficiency of the expansion judgment.
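The marking and filtering of record sequence numbers described in these embodiments can be sketched as follows. The marker characters, the "-" separator, and the `can_connect` callable (which stands in for the real connection attempt) are all hypothetical; the application does not say how the marks are encoded.

```python
SUCCESS_MARK, FAILURE_MARK = "S", "F"  # hypothetical marker characters


def tag_record_numbers(record_numbers, can_connect):
    """Append a success or failure mark to each record sequence number.

    can_connect: callable taking a record number and returning True when the
                 framework's management console could be reached.
    """
    return [f"{no}-{SUCCESS_MARK if can_connect(no) else FAILURE_MARK}"
            for no in record_numbers]


def split_by_mark(tagged):
    """Separate frameworks to monitor now from those to retry later."""
    ready = [t for t in tagged if t.endswith(f"-{SUCCESS_MARK}")]
    retry_later = [t for t in tagged if t.endswith(f"-{FAILURE_MARK}")]
    return ready, retry_later


tagged = tag_record_numbers(["001", "002", "003"], lambda no: no != "002")
ready, retry_later = split_by_mark(tagged)
print(ready, retry_later)
```

Because the failed entries stay in the list, the retry pass can simply run the connection check again over `retry_later` once the first pass is done.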
In some of the embodiments, this application provides a container cloud-based system resource monitoring apparatus which, as shown in FIG. 7, includes a list generation module, an application status acquisition module, an alarm information push module, a data recording module, an application restart module, and an expansion report generation module, wherein:
The list generation module 11 is configured to obtain the deployment status of the container orchestration frameworks under the container cloud platform and generate a framework list;
The application status acquisition module 12 is configured to obtain the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and at a preset acquisition period, and to record the acquired running status information in a preset storage unit;
The alarm information push module 13 is configured to mark the container orchestration framework as having insufficient resources when the running status information of any application remains in the waiting state throughout a preset judgment time threshold range, and to generate alarm information and push it to the executor who performs the expansion operation;
The data recording module 14 is configured to obtain the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in that framework, and to record the two types of data in the corresponding record nodes;
The application restart module 15 is configured to, after receiving the expansion-complete signal fed back by the executor, retrieve from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfigure and restart the applications, and then obtain the current physical machine resource configuration data of that framework and record it in the framework record node;
The expansion report generation module 16 is configured to summarize the data recorded in the framework record node and the application record node and generate an expansion report.
In some of the embodiments, this application provides a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, implement the steps of the container cloud-based system resource monitoring method described above.
In some of the embodiments, this application provides a computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the container cloud-based system resource monitoring method described above. The storage medium may be a non-volatile storage medium or a volatile storage medium.
The technical features of the embodiments described above may be combined in any manner. For brevity, not every possible combination of these technical features has been described; however, as long as a combination of these technical features involves no contradiction, it should be regarded as falling within the scope of this specification.

Claims (20)

  1. A container cloud-based system resource monitoring method, comprising:
    obtaining the deployment status of container orchestration frameworks under a container cloud platform and then generating a framework list, wherein the framework list records all container orchestration frameworks deployed under the container cloud platform;
    obtaining, one by one in record order from the framework list and at a preset acquisition period, the running status information of each application in each container orchestration framework, and recording the obtained running status information in a preset storage unit, wherein the storage unit is provided with a framework record node for recording the physical machine resource configuration data of each container orchestration framework and an application record node for recording the running status information and physical machine resource occupancy data of each application, and the running status information identifies the running state of an application in the container orchestration framework in which it is located;
    when the running status information of any of the applications remains a waiting state throughout a preset judgment time threshold range, marking the container orchestration framework as having insufficient resources, and then generating alarm information and pushing it to an executor who performs capacity expansion operations, so as to notify the executor to perform a capacity expansion job on the container orchestration framework;
    obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in the container orchestration framework, and recording the two types of data in the corresponding record nodes;
    after receiving a capacity expansion job completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, then reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and recording it in the framework record node;
    summarizing the record data of the framework record node and the application record node and then generating a capacity expansion report.
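The final step of claim 1 can be illustrated with a minimal Python sketch. The claim does not specify the report layout, so everything here is an assumption made for illustration: the function name `build_expansion_report`, the `(timestamp, data)` record shape, and the choice to report the earliest and latest framework configurations side by side.

```python
from typing import Any, Dict, List, Tuple

Record = Tuple[float, Dict[str, Any]]  # hypothetical (timestamp, data) record


def build_expansion_report(
    framework_node: Dict[str, List[Record]],
    application_node: Dict[str, Dict[str, List[Record]]],
) -> Dict[str, Dict[str, Any]]:
    """Summarize both record nodes per framework serial number."""
    report: Dict[str, Dict[str, Any]] = {}
    for serial, configs in framework_node.items():
        if not configs:
            continue  # nothing recorded for this framework
        ordered = sorted(configs, key=lambda record: record[0])
        apps = application_node.get(serial, {})
        report[serial] = {
            "config_before": ordered[0][1],   # earliest recorded configuration
            "config_after": ordered[-1][1],   # latest recorded configuration
            "applications": sorted(apps),     # names of the recorded applications
        }
    return report


report = build_expansion_report(
    {"F0001": [(50.0, {"cpu_cores": 16}), (10.0, {"cpu_cores": 8})]},
    {"F0001": {"web": [(10.0, {"cpu_cores": 2})]}},
)
```

Sorting by timestamp rather than relying on insertion order keeps the sketch correct even if records arrive out of order.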
  2. The container cloud-based system resource monitoring method according to claim 1, wherein obtaining the deployment status of container orchestration frameworks under the container cloud platform and then generating the framework list comprises:
    connecting to a management console of the container cloud platform;
    sending, to the management console of the container cloud platform, a data request for obtaining the status of the container orchestration frameworks running on the container cloud platform;
    generating the framework list after receiving feedback from the management console, wherein the framework list records, in the chronological order of the feedback, all container orchestration frameworks running on the container cloud platform;
    generating, according to recording time, a record serial number for each container orchestration framework in the framework list, wherein the record serial number is the identification serial number of the container orchestration framework in the container cloud platform and is used to distinguish different container orchestration frameworks.
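The list-building steps of claim 2 might look like the following Python sketch. The console connection is replaced by a plain iterable of framework names, and the `FrameworkEntry` type and the `F0001`-style serial format are hypothetical; the claim only requires that serial numbers follow recording order and distinguish frameworks.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass
class FrameworkEntry:
    serial: str            # record serial number (hypothetical "F0001" format)
    name: str              # framework name as reported by the platform console
    recorded_at: datetime  # time at which the entry was recorded


def build_framework_list(feedback: Iterable[str]) -> List[FrameworkEntry]:
    """Record frameworks in the order in which the console feedback arrives."""
    entries: List[FrameworkEntry] = []
    for index, name in enumerate(feedback, start=1):
        entries.append(
            FrameworkEntry(
                serial=f"F{index:04d}",
                name=name,
                recorded_at=datetime.now(timezone.utc),
            )
        )
    return entries


# Stand-in for the feedback of the platform's management console.
frameworks = build_framework_list(["kubernetes-prod", "marathon-batch"])
```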
  3. The container cloud-based system resource monitoring method according to claim 2, wherein obtaining, one by one in record order from the framework list and at the preset acquisition period, the running status information of each application in each container orchestration framework, and recording the obtained running status information in the preset storage unit comprises:
    generating a monitoring node for each container orchestration framework in the framework list, wherein the monitoring node is used to connect, within a set period, to the management console of the container orchestration framework and then obtain the running status information of each application running on it;
    generating in the storage unit, according to the record serial numbers of the container orchestration frameworks in the framework list, a corresponding application record node for the applications on each container orchestration framework, wherein the application record node is used to record the running status information, obtained by the monitoring node, of each application running on the container orchestration framework;
    connecting, through the monitoring node and according to a set monitoring period, to the management console of the container orchestration framework, and then requesting the running status information of all applications running on the container orchestration framework;
    after receiving feedback from the management console of the container orchestration framework, recording the running status information of the applications in the application record node according to the time at which the feedback was received.
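One monitoring cycle from claim 3 can be sketched in Python as follows. The console is modeled as a callable that maps a framework serial number to `{application: status}`; that callable, the `poll_once` name, and the status strings are all assumptions. A real monitoring node would call `poll_once` once per monitoring period.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterable, List, Tuple

StatusFetcher = Callable[[str], Dict[str, str]]
# (framework serial, application name) -> list of (timestamp, status)
AppRecords = DefaultDict[Tuple[str, str], List[Tuple[float, str]]]


def poll_once(
    serials: Iterable[str],
    fetch_status: StatusFetcher,
    records: AppRecords,
    now: float,
) -> None:
    """Query each framework in list order and append timestamped statuses."""
    for serial in serials:
        for app, status in fetch_status(serial).items():
            records[(serial, app)].append((now, status))


def fake_console(serial: str) -> Dict[str, str]:
    # Hypothetical stand-in for a framework's management console.
    return {"web": "running", "batch": "waiting"}


app_records: AppRecords = defaultdict(list)
poll_once(["F0001"], fake_console, app_records, now=0.0)
poll_once(["F0001"], fake_console, app_records, now=30.0)
```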
  4. The container cloud-based system resource monitoring method according to claim 1 or 3, wherein, when the running status information of any of the applications remains a waiting state throughout the preset judgment time threshold range, marking the container orchestration framework as having insufficient resources, and then generating alarm information and pushing it to the executor who performs capacity expansion operations, so as to notify the executor to perform the capacity expansion job on the container orchestration framework, comprises:
    reading the running status information of any application in the application record node;
    determining whether the running status information of the application remains a waiting state throughout the judgment time threshold range; if so, marking the status of the container orchestration framework as insufficient resources; if not, marking the status of the container orchestration framework as running normally, wherein the judgment time threshold range is a preset length of time;
    traversing, according to the above steps, all applications under all container orchestration frameworks in the framework list, and marking the status of every container orchestration framework;
    calling a mail template and then generating an alarm email, wherein the alarm email records the record serial number of the container orchestration framework marked as having insufficient resources and a prompt message indicating insufficient resources;
    reading the executor's email address from a preset recipient address list and then pushing the alarm email to the executor.
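The judgment and alarm steps of claim 4 can be sketched in Python. The claim does not define how "remains a waiting state throughout the threshold range" is evaluated, so this sketch assumes it means that every sample inside the most recent window is `"waiting"` and that at least one sample exists. The template text, function names, and example address are hypothetical, and actual delivery (for example over SMTP) is left out.

```python
from email.message import EmailMessage
from string import Template
from typing import List, Tuple

# Hypothetical mail template; the claim only requires the serial number
# and a prompt indicating insufficient resources.
ALERT_TEMPLATE = Template(
    "Framework $serial: insufficient resources; capacity expansion requested."
)


def waiting_throughout(
    samples: List[Tuple[float, str]], now: float, window_seconds: float
) -> bool:
    """True if the window holds samples and all of them are 'waiting'."""
    recent = [status for t, status in samples if now - window_seconds <= t <= now]
    return bool(recent) and all(status == "waiting" for status in recent)


def build_alert(serial: str, recipient: str) -> EmailMessage:
    """Fill the template and address the alarm email to the executor."""
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = f"Resource alert for {serial}"
    message.set_content(ALERT_TEMPLATE.substitute(serial=serial))
    return message


samples = [(0.0, "waiting"), (30.0, "waiting"), (60.0, "waiting")]
flagged = waiting_throughout(samples, now=60.0, window_seconds=60.0)
alert = build_alert("F0001", "ops@example.com")
```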
  5. The container cloud-based system resource monitoring method according to claim 1, wherein obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in the container orchestration framework, and recording the two types of data in the corresponding record nodes comprises:
    connecting to the management console of the container orchestration framework marked as having insufficient resources;
    sending a data request to the management console to obtain the physical machine resource configuration data of the container orchestration framework and the physical machine resource occupancy data of any application running in the container orchestration framework;
    after receiving feedback from the management console, recording, according to the time at which the feedback was received, the physical machine resource configuration data in the framework record node and the physical machine resource occupancy data in the application record node.
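The two record nodes used in claim 5 can be modeled with a small Python sketch. The `StorageUnit` class, its field names, and the `"configuration"` / `"occupancy"` keys of the console feedback are assumptions made for illustration; the claims do not prescribe a storage layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Record = Tuple[float, Dict[str, Any]]  # hypothetical (received_at, data) record


@dataclass
class StorageUnit:
    # framework serial -> configuration records
    framework_nodes: Dict[str, List[Record]] = field(default_factory=dict)
    # (framework serial, application name) -> occupancy records
    application_nodes: Dict[Tuple[str, str], List[Record]] = field(default_factory=dict)

    def record_feedback(
        self, serial: str, feedback: Dict[str, Any], received_at: float
    ) -> None:
        """Split one console reply into the two record nodes."""
        self.framework_nodes.setdefault(serial, []).append(
            (received_at, feedback["configuration"])
        )
        for app, usage in feedback["occupancy"].items():
            self.application_nodes.setdefault((serial, app), []).append(
                (received_at, usage)
            )


storage = StorageUnit()
storage.record_feedback(
    "F0001",
    {"configuration": {"cpu_cores": 8}, "occupancy": {"web": {"cpu_cores": 2}}},
    received_at=100.0,
)
```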
  6. The container cloud-based system resource monitoring method according to claim 1, wherein, after receiving the capacity expansion job completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, then reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and recording it in the framework record node, comprises:
    receiving feedback information from the executor that contains the capacity expansion job completion signal;
    after connecting to the storage unit, reading from the application record node, for each application in the container orchestration framework marked as having insufficient resources, the record of physical machine resource occupancy data closest to the current time;
    configuring the corresponding application according to the physical machine resource occupancy data, and restarting the application after the configuration is complete;
    connecting to the management console of the container orchestration framework, obtaining the current physical machine resource configuration data of the container orchestration framework, and recording the obtained data in the framework record node according to the acquisition time.
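The "record closest to the current time" lookup in claim 6 can be sketched in Python as below. The log shape and the `latest_occupancy` name are assumptions; applying the returned values to an application and restarting it depends on the orchestration framework and is not shown.

```python
from typing import Any, Dict, List, Tuple

# application name -> list of (timestamp, occupancy data); hypothetical layout
OccupancyLog = Dict[str, List[Tuple[float, Dict[str, Any]]]]


def latest_occupancy(log: OccupancyLog) -> Dict[str, Dict[str, Any]]:
    """Return, per application, the occupancy record with the newest timestamp."""
    return {
        app: max(entries, key=lambda entry: entry[0])[1]
        for app, entries in log.items()
        if entries  # skip applications with no recorded data
    }


occupancy_log: OccupancyLog = {
    "web": [(10.0, {"cpu_cores": 2}), (40.0, {"cpu_cores": 3})],
    "batch": [(40.0, {"cpu_cores": 1})],
    "idle": [],
}
restore_plan = latest_occupancy(occupancy_log)
```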
  7. The container cloud-based system resource monitoring method according to claim 2, further comprising, after obtaining the deployment status of container orchestration frameworks under the container cloud platform and generating the framework list:
    connecting, one by one in the record order of the framework list, to the management console of each container orchestration framework;
    for a container orchestration framework to which the connection succeeds, appending a success mark to the record serial number to generate a new record serial number;
    for a container orchestration framework to which the connection fails, appending a failure mark to the record serial number to generate a new record serial number;
    wherein, before obtaining, one by one in record order from the framework list and at the preset acquisition period, the running status information of each application in each container orchestration framework and recording the obtained running status information in the preset storage unit, the method comprises identifying the record serial number of the container orchestration framework: when the record serial number contains the success mark, the operation of reading the running status information of the applications in the container orchestration framework is performed; when the record serial number contains the failure mark, the operation of reading the running status information of the applications in the container orchestration framework is not performed.
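The marking scheme of claim 7 can be sketched in Python. The concrete mark strings `-OK` and `-FAIL` are hypothetical, since the claim only says that a success or failure mark is appended to the record serial number.

```python
from typing import Dict, List

SUCCESS_MARK = "-OK"    # hypothetical success mark
FAILURE_MARK = "-FAIL"  # hypothetical failure mark


def mark_serial(serial: str, connected: bool) -> str:
    """Append the connection result to the record serial number."""
    return serial + (SUCCESS_MARK if connected else FAILURE_MARK)


def pollable(marked_serials: List[str]) -> List[str]:
    """Keep only frameworks whose serial number carries the success mark."""
    return [serial for serial in marked_serials if serial.endswith(SUCCESS_MARK)]


# Stand-in for the outcome of connecting to each framework's console.
connection_results: Dict[str, bool] = {"F0001": True, "F0002": False}
marked = [mark_serial(serial, ok) for serial, ok in connection_results.items()]
to_poll = pollable(marked)
```

Filtering before polling means an unreachable framework is skipped on every cycle instead of timing out repeatedly.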
  8. A container cloud-based system resource monitoring apparatus, comprising:
    a list generation module configured to obtain the deployment status of container orchestration frameworks under a container cloud platform and then generate a framework list;
    an application status acquisition module configured to obtain, one by one in record order from the framework list and at a preset acquisition period, the running status information of each application in each container orchestration framework, and to record the obtained running status information in a preset storage unit;
    an alarm information push module configured to, when the running status information of any of the applications remains a waiting state throughout a preset judgment time threshold range, mark the container orchestration framework as having insufficient resources, generate alarm information, and push it to an executor who performs capacity expansion operations;
    a data recording module configured to obtain the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in the container orchestration framework, and to record the two types of data in the corresponding record nodes;
    an application restart module configured to, after receiving a capacity expansion job completion signal fed back by the executor, retrieve from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, then reconfigure and restart the applications, and obtain the current physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and record it in the framework record node;
    a capacity expansion report generation module configured to summarize the record data of the framework record node and the application record node and then generate a capacity expansion report.
  9. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions that, when executed by the processor, implement the following steps:
    obtaining the deployment status of container orchestration frameworks under a container cloud platform and then generating a framework list, wherein the framework list records all container orchestration frameworks deployed under the container cloud platform;
    obtaining, one by one in record order from the framework list and at a preset acquisition period, the running status information of each application in each container orchestration framework, and recording the obtained running status information in a preset storage unit, wherein the storage unit is provided with a framework record node for recording the physical machine resource configuration data of each container orchestration framework and an application record node for recording the running status information and physical machine resource occupancy data of each application, and the running status information identifies the running state of an application in the container orchestration framework in which it is located;
    when the running status information of any of the applications remains a waiting state throughout a preset judgment time threshold range, marking the container orchestration framework as having insufficient resources, and then generating alarm information and pushing it to an executor who performs capacity expansion operations, so as to notify the executor to perform a capacity expansion job on the container orchestration framework;
    obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in the container orchestration framework, and recording the two types of data in the corresponding record nodes;
    after receiving a capacity expansion job completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, then reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and recording it in the framework record node;
    summarizing the record data of the framework record node and the application record node and then generating a capacity expansion report.
  10. The computer device according to claim 9, wherein, when the computer-readable instructions are executed by one or more of the processors so that the one or more processors implement the step of obtaining the deployment status of container orchestration frameworks under the container cloud platform and then generating the framework list, the following steps are performed:
    connecting to a management console of the container cloud platform;
    sending, to the management console of the container cloud platform, a data request for obtaining the status of the container orchestration frameworks running on the container cloud platform;
    generating the framework list after receiving feedback from the management console, wherein the framework list records, in the chronological order of the feedback, all container orchestration frameworks running on the container cloud platform;
    generating, according to recording time, a record serial number for each container orchestration framework in the framework list, wherein the record serial number is the identification serial number of the container orchestration framework in the container cloud platform and is used to distinguish different container orchestration frameworks.
  11. The computer device according to claim 10, wherein, when the computer-readable instructions are executed by one or more of the processors to implement obtaining, one by one in record order from the framework list and at the preset acquisition period, the running status information of each application in each container orchestration framework, and recording the obtained running status information in the preset storage unit, the following steps are performed:
    generating a monitoring node for each container orchestration framework in the framework list, wherein the monitoring node is used to connect, within a set period, to the management console of the container orchestration framework and then obtain the running status information of each application running on it;
    generating in the storage unit, according to the record serial numbers of the container orchestration frameworks in the framework list, a corresponding application record node for the applications on each container orchestration framework, wherein the application record node is used to record the running status information, obtained by the monitoring node, of each application running on the container orchestration framework;
    connecting, through the monitoring node and according to a set monitoring period, to the management console of the container orchestration framework, and then requesting the running status information of all applications running on the container orchestration framework;
    after receiving feedback from the management console of the container orchestration framework, recording the running status information of the applications in the application record node according to the time at which the feedback was received.
  12. The computer device according to claim 11, wherein, when the computer-readable instructions are executed by one or more of the processors to implement, when the running status information of any of the applications remains a waiting state throughout the preset judgment time threshold range, marking the container orchestration framework as having insufficient resources, and then generating alarm information and pushing it to the executor who performs capacity expansion operations, so as to notify the executor to perform the capacity expansion job on the container orchestration framework, the following steps are performed:
    reading the running status information of any application in the application record node;
    determining whether the running status information of the application remains a waiting state throughout the judgment time threshold range; if so, marking the status of the container orchestration framework as insufficient resources; if not, marking the status of the container orchestration framework as running normally, wherein the judgment time threshold range is a preset length of time;
    traversing, according to the above steps, all applications under all container orchestration frameworks in the framework list, and marking the status of every container orchestration framework;
    calling a mail template and then generating an alarm email, wherein the alarm email records the record serial number of the container orchestration framework marked as having insufficient resources and a prompt message indicating insufficient resources;
    reading the executor's email address from a preset recipient address list and then pushing the alarm email to the executor.
  13. The computer device according to claim 9, wherein, when the computer-readable instructions are executed by one or more of the processors to implement obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in the container orchestration framework, and recording the two types of data in the corresponding record nodes, the following steps are performed:
    connecting to the management console of the container orchestration framework marked as having insufficient resources;
    sending a data request to the management console to obtain the physical machine resource configuration data of the container orchestration framework and the physical machine resource occupancy data of any application running in the container orchestration framework;
    after receiving feedback from the management console, recording, according to the time at which the feedback was received, the physical machine resource configuration data in the framework record node and the physical machine resource occupancy data in the application record node.
  14. The computer device according to claim 9, wherein, when the computer-readable instructions are executed by one or more of the processors to implement, after receiving the capacity expansion job completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, then reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and recording it in the framework record node, the following steps are performed:
    receiving feedback information from the executor that contains the capacity expansion job completion signal;
    after connecting to the storage unit, reading from the application record node, for each application in the container orchestration framework marked as having insufficient resources, the record of physical machine resource occupancy data closest to the current time;
    configuring the corresponding application according to the physical machine resource occupancy data, and restarting the application after the configuration is complete;
    connecting to the management console of the container orchestration framework, obtaining the current physical machine resource configuration data of the container orchestration framework, and recording the obtained data in the framework record node according to the acquisition time.
  15. A computer-readable storage medium storing computer-readable instructions that, when executed by one or more processors, implement the following steps:
    obtaining the deployment status of container orchestration frameworks under a container cloud platform and then generating a framework list, wherein the framework list records all container orchestration frameworks deployed under the container cloud platform;
    obtaining, one by one in record order from the framework list and at a preset acquisition period, the running status information of each application in each container orchestration framework, and recording the obtained running status information in a preset storage unit, wherein the storage unit is provided with a framework record node for recording the physical machine resource configuration data of each container orchestration framework and an application record node for recording the running status information and physical machine resource occupancy data of each application, and the running status information identifies the running state of an application in the container orchestration framework in which it is located;
    when the running status information of any of the applications remains a waiting state throughout a preset judgment time threshold range, marking the container orchestration framework as having insufficient resources, and then generating alarm information and pushing it to an executor who performs capacity expansion operations, so as to notify the executor to perform a capacity expansion job on the container orchestration framework;
    obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in the container orchestration framework, and recording the two types of data in the corresponding record nodes;
    after receiving a capacity expansion job completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, then reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and recording it in the framework record node;
    summarizing the record data of the framework record node and the application record node and then generating a capacity expansion report.
  16. The computer-readable storage medium according to claim 15, wherein, when the computer-readable instructions are executed by one or more of the processors so that the one or more processors implement the step of obtaining the deployment status of container orchestration frameworks under the container cloud platform and then generating the framework list, the following steps are performed:
    connecting to a management console of the container cloud platform;
    sending, to the management console of the container cloud platform, a data request for obtaining the status of the container orchestration frameworks running on the container cloud platform;
    generating the framework list after receiving feedback from the management console, wherein the framework list records, in the chronological order of the feedback, all container orchestration frameworks running on the container cloud platform;
    generating, according to recording time, a record serial number for each container orchestration framework in the framework list, wherein the record serial number is the identification serial number of the container orchestration framework in the container cloud platform and is used to distinguish different container orchestration frameworks.
  17. The computer-readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by one or more of the processors, cause the one or more processors, in implementing the step of acquiring, one by one in recording order and at a preset acquisition period, the running status information of each application in each container orchestration framework from the framework list and recording the acquired running status information in a preset storage unit, to perform the following steps:
    generating a monitoring node for each container orchestration framework in the framework list, the monitoring node being used to connect, within a set period, to the management console of the container orchestration framework and obtain the running status information of each application running on it;
    according to the record serial numbers of the container orchestration frameworks in the framework list, generating in the storage unit a corresponding application record node for the applications on each container orchestration framework, the application record node being used to record the running status information, obtained by the monitoring node, of each application running on the container orchestration framework;
    through the monitoring node, connecting to the management console of the container orchestration framework at the set monitoring period and requesting the running status information of all applications running on the container orchestration framework;
    after receiving the feedback from the management console of the container orchestration framework, recording the running status information of the applications in the application record node according to the time at which the feedback was received.
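One polling round of the monitoring node described in claim 17 might look like the sketch below. The names `ApplicationRecordStore`, `poll_once`, and the `fetch_status` callback are hypothetical; the callback stands in for the request to the framework's management console, whose real interface the claim does not specify.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical console call: framework serial number -> {application: status}
StatusFetcher = Callable[[int], Dict[str, str]]


class ApplicationRecordStore:
    """In-memory stand-in for the application record nodes."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[int, str], List[Tuple[datetime, str]]] = (
            defaultdict(list)
        )

    def record(self, serial: int, app: str, status: str, received: datetime) -> None:
        """Append one status sample, keyed by framework serial and application."""
        self._records[(serial, app)].append((received, status))

    def history(self, serial: int, app: str) -> List[Tuple[datetime, str]]:
        """Return the recorded samples in the order they were received."""
        return list(self._records.get((serial, app), []))


def poll_once(
    serials: Iterable[int],
    fetch_status: StatusFetcher,
    store: ApplicationRecordStore,
    received: datetime,
) -> None:
    """Query every framework once and record each application's status."""
    for serial in serials:
        for app, status in fetch_status(serial).items():
            store.record(serial, app, status, received)
```

A scheduler would call `poll_once` once per monitoring period; keeping the timestamp as a parameter makes the function easy to exercise without a real clock.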
  18. The computer-readable storage medium according to claim 17, wherein the computer-readable instructions, when executed by one or more of the processors, cause the one or more processors, in implementing the step of marking the container orchestration framework as having insufficient resources when the running status information of any application remains in the waiting state throughout a preset judgment time threshold range, and then generating alarm information and pushing it to the executor who performs expansion operations so as to notify the executor to carry out the expansion job of the container orchestration framework, to perform the following steps:
    reading the running status information of any application in the application record node;
    determining whether the running status information of the application remains in the waiting state throughout the judgment time threshold range; if so, marking the status of the container orchestration framework as having insufficient resources; if not, marking the status of the container orchestration framework as running normally, the judgment time threshold range being a preset length of time;
    traversing, by the above steps, all applications under all container orchestration frameworks in the framework list and marking the status of every container orchestration framework;
    invoking a mail template to generate an alarm email, and recording in the alarm email the record serial number of the container orchestration framework marked as having insufficient resources and prompt information indicating the resource shortage;
    reading the executor's email address from a preset recipient address list and pushing the alarm email to the executor.
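The waiting-state check and the alarm email of claim 18 can be illustrated with the sketch below. It assumes that status samples are `(timestamp, status)` pairs, that the waiting state is reported as the string `"waiting"`, and that a framework counts as short of resources only when at least one sample falls inside the window and every such sample is waiting. The function names, the sender address, and the message wording are hypothetical; the email is only built, not sent.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage
from typing import List, Tuple


def is_resource_insufficient(
    history: List[Tuple[datetime, str]],
    now: datetime,
    window: timedelta,
) -> bool:
    """True if every sample inside the judgment window is 'waiting'."""
    recent = [status for ts, status in history if now - window <= ts <= now]
    return bool(recent) and all(status == "waiting" for status in recent)


def build_alarm_email(
    serial: int,
    recipient: str,
    sender: str = "monitor@example.invalid",  # placeholder address
) -> EmailMessage:
    """Fill a simple template with the framework serial number and a hint."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Container orchestration framework {serial}: insufficient resources"
    message.set_content(
        f"The framework with record serial number {serial} has applications "
        "stuck in the waiting state; resources appear insufficient and a "
        "capacity expansion is requested."
    )
    return message
```

Delivery could then go through `smtplib.SMTP.send_message`, with the recipient read from the preset address list.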
  19. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by one or more of the processors, cause the one or more processors, in implementing the step of acquiring the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of any application running in that container orchestration framework, and recording the two types of data in the corresponding record nodes, to perform the following steps:
    connecting to the management console of the container orchestration framework marked as having insufficient resources;
    sending a data request to the management console to obtain the physical machine resource configuration data of the container orchestration framework and the physical machine resource occupancy data of any application running in the container orchestration framework;
    after receiving the feedback from the management console, recording, according to the time at which the feedback was received, the physical machine resource configuration data in the framework record node and the physical machine resource occupancy data in the application record node.
  20. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by one or more of the processors, cause the one or more processors, in implementing the step of, after receiving the expansion-job completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications, and acquiring the current physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and recording it in the framework record node, to perform the following steps:
    receiving feedback information from the executor that contains the expansion-job completion signal;
    after connecting to the storage unit, reading from the application record node the record, closest to the current time, of the physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources;
    configuring the corresponding application according to the physical machine resource occupancy data, and restarting the application after the configuration is completed;
    connecting to the management console of the container orchestration framework, acquiring the current physical machine resource configuration data of the container orchestration framework, and recording the acquired data in the framework record node according to the acquisition time.
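The restore steps of claim 20 can be sketched as below, under stated assumptions: occupancy records are `(timestamp, usage)` pairs per application, and the `reconfigure` and `restart` callbacks stand in for whatever mechanism the orchestration framework actually offers. All names are hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Hypothetical occupancy payload, e.g. {"cpu": 2, "memory_mb": 4096}
Usage = Dict[str, int]
Record = Tuple[datetime, Usage]


def latest_usage(records: List[Record]) -> Usage:
    """Return the occupancy record closest to the current time."""
    return max(records, key=lambda record: record[0])[1]


def restore_after_expansion(
    app_records: Dict[str, List[Record]],
    reconfigure: Callable[[str, Usage], None],
    restart: Callable[[str], None],
) -> List[str]:
    """Reconfigure each application from its latest record, then restart it."""
    restored: List[str] = []
    for app, records in app_records.items():
        if not records:
            continue  # nothing recorded, so nothing to restore from
        reconfigure(app, latest_usage(records))
        restart(app)
        restored.append(app)
    return restored
```

Recording the framework's new resource configuration would follow the same fetch-and-record pattern used for the earlier monitoring steps.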
PCT/CN2019/118670 2019-06-14 2019-11-15 Container cloud-based system resource monitoring method and related device WO2020248507A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910515745.4A CN110311831B (en) 2019-06-14 2019-06-14 Container cloud-based system resource monitoring method and related equipment
CN201910515745.4 2019-06-14

Publications (1)

Publication Number Publication Date
WO2020248507A1 (en) 2020-12-17

Family

ID=68077167

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118670 WO2020248507A1 (en) 2019-06-14 2019-11-15 Container cloud-based system resource monitoring method and related device

Country Status (2)

Country Link
CN (1) CN110311831B (en)
WO (1) WO2020248507A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110311831B (en) * 2019-06-14 2022-03-25 平安科技(深圳)有限公司 Container cloud-based system resource monitoring method and related equipment
CN110874291B (en) * 2019-10-31 2022-10-21 北京中科云脑智能技术有限公司 Real-time detection method for abnormal container
CN110768850A (en) * 2019-11-12 2020-02-07 国家电网有限公司 Communication capacity expansion processing method and device based on power system
CN111245900B (en) * 2019-12-31 2021-09-14 北京健康之家科技有限公司 Distributed message sending processing system and processing method thereof
CN111277460B (en) * 2020-01-17 2022-02-25 江苏满运软件科技有限公司 ZooKeeper containerization control method and device, storage medium and electronic equipment
CN113485788B (en) * 2021-06-30 2023-08-29 中国民航信息网络股份有限公司 Container resource allocation method and device, server and computer storage medium
CN114039974A (en) * 2021-10-20 2022-02-11 支付宝(杭州)信息技术有限公司 Cloud container generation method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016155816A1 (en) * 2015-04-01 2016-10-06 Telefonaktiebolaget Lm Ericsson (Publ) Methods and devices for monitoring of network performance for container virtualization
CN108243012A (en) * 2016-12-26 2018-07-03 中国移动通信集团上海有限公司 Charging application processing system, method and device in online charging system OCS
CN109348235A (en) * 2018-11-01 2019-02-15 北京京航计算通讯研究所 VOD method based on private clound
CN109756366A (en) * 2018-12-24 2019-05-14 上海欣方智能系统有限公司 System is realized in intelligent network SCP cloud service based on CAAS
CN110311831A (en) * 2019-06-14 2019-10-08 平安科技(深圳)有限公司 System resource monitoring method and relevant device based on container cloud

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229646A (en) * 2016-03-24 2017-10-03 中兴通讯股份有限公司 Dispositions method, the apparatus and system of data cluster
US10171377B2 (en) * 2017-04-18 2019-01-01 International Business Machines Corporation Orchestrating computing resources between different computing environments
CN109495398A (en) * 2017-09-11 2019-03-19 中国移动通信集团浙江有限公司 A kind of resource regulating method and equipment of container cloud
US10572320B2 (en) * 2017-12-01 2020-02-25 International Business Machines Corporation Detecting co-resident services in a container cloud
CN109491776B (en) * 2018-11-06 2022-05-31 北京百度网讯科技有限公司 Task arranging method and system
CN109586999B (en) * 2018-11-12 2021-03-23 深圳先进技术研究院 Container cloud platform state monitoring and early warning system and method and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626288A (en) * 2021-08-12 2021-11-09 杭州朗和科技有限公司 Fault processing method, system, device, storage medium and electronic equipment
CN113626288B (en) * 2021-08-12 2023-08-25 杭州朗和科技有限公司 Fault processing method, system, device, storage medium and electronic equipment
CN113791954A (en) * 2021-09-17 2021-12-14 上海道客网络科技有限公司 Container bare metal server and method and system for coping with physical environment risks thereof
CN113791954B (en) * 2021-09-17 2023-09-22 上海道客网络科技有限公司 Container bare metal server and method and system for coping physical environment risk of container bare metal server
WO2024002190A1 (en) * 2022-06-30 2024-01-04 中兴通讯股份有限公司 Monitor-based container adjustment method and device, and storage medium

Also Published As

Publication number Publication date
CN110311831B (en) 2022-03-25
CN110311831A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
WO2020248507A1 (en) Container cloud-based system resource monitoring method and related device
US11226847B2 (en) Implementing an application manifest in a node-specific manner using an intent-based orchestrator
CN107209710B (en) Node system, server device, scaling control method, and program
US20180285216A1 (en) Virtual Machine Recovery Method and Virtual Machine Management Device
CN108712501B (en) Information sending method and device, computing equipment and storage medium
CN110365762B (en) Service processing method, device, equipment and storage medium
CN108551399B (en) Service deployment method, system and related device in cloud environment
CN106452836B (en) main node setting method and device
WO2019153532A1 (en) Deployment method and apparatus for monitoring system, and computer device and storage medium
CN109361777B (en) Synchronization method, synchronization system and related device for distributed cluster node states
CN112559461A (en) File transmission method and device, storage medium and electronic equipment
CN109002263B (en) Method and device for adjusting storage capacity
CN110196749B (en) Virtual machine recovery method and device, storage medium and electronic device
US11544091B2 (en) Determining and implementing recovery actions for containers to recover the containers from failures
CN112306640A (en) Container dispensing method, apparatus, device and medium therefor
CN111913927A (en) Data writing method and device and computer equipment
JP6394212B2 (en) Information processing system, storage device, and program
JP2006285453A (en) Information processor, information processing method, and information processing program
CN113010263A (en) Method, system, equipment and storage medium for creating virtual machine in cloud platform
CN113448775A (en) Multi-source heterogeneous data backup method and device
CN114598604A (en) Monitoring method, monitoring device and terminal for virtual network function instance information
CN111147554A (en) Data storage method and device and computer system
CN116820686B (en) Physical machine deployment method, virtual machine and container unified monitoring method and device
CN108920164A (en) The management method and device of host in cloud computing system
CN108847980A (en) A kind of method and device of CTDB node failure virtual IP address migration

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19932342; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19932342; Country of ref document: EP; Kind code of ref document: A1)
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 30.05.2022))
122 Ep: pct application non-entry in european phase (Ref document number: 19932342; Country of ref document: EP; Kind code of ref document: A1)