WO2020248507A1

WO2020248507A1 - Container cloud-based system resource monitoring method and related device

Info

Publication number: WO2020248507A1
Application number: PCT/CN2019/118670
Authority: WO
Inventors: Gao Feng (高峰)
Original assignee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)
Priority date: 2019-06-14
Filing date: 2019-11-15
Publication date: 2020-12-17
Also published as: CN110311831B; CN110311831A

Abstract

The present application relates to the field of system resource monitoring technology. Disclosed are a container cloud-based system resource monitoring method and a related device. The method comprises: acquiring the deployment status of container orchestration frameworks and generating a framework list; acquiring running status information of the applications in each container orchestration framework; upon determining from the running status information that a container orchestration framework has insufficient resources, generating alarm information and pushing it to a capacity expansion executor; acquiring the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources, together with the physical machine resource occupancy data of each application, and recording both in the corresponding record nodes; after capacity expansion ends, reconfiguring and restarting each application and acquiring the current physical machine resource configuration data of the container orchestration framework; and generating a capacity expansion report. In the present application, the running status of each application on a container cloud platform is monitored and a warning is issued promptly when system resources are insufficient, so that the capacity expansion demand of a container orchestration framework is responded to quickly and historical data from before and after capacity expansion is retained.

Description

Container cloud-based system resource monitoring method and related device

This application claims priority to Chinese Patent Application No. 201910515745.4, filed with the Chinese Patent Office on June 14, 2019 and entitled "Container Cloud-based System Resource Monitoring Method and Related Equipment", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the technical field of system resource monitoring, and in particular to a container cloud-based system resource monitoring method, apparatus, computer device, and storage medium.

Background

As distributed computing resources become increasingly widespread, container cloud technology has gained favor among users of all kinds. Internet cloud computing service providers have built their own products on container cloud technology to suit their own characteristics, for example by integrating it into their large product lines (Alibaba Cloud, Tencent Cloud) or through deeply customized offerings such as the Ping An Padis platform. These products are distributed platforms built on the application container engine Docker, and they support rapid creation and operation of applications, rapid scaling in and out, and self-healing from failures. Using these container cloud platforms requires a container orchestration framework to allocate and manage resources for the services and applications running on the platform, for example Docker-based orchestration tools such as Docker Swarm, Marathon, Kubernetes, and Nomad. These orchestration tools allocate resources to each service and application sensibly and can recover an application or service when it crashes. Common container orchestration framework products provide friendly interfaces and easy-to-use data interfaces such as a REST API for creating and managing applications, and they integrate conveniently with third-party systems. For example, in the Marathon framework an application or service can be defined in JSON-formatted text; once the definition is complete, the application is submitted and run through the REST API, which greatly lowers the barrier to use.

In traditional industry practice, as a platform remains in use over time and the business grows, it is often necessary to expand the system resources of the same platform, without major changes to the original deployment structure, because the corresponding applications or services demand ever more system resources. For example, at the outset, several Marathon frameworks were built and deployed according to the different types of existing services on the Padis platform, forming a Marathon cluster in which each framework manages the applications or services of a different business type. The inventor realized that as the business continues to develop, the system resources occupied by existing applications often become strained, causing applications to run slowly or even crash. At that point restarting the application does not help; the system resources of the Marathon framework on which the application runs must be expanded. However, the prior art usually relies on Google's container monitoring tool cAdvisor to view the physical machine resources occupied by the applications or services running on container orchestration frameworks such as Marathon. This approach has the following limitations:

1) Only one physical host can be monitored at a time, which amounts to single-node monitoring and cannot meet the needs of multi-node monitoring. Applications running on the same container cloud platform may run within machine resources managed by different container orchestration frameworks, and therefore on different physical hosts. Single-node monitoring cannot track the actual resource usage of such applications.

2) Only the real-time status can be viewed; historical data cannot, so no historical data is available to functions that analyze the running trends of applications and services on the container cloud platform.

3) The early warning function is weak. Without telephone or email alerts, the container orchestration framework cannot promptly raise an external warning when physical machine resources are insufficient. In the actual operation of a container cloud platform, especially when an application is being restarted or created, insufficient physical machine resources cause the application to fail to start or be created, and if this is not handled in time, the business functions that depend on the application are paralyzed.

The industry therefore needs a technical means that supports multi-node monitoring, historical data viewing and analysis, and fault warning for the resources used by container orchestration frameworks in a container cloud platform, so as to solve the above technical problems.

Summary of the invention

Embodiments of the application provide a container cloud-based system resource monitoring method, apparatus, computer device, and storage medium, which monitor the usage of resources on the container cloud platform and issue a warning as soon as a problem is discovered, thereby solving the technical problem of business paralysis caused by applications that fail to restart.

In a first aspect, this application provides a container cloud-based system resource monitoring method, including: obtaining the deployment status of the container orchestration frameworks under a container cloud platform and generating a framework list, in which all container orchestration frameworks deployed on the container cloud platform are recorded;

Acquiring from the framework list, in recording order and one by one according to a preset acquisition cycle, the running status information of each application in each container orchestration framework, and recording the acquired running status information in a preset storage unit, where the storage unit is provided with a framework record node for recording the physical machine resource configuration data of each container orchestration framework and an application record node for recording the running status information and physical machine resource occupancy data of each application, and the running status information identifies the running state of an application in its container orchestration framework;

When the running status information of any application remains in the waiting state throughout a preset judgment time threshold, marking the container orchestration framework as having insufficient resources, generating alarm information, and pushing it to the executor of the capacity expansion operation, so as to notify the executor to expand the container orchestration framework;

Acquiring the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in that framework, and recording the two types of data in the corresponding record nodes;

After receiving the expansion-completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications accordingly, and acquiring the current physical machine resource configuration data of that container orchestration framework and recording it in the framework record node;

Generating a capacity expansion report by summarizing the recorded data of the framework record node and the application record node.

In a second aspect, in some possible embodiments, this application provides a container cloud-based system resource monitoring apparatus, including a list generation module, an application status acquisition module, an alarm information push module, a data recording module, an application restart module, and a capacity expansion report generation module, wherein:

The list generation module is configured to generate a framework list after obtaining the deployment status of the container orchestration frameworks under the container cloud platform;

The application status acquisition module is configured to acquire from the framework list, in recording order and one by one according to a preset acquisition cycle, the running status information of each application in each container orchestration framework, and to record the acquired running status information in a preset storage unit;

The alarm information push module is configured to, when the running status information of any application remains in the waiting state throughout a preset judgment time threshold, mark the container orchestration framework as having insufficient resources, generate alarm information, and push it to the executor of the capacity expansion operation;

The data recording module is configured to acquire the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in that framework, and to record the two types of data in the corresponding record nodes;

The application restart module is configured to, after receiving the expansion-completion signal fed back by the executor, retrieve from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfigure and restart the applications, and acquire the current physical machine resource configuration data of that framework and record it in the framework record node;

The capacity expansion report generation module is configured to generate a capacity expansion report by summarizing the recorded data of the framework record node and the application record node.

Based on the same inventive concept, in some possible embodiments, the present application provides a computer device including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, implement the steps of the above container cloud-based system resource monitoring method.

Based on the same inventive concept, in some possible embodiments, the present application provides a computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of the above container cloud-based system resource monitoring method.

This application sets up a monitoring node for each container orchestration framework in the container cloud platform, obtains real-time status information of the applications, and judges from this status information whether a container orchestration framework has insufficient resources, in which case it issues a capacity expansion warning. By backing up the resource configuration state of the applications before expansion and restarting them promptly after expansion, it achieves multiple monitoring nodes, historical data retention, and automatic early warning in the container cloud platform, which traditional monitoring methods cannot achieve.

Description of the drawings

Fig. 1 is a main flowchart of a container cloud-based system resource monitoring method according to an embodiment of the application;

Fig. 2 is a flowchart of generating the framework list in a container cloud-based system resource monitoring method according to an embodiment of the application;

Fig. 3 is a flowchart of monitoring application status in a container cloud-based system resource monitoring method according to an embodiment of the application;

Fig. 4 is a flowchart of judging insufficient resources in a container cloud-based system resource monitoring method according to an embodiment of the application;

Fig. 5 is a flowchart of backing up data before capacity expansion in a container cloud-based system resource monitoring method according to an embodiment of the application;

Fig. 6 is a flowchart of restoring application operation after capacity expansion in a container cloud-based system resource monitoring method according to an embodiment of the application;

Fig. 7 is a functional block diagram of a container cloud-based system resource monitoring apparatus according to an embodiment of the application.

Detailed description

To enable those skilled in the art to better understand the solutions of the present application, the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a container cloud-based system resource monitoring method provided by an embodiment of the application. As shown in the figure, the method includes steps S1 to S6:

S1. Obtain the deployment status of the container orchestration frameworks under the container cloud platform and generate a framework list, in which all container orchestration frameworks deployed under the container cloud platform are recorded.

Specifically, multiple container orchestration tools are generally deployed on the container cloud platform through container technology, and the functional clusters formed by these tools allocate corresponding system resources to the various services or applications. After obtaining access permission for the container cloud platform, the method connects to the platform's management console and sends it a data request command to obtain the deployment status. For example, on the DCOS platform, the interface command "/ping" is used to query the service status of the Marathon framework, that is, whether Marathon is running. After all the acquired container orchestration framework information has been collected, a list is generated in the order in which each framework's information was acquired. Subsequent steps use this list as the positioning and ordering reference for obtaining the running status of applications.
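The list-building step can be sketched as follows. This is a minimal Python illustration, not the applicant's implementation: it assumes each framework exposes a Marathon-style "/ping" endpoint that answers "pong", and the names FrameworkEntry, build_framework_list, and the console addresses are hypothetical. The fetch and clock parameters are injectable so the logic can be exercised without a live platform.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FrameworkEntry:
    name: str           # hypothetical framework name, e.g. "marathon-payments"
    base_url: str       # hypothetical console address, e.g. "http://10.0.0.5:8080"
    acquired_at: float  # time at which this framework's status was obtained


def http_get(url: str, timeout: float = 5.0) -> str:
    """Plain HTTP GET returning the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def build_framework_list(
    consoles: Dict[str, str],
    fetch: Callable[[str], str] = http_get,
    clock: Callable[[], float] = time.time,
) -> List[FrameworkEntry]:
    """Ping each framework and list the reachable ones in acquisition order."""
    entries: List[FrameworkEntry] = []
    for name, base_url in consoles.items():
        try:
            body = fetch(base_url.rstrip("/") + "/ping")
        except OSError:  # urllib's URLError and socket timeouts derive from OSError
            continue     # unreachable console: not recorded in the list
        if body.strip() == "pong":
            entries.append(FrameworkEntry(name, base_url, clock()))
    # Order by the time each framework's information was acquired.
    entries.sort(key=lambda e: e.acquired_at)
    return entries
```

Unreachable consoles are simply left out here; a real deployment might instead record them with a "down" marker so the gap is visible.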

S2. Acquire from the framework list, in recording order and one by one according to the preset acquisition cycle, the running status information of each application in each container orchestration framework, and record the acquired running status information in a preset storage unit. The storage unit is provided with a framework record node for recording the physical machine resource configuration data of each container orchestration framework and an application record node for recording the running status information and physical machine resource occupancy data of each application. The running status information identifies the running state of an application in its container orchestration framework.

Specifically, the information of each application managed by a container orchestration framework is obtained by running that framework's commands, and the information is then stored in record nodes set up specifically for this purpose so that subsequent steps can call the data. For example, the requested content is returned by calling the Marathon API, that is, by sending a command to Marathon's management console: sending "/deployments" to the console returns the current deployment of applications on the Marathon orchestration framework, including the current resource occupancy and running status of each application. In addition, in the storage space that holds these record nodes, corresponding record nodes are also opened for the physical machine resource configuration data of the container orchestration frameworks. The data in these record nodes can be stored permanently in order of recording time and called by functional units that serve analysis purposes. For example, to analyze how an application on the cloud platform was used over a certain period and thereby infer the development trend of the corresponding business, these retained historical data serve as the basis for the calculation.
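The two kinds of record node can be modeled as append-only, time-ordered histories. The sketch below rests on several assumptions: it keeps everything in memory (a real deployment would persist it), keys framework record nodes by the framework's record serial number and application record nodes by (serial number, application ID), and uses hypothetical record fields. The between query shows how retained history supports period-based trend analysis.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class RecordNode:
    """Append-only, time-ordered history for one framework or one application."""
    times: List[float] = field(default_factory=list)
    records: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, ts: float, record: Dict[str, Any]) -> None:
        # Keep the history sorted by recording time even if a late record arrives.
        idx = bisect.bisect_right(self.times, ts)
        self.times.insert(idx, ts)
        self.records.insert(idx, record)

    def between(self, start: float, end: float) -> List[Tuple[float, Dict[str, Any]]]:
        """Return all records with start <= ts <= end, oldest first."""
        lo = bisect.bisect_left(self.times, start)
        hi = bisect.bisect_right(self.times, end)
        return list(zip(self.times[lo:hi], self.records[lo:hi]))

    def latest(self) -> Dict[str, Any]:
        return self.records[-1]


class StorageUnit:
    """Holds framework record nodes and application record nodes."""

    def __init__(self) -> None:
        self.framework_nodes: Dict[int, RecordNode] = defaultdict(RecordNode)
        self.app_nodes: Dict[Tuple[int, str], RecordNode] = defaultdict(RecordNode)

    def record_framework(self, serial: int, ts: float, config: Dict[str, Any]) -> None:
        self.framework_nodes[serial].append(ts, config)

    def record_app(self, serial: int, app_id: str, ts: float, status: Dict[str, Any]) -> None:
        self.app_nodes[(serial, app_id)].append(ts, status)
```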

S3. When the running status information of any application remains in the waiting state throughout the preset judgment time threshold, mark the container orchestration framework as having insufficient resources, generate alarm information, and push it to the executor of the capacity expansion operation to notify the executor to expand the container orchestration framework.

Specifically, a temporary suspension or waiting state of an application is not necessarily caused by insufficient resource allocation; such an application often restarts successfully on its own after some time. Waiting caused by insufficient resources, however, persists and prevents the application from restarting. A judgment duration therefore needs to be set in advance. If an application's status remains waiting throughout this duration, its resource allocation can be considered insufficient to support its restart or normal operation, which means the corresponding container orchestration framework itself lacks resources and sufficient hardware resources must be added to it. This operation is called capacity expansion. When a container orchestration framework is found to have insufficient resources, corresponding alarm information is generated and sent to the executor responsible for the expansion operation, such as a third-party maintenance company or the platform's operation and maintenance team, by email, SMS, or voice call.
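One way to express this judgment rule is to check whether the most recent unbroken run of "waiting" samples spans at least the judgment time threshold. The function below is a hypothetical sketch; it conservatively treats the first waiting sample of the run as the start of the wait, because the exact moment the state changed between two polls is unknown.

```python
from typing import Sequence, Tuple

WAITING = "waiting"


def waiting_persisted(samples: Sequence[Tuple[float, str]], threshold_s: float) -> bool:
    """True if the trailing run of 'waiting' samples spans at least threshold_s seconds.

    samples: (timestamp, state) pairs sorted by timestamp, oldest first.
    """
    if not samples or samples[-1][1] != WAITING:
        return False  # the application is not waiting right now
    run_start = samples[-1][0]
    # Walk backwards while the application is still waiting.
    for ts, state in reversed(samples):
        if state != WAITING:
            break
        run_start = ts
    return samples[-1][0] - run_start >= threshold_s
```

A brief wait that clears on its own resets the run, which matches the point above that temporary waiting is not by itself evidence of a resource shortage.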

S4. Acquire the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in that framework, and record the two types of data in the corresponding record nodes.

Specifically, corresponding commands are sent through the management console of the container orchestration framework to obtain the physical machine resource configuration data and the physical machine resource occupancy data, and the two types of data are then stored in their respective record nodes.
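As a sketch of this pre-expansion backup, the function below combines a framework's configured capacity with each application's recorded occupancy and derives how much headroom was left. The field names (cpus, and mem in MB) and the snapshot layout are assumptions for illustration, not the data format of any particular framework.

```python
from typing import Dict


def snapshot_before_expansion(
    framework_config: Dict[str, float],
    app_usage: Dict[str, Dict[str, float]],
) -> Dict[str, object]:
    """Build the pre-expansion record for one framework marked as short of resources."""
    # Total occupancy per resource type across all applications on the framework.
    used = {
        key: sum(usage.get(key, 0.0) for usage in app_usage.values())
        for key in framework_config
    }
    free = {key: framework_config[key] - used[key] for key in framework_config}
    return {
        "phase": "before_expansion",
        "framework_config": dict(framework_config),              # for the framework record node
        "app_usage": {app: dict(u) for app, u in app_usage.items()},  # for application record nodes
        "used": used,
        "free": free,
    }
```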

S5. After receiving the expansion-completion signal fed back by the executor, retrieve from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfigure and restart the applications, and acquire the current physical machine resource configuration data of that framework and record it in the framework record node.

Specifically, the expansion-completion signal can be obtained through a dedicated feedback interface, which the executor fills in and submits. The feedback interface can provide input fields for the hardware resources that have been added, so that the executor submits the relevant information. Once this information is obtained, the updated physical machine resource configuration data can be recorded in the framework record node. In addition, after the expansion, the applications on the expanded container orchestration framework need to be reconfigured and restarted. The configuration is based on the most recently recorded data saved in the application record node, such as memory usage, CPU thread allocation, and other configuration data.
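Reconfiguration from the last recorded usage might look like the sketch below, which builds, but does not send, one update request per application. It assumes Marathon's PUT /v2/apps/{appId} endpoint and the app-definition fields id, cpus, mem, and instances; whether an update triggers a restart, and which fields are accepted, should be checked against the Marathon version actually deployed. The usage dictionary layout is hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def build_restart_requests(
    marathon_url: str,
    latest_usage: Dict[str, Dict[str, float]],
) -> List[urllib.request.Request]:
    """Turn each application's last recorded usage into a Marathon app update request."""
    requests: List[urllib.request.Request] = []
    for app_id, usage in sorted(latest_usage.items()):
        body = {
            "id": app_id,
            "cpus": usage["cpus"],
            "mem": usage["mem"],                         # assumed to be in MB
            "instances": int(usage.get("instances", 1)),  # default to one instance
        }
        req = urllib.request.Request(
            f"{marathon_url.rstrip('/')}/v2/apps{app_id}",  # app IDs start with "/"
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        requests.append(req)
    return requests
```

Each request could then be sent with urllib.request.urlopen(req) once the executor's completion signal has been received.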

S6. Generate a capacity expansion report by summarizing the recorded data of the framework record node and the application record node.

Specifically, after the expansion is completed, to provide reference and data support for subsequent operations, a job report summarizing the expansion can be generated in addition to the backup data retained in the storage unit. The report records the data in the framework record node and the application record node before and after the expansion.
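A capacity expansion report could be rendered from the framework configuration before and after expansion plus the usage each application was restored with, as in this hypothetical plain-text sketch; the report layout and field names are assumptions.

```python
from typing import Dict, List


def _fmt(x: float) -> str:
    """Print whole numbers without a trailing '.0'."""
    return str(int(x)) if float(x).is_integer() else str(x)


def expansion_report(
    serial: int,
    before: Dict[str, float],
    after: Dict[str, float],
    app_usage: Dict[str, Dict[str, float]],
) -> str:
    """Render a plain-text capacity expansion report for one framework."""
    lines: List[str] = [f"Capacity expansion report: framework #{serial}"]
    # Framework record node: configuration before and after expansion.
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key, 0.0), after.get(key, 0.0)
        delta = new - old
        sign = "+" if delta >= 0 else ""
        lines.append(f"  {key}: {_fmt(old)} -> {_fmt(new)} ({sign}{_fmt(delta)})")
    # Application record nodes: the usage each application was restored with.
    lines.append("Applications restored with recorded usage:")
    for app_id in sorted(app_usage):
        usage = ", ".join(f"{k}={_fmt(v)}" for k, v in sorted(app_usage[app_id].items()))
        lines.append(f"  {app_id}: {usage}")
    return "\n".join(lines)
```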

In this embodiment, each container orchestration framework running on the container cloud platform is monitored, and the running status of its applications is used to determine whether resources are insufficient; a warning is issued promptly, and the applications are restored after the expansion operation is completed. This effectively avoids the business losses caused in traditional operation by single-node monitoring and the lack of early warning.

Fig. 2 is a flowchart of generating the framework list in a container cloud-based system resource monitoring method provided by an embodiment of the application. As shown in the figure, step S1 (obtaining the deployment status of the container orchestration frameworks under the container cloud platform and generating a framework list in which all container orchestration frameworks deployed under the platform are recorded) includes steps S101 to S104:

S101. Connect to the management console of the container cloud platform.

S102. Send a data request for acquiring the status of the container orchestration framework running on the container cloud platform to the management console of the container cloud platform.

Specifically, after obtaining access permission for the container cloud platform, the method connects to the management console and sends a data request for the configuration data of the container orchestration frameworks deployed on the cloud platform; the request contains the command for obtaining that configuration data. The access credentials of the cloud platform include information such as the access address, data port, user name, and password.

S103. After receiving feedback from the management console, generate the framework list, recording all container orchestration frameworks running on the container cloud platform in the order in which the feedback was received.

S104. Generate a record serial number for each container orchestration framework in the framework list according to its recording time. The record serial number identifies the container orchestration framework within the container cloud platform and distinguishes different frameworks.

Specifically, after the data returned by the console is received, the acquired container orchestration frameworks are numbered and sorted into a list in order of return time, which makes them easy to call and distinguish in subsequent steps.
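Steps S103 and S104 amount to sorting the console feedback by arrival time and numbering the result. A minimal sketch with hypothetical names; ties in arrival time keep the order in which the console reported them, because Python's sorted is stable.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ListedFramework:
    serial: int         # record serial number, unique within the platform
    name: str           # framework name as reported by the console
    received_at: float  # time the console feedback for this framework arrived


def number_frameworks(feedback: Iterable[Tuple[str, float]]) -> List[ListedFramework]:
    """Sort console feedback by arrival time and assign serial numbers 1, 2, 3, ..."""
    ordered = sorted(feedback, key=lambda item: item[1])
    return [
        ListedFramework(serial, name, ts)
        for serial, (name, ts) in enumerate(ordered, start=1)
    ]


def index_by_serial(frameworks: List[ListedFramework]) -> Dict[int, ListedFramework]:
    """Lookup table used by later steps to address a framework by its serial number."""
    return {fw.serial: fw for fw in frameworks}
```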

In this embodiment, all container orchestration frameworks running on the container cloud platform are organized into a list for convenient use in subsequent steps.

Fig. 3 is a flowchart of monitoring application status in a container cloud-based system resource monitoring method provided by an embodiment of the application. As shown in the figure, step S2 (acquiring from the framework list, in recording order and one by one according to a preset cycle, the running status information of each application in each container orchestration framework, and recording the acquired running status information in a preset storage unit) includes steps S201 to S204:

S201. Generate a monitoring node for each container orchestration framework in the framework list. Within each set period, the monitoring node connects to the management console of its container orchestration framework and obtains the running status information of each application running on it.

Specifically, a monitoring node can take the form of a functional script composed of commands that access information about the container orchestration framework. The script is set to request data from the framework at specific times or at a specific period, so as to obtain the running status of the applications on it. Because each framework has its own monitoring node, the monitoring requirements of multiple container orchestration frameworks can be met.
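One monitoring node per framework, each on its own period, can be driven by a single scheduler. The sketch below uses the standard library sched module; MonitorNode, its poll callback (assumed to return a mapping of application ID to state), and the finite cycles count are illustrative assumptions. Passing the scheduler in lets a test substitute a virtual clock.

```python
import sched
from typing import Callable, Dict, List


class MonitorNode:
    """One monitoring node per container orchestration framework (a sketch)."""

    def __init__(self, serial: int, period_s: float, poll: Callable[[int], Dict[str, str]]):
        self.serial = serial
        self.period_s = period_s
        self.poll = poll  # hypothetical: returns {app_id: state} for this framework
        self.results: List[Dict[str, str]] = []

    def tick(self, scheduler: sched.scheduler, remaining: int) -> None:
        """Poll once, then re-arm itself for the next period."""
        self.results.append(self.poll(self.serial))
        if remaining > 1:
            scheduler.enter(self.period_s, 0, self.tick, (scheduler, remaining - 1))


def run_nodes(nodes: List[MonitorNode], cycles: int, scheduler: sched.scheduler) -> None:
    """Start every node immediately and let the scheduler drive the periodic polls."""
    for node in nodes:
        scheduler.enter(0, 0, node.tick, (scheduler, cycles))
    scheduler.run()
```

In production, sched.scheduler(time.monotonic, time.sleep) would give real-time periods, and the loop would run indefinitely rather than for a fixed number of cycles.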

S202. According to the record serial number of each container orchestration framework in the framework list, generate in the storage unit a corresponding application record node for each application on that framework. The application record node records the running status information, obtained by the monitoring node, of each application running on the container orchestration framework.

Specifically, to store permanently the data obtained by the monitoring node, such as application running conditions, the data can be recorded in a database or in an independent data file. Separate record nodes can be set up for the container orchestration framework itself and for the applications running on it, and both types of record node record the acquired data in order of recording time.
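If the record nodes are kept in a database, one possible layout is one table per node type, indexed by framework serial number, application ID, and recording time. This SQLite sketch uses only the standard library sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS framework_record (
    serial      INTEGER NOT NULL,   -- record serial number from the framework list
    recorded_at REAL    NOT NULL,
    cpus        REAL    NOT NULL,
    mem_mb      REAL    NOT NULL
);
CREATE TABLE IF NOT EXISTS app_record (
    serial      INTEGER NOT NULL,   -- framework the application runs on
    app_id      TEXT    NOT NULL,
    recorded_at REAL    NOT NULL,
    state       TEXT    NOT NULL,
    cpus        REAL,
    mem_mb      REAL
);
CREATE INDEX IF NOT EXISTS app_record_by_time ON app_record (serial, app_id, recorded_at);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_app(conn: sqlite3.Connection, serial: int, app_id: str, ts: float, state: str,
               cpus: Optional[float] = None, mem_mb: Optional[float] = None) -> None:
    with conn:  # commits the insert on success, rolls back on error
        conn.execute(
            "INSERT INTO app_record VALUES (?, ?, ?, ?, ?, ?)",
            (serial, app_id, ts, state, cpus, mem_mb),
        )


def app_history(conn: sqlite3.Connection, serial: int, app_id: str) -> List[Tuple[float, str]]:
    """Return (recorded_at, state) rows for one application, oldest first."""
    rows = conn.execute(
        "SELECT recorded_at, state FROM app_record "
        "WHERE serial = ? AND app_id = ? ORDER BY recorded_at",
        (serial, app_id),
    )
    return rows.fetchall()
```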

S203. According to the set monitoring period, connect through the monitoring node to the management console of the container orchestration framework and request the running status information of all applications running on that framework.

S204. After receiving feedback from the management console of the container orchestration framework, record the running status information of the applications in the application record node, stamped with the time the feedback was received.

Specifically, setting a monitoring period for a monitoring node is equivalent to setting up a scheduled task that executes a functional script with a monitoring function. The script is configured with the connection credentials for the management console of the container orchestration framework and for the record node, and has read and write permissions. After receiving the feedback data, the script uses its write permission to write the application's running data, including its running state, into the application record node. For example, the states of applications managed by Marathon include "waiting", "delayed", "suspended", and "running". "Waiting" indicates that an application or service has malfunctioned or crashed and needs to be restarted; "delayed" indicates that execution of the application or service has been delayed because resources are exhausted or blocked; "suspended" indicates that an application or service has been temporarily interrupted and will not be executed; and "running" indicates that the application or service is running normally. An error report means that an application or service has become unavailable; in that case Marathon generally reports a status such as "waiting" to indicate that it is waiting for the application or service to restart.
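The state vocabulary described above can be normalized and stamped with the feedback receipt time as follows. This is a hypothetical sketch: it assumes the monitoring script has already reduced the console response to one state string per application, which is not necessarily how Marathon's API reports status, and it maps unrecognized strings to UNKNOWN rather than guessing.

```python
import time
from enum import Enum
from typing import Callable, Dict, List, Tuple


class AppState(Enum):
    WAITING = "waiting"      # crashed or failed; waiting to be restarted
    DELAYED = "delayed"      # held back because resources are exhausted or blocked
    SUSPENDED = "suspended"  # temporarily interrupted; will not be executed
    RUNNING = "running"      # normal operation
    UNKNOWN = "unknown"      # anything this sketch does not recognize


def parse_state(raw: str) -> AppState:
    try:
        return AppState(raw.strip().lower())
    except ValueError:
        return AppState.UNKNOWN


def record_feedback(
    feedback: Dict[str, str],
    node: Dict[str, List[Tuple[float, AppState]]],
    clock: Callable[[], float] = time.time,
) -> float:
    """Stamp one console response with its receipt time and append it per application."""
    received_at = clock()
    for app_id, raw_state in feedback.items():
        node.setdefault(app_id, []).append((received_at, parse_state(raw_state)))
    return received_at
```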

In this embodiment, a monitoring node is set up for each container orchestration framework to obtain the real-time running conditions of its applications, and these running data are permanently recorded for later use.

Fig. 4 is a flowchart of judging insufficient resources in a container cloud-based system resource monitoring method provided by an embodiment of the application. As shown in the figure, step S3 (when the running status information of any application remains in the waiting state throughout the preset judgment time threshold, marking the container orchestration framework as having insufficient resources, then generating alarm information and pushing it to the executor of the capacity expansion operation) includes steps S301 to S305:

S301. Read the running status information of an application from the application record node.

S302. Determine whether the running status information of the application remains in the waiting state throughout the judgment time threshold. If so, mark the state of the container orchestration framework as insufficient resources; if not, mark it as normal operation. The judgment time threshold is a preset period of time.

Specifically, if an application enters the waiting state within the set judgment period, it can be considered to have failed and to need rebuilding or restarting by the container orchestration framework; if it stays in the waiting state throughout the period, it can be considered unrecoverable. An application running on a container orchestration framework is broadly equivalent to an independent software program running in a virtual machine: when the program is destroyed or fails, it crashes, and the virtual machine system generally tries to restart or wake it. For applications tied to business functions, however, the resources they occupy change as the business changes, and without maintenance and optimization their demand for resources generally keeps growing. When this happens, the resources of the container orchestration framework corresponding to the application generally need to be reconfigured, that is, its hardware resources expanded, so that more hardware resources are allocated to the framework and more resources can in turn be allocated to the problem application, allowing it to be recreated or restarted.

S303. Following the above steps, traverse all applications under all container orchestration frameworks in the framework list, and mark the status of every container orchestration framework.

Specifically, following the record serial number of each container orchestration framework in the framework list, the running status data of its applications is obtained from the application record node one framework at a time. For each framework, the method determines whether expansion is needed and records the judgment result.
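The S303 traversal could look roughly like the following sketch. The data shapes are assumptions: serial numbers map to per-application lists of state samples. `last_three_waiting` is a hypothetical stand-in for the S302 judgment, which would in practice use the preset time threshold.

```python
from typing import Callable, Dict, List

# Hypothetical shape: record serial number -> {application id: [states, oldest first]}
AppRecords = Dict[str, Dict[str, List[str]]]


def mark_all_frameworks(framework_list: List[str], app_records: AppRecords,
                        needs_expansion: Callable[[List[str]], bool]) -> Dict[str, str]:
    """Visit frameworks in framework-list order and record a mark for each one."""
    marks: Dict[str, str] = {}
    for serial in framework_list:
        apps = app_records.get(serial, {})  # a framework with no records counts as normal
        insufficient = any(needs_expansion(samples) for samples in apps.values())
        marks[serial] = "insufficient resources" if insufficient else "normal operation"
    return marks


def frameworks_needing_expansion(marks: Dict[str, str]) -> List[str]:
    """Serial numbers to put in the alarm, kept in framework-list order."""
    return [serial for serial, mark in marks.items() if mark == "insufficient resources"]


def last_three_waiting(samples: List[str]) -> bool:
    """Illustrative stand-in for S302: the last three samples were all 'waiting'."""
    return len(samples) >= 3 and all(s == "waiting" for s in samples[-3:])
```

Because Python dictionaries preserve insertion order, the marks come back in the same order as the framework list. That order can be carried straight into the alarm.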

S304. Call the email template to generate an alarm email. The email records the record serial number of each container orchestration framework marked as having insufficient resources, together with prompt information identifying the resource shortage.

S305. Read the executor's email address from the preset recipient address list, then push the alarm email to the executor.

Specifically, based on the records from the above steps, when a container orchestration framework has insufficient resources, the pre-prepared email template is called to generate an alarm email in a specific format. The email records the problem and its location and is then sent to the handler at the recorded email address. The handler is generally the executor of the expansion operation, but it may also be a dispatch department that forwards the alarm to the executing department. In other embodiments, the alarm can instead be delivered by an automated voice call with specific warning content. For example, warning text is generated from the resource-shortage judgment record, a text-to-speech engine converts it into a warning voice message, and the message is played once the executor answers. In some embodiments, early warning information can also be pushed in real time through an app bound to the executor's mobile terminal.
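As a hedged sketch of S304 and S305, the alarm email might be assembled with Python's standard `email` and `string.Template` modules. The template wording, its `$placeholders`, the `"executor"` key in the recipient address list, and the SMTP host and port are all illustrative assumptions rather than details from the source.

```python
import smtplib
from email.message import EmailMessage
from string import Template
from typing import Dict, List, Tuple

# Hypothetical alarm template; wording and placeholders are illustrative only.
ALARM_TEMPLATE = Template(
    "Resource alarm generated at $generated_at\n\n"
    "The following container orchestration frameworks are marked as having\n"
    "insufficient resources and need capacity expansion:\n$items\n"
)


def build_alarm_email(insufficient: List[Tuple[str, str]], generated_at: str,
                      sender: str, address_list: Dict[str, str]) -> EmailMessage:
    """S304: fill the template with (record serial number, prompt) pairs.
    S305: take the executor's address from the preset recipient list."""
    items = "\n".join(f"  - {serial}: {prompt}" for serial, prompt in insufficient)
    msg = EmailMessage()
    msg["Subject"] = f"[ALARM] {len(insufficient)} framework(s) with insufficient resources"
    msg["From"] = sender
    msg["To"] = address_list["executor"]  # assumed key in the recipient list
    msg.set_content(ALARM_TEMPLATE.substitute(generated_at=generated_at, items=items))
    return msg


def push_alarm(msg: EmailMessage, smtp_host: str, smtp_port: int = 25) -> None:
    """Send the alarm through an SMTP relay (host and port are deployment-specific)."""
    with smtplib.SMTP(smtp_host, smtp_port, timeout=10) as smtp:
        smtp.send_message(msg)
```

Building the message and sending it are kept in separate functions. This lets the same message be logged, or routed to another channel such as the voice or mobile-app alerts mentioned above, without touching SMTP.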

In this embodiment, whether a container orchestration framework has insufficient resources is determined from the running state of its applications. An early warning mechanism then delivers the warning in time, which helps expansion requirements to be met promptly.

Fig. 5 is a flowchart of backing up data before expansion in the container cloud-based system resource monitoring method provided by an embodiment of the application. As shown in the figure, step S4 includes steps S401 to S403. In step S4, the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources is obtained, together with the physical machine resource occupancy data of each application running in that framework, and the two types of data are recorded in the corresponding record nodes. Steps S401 to S403 are as follows:

S401. Connect to the management console of the container orchestration framework marked as having insufficient resources.

S402. Send the management console a data request for two kinds of data: the physical machine resource configuration data of the container orchestration framework, and the physical machine resource occupancy data of each application running in it.

Specifically, in preparation for capacity expansion, the pre-expansion operating conditions of the container orchestration framework with a resource shortage must be recorded in advance so that they can be restored after expansion. To this end, the configuration of each application is collected and recorded. A data request for the application state is sent through the management console of the container orchestration framework, and the console returns the corresponding data. The request command depends on the characteristics of each container orchestration framework. Data can be obtained per application ID: for example, "/v2/apps/{id}" obtains the deployment status of the application with the given id in the Marathon framework. Alternatively, data can be obtained directly after obtaining an application list: for example, "/v2/groups/{id}" obtains the status of the application group identified by id.
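For the Marathon case, a minimal sketch of the data request might look like the following. Only the two endpoint paths come from the text above. The base URL and helper names are hypothetical. The extracted fields (`cpus`, `mem`, `instances`, `tasksStaged`, `tasksRunning`) are fields that Marathon's app JSON is understood to carry, but they should be checked against the REST API reference for the Marathon version in use.

```python
import json
import urllib.request
from typing import Any, Dict


def marathon_url(base: str, kind: str, ident: str) -> str:
    """Build /v2/apps/{id} or /v2/groups/{id}; Marathon ids look like '/group/app'."""
    if kind not in ("apps", "groups"):
        raise ValueError("kind must be 'apps' or 'groups'")
    return f"{base.rstrip('/')}/v2/{kind}/{ident.strip('/')}"


def fetch_json(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET a management-console endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def occupancy_from_app(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pick per-application resource occupancy out of a /v2/apps/{id} response."""
    app = payload["app"]
    return {
        "id": app["id"],
        "cpus": app.get("cpus"),            # CPU shares per instance
        "mem": app.get("mem"),              # memory (MiB) per instance
        "instances": app.get("instances"),
        "tasksStaged": app.get("tasksStaged", 0),
        "tasksRunning": app.get("tasksRunning", 0),
    }
```

Usage might be `occupancy_from_app(fetch_json(marathon_url(console, "apps", "/shop/cart")))`, where `console` is the management-console address of the framework being backed up.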

S403. After receiving the feedback from the management console, record the physical machine resource configuration data in the framework record node and the physical machine resource occupancy data in the application record node. Each record is stamped with the time the feedback was received.

Specifically, after the console responds, the framework record node and the application record node are connected, and the two types of returned data are saved in them.
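A simple in-memory model of S403 is sketched below. It is an assumption made for illustration; the source does not specify the storage technology. Both data types from one console response share a single receipt timestamp, so they can later be matched as one backup.

```python
import time
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Record = Tuple[float, Any]  # (time the feedback was received, payload)


class StorageUnit:
    """Hypothetical in-memory stand-in for the storage unit's two record nodes."""

    def __init__(self) -> None:
        # framework record node: record serial number -> configuration history
        self.framework_nodes: Dict[str, List[Record]] = defaultdict(list)
        # application record node: (serial number, application id) -> occupancy history
        self.app_nodes: Dict[Tuple[str, str], List[Record]] = defaultdict(list)

    def record_feedback(self, serial: str, config: Any,
                        occupancy_by_app: Dict[str, Any],
                        received_at: Optional[float] = None) -> float:
        """S403: store both data types under the same receipt timestamp."""
        ts = time.time() if received_at is None else received_at
        self.framework_nodes[serial].append((ts, config))
        for app_id, occupancy in occupancy_by_app.items():
            self.app_nodes[(serial, app_id)].append((ts, occupancy))
        return ts
```

Records are only ever appended, so each history stays sorted by time. This is what allows the most recent backup to be located cheaply after expansion.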

In this embodiment, the application configuration and the physical machine resources of the container orchestration framework are recorded before capacity expansion. This makes it easy to build a complete historical record and also provides the data needed to restore applications after expansion.

Fig. 6 is a flowchart of restoring application operation after capacity expansion in the container cloud-based system resource monitoring method provided by an embodiment of the application. As shown in the figure, step S5 includes steps S501 to S504. In step S5, after the expansion-completion signal fed back by the executor is received, the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources is retrieved from the storage unit, and the applications are reconfigured and restarted. The current physical machine resource configuration data of that framework is then obtained and recorded in the framework record node. Steps S501 to S504 are as follows:

S501. Receive feedback information from the executor containing the signal that the expansion operation has ended.

S502. After connecting to the storage unit, read from the application record node the physical machine resource occupancy record closest to the current time for each application in the container orchestration framework marked as having insufficient resources.

S503. Configure each application according to its physical machine resource occupancy data, and restart it once configuration is complete.

Specifically, after expansion is completed, each application is restarted using the data backed up before expansion. The executor can enter the expansion-end signal through a preset input interface once the expansion operation is finished. When extracting data from the record node, the record closest in time to the current extraction time must be identified; the record selected this way is the pre-expansion backup made for the current job.
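Selecting "the record closest to the current time" from an append-only, time-ordered history can be done with a binary search, as in the sketch below. The record shapes and the `configure_and_restart` callback are hypothetical. The callback stands in for whatever console call actually reconfigures and restarts an application.

```python
import bisect
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Tuple[float, Any]  # (timestamp, payload), sorted by timestamp


def latest_before(records: List[Record], now: float) -> Optional[Any]:
    """Payload of the record closest to, but not after, `now` (S502)."""
    times = [t for t, _ in records]
    i = bisect.bisect_right(times, now)  # index of first record later than now
    return records[i - 1][1] if i else None


def restore_apps(app_nodes: Dict[Tuple[str, str], List[Record]], serial: str,
                 now: float,
                 configure_and_restart: Callable[[str, Any], None]) -> List[str]:
    """S503: reconfigure and restart every application of one framework
    from its most recent pre-expansion occupancy record."""
    restored = []
    for (fw, app_id), records in app_nodes.items():
        if fw != serial:
            continue
        occupancy = latest_before(records, now)
        if occupancy is not None:
            configure_and_restart(app_id, occupancy)
            restored.append(app_id)
    return restored
```

`bisect_right` returns the position after any record stamped exactly at `now`, so a backup written in the same instant is still selected.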

S504. Connect to the management console of the container orchestration framework, obtain its current physical machine resource configuration data, and record the data in the framework record node, stamped with the acquisition time.

Specifically, after the applications have been restored, the current physical machine resource configuration data is obtained through the management console of the container orchestration framework and recorded in the framework record node as a new data record. The preceding record in that node reflects the hardware resources allocated to the framework before expansion, that is, before new hardware resources were added. The physical machine resource configuration data in the framework record node, together with the recovery of the applications, can then be used to analyze the relationship between business growth and hardware growth trends.
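The before-and-after comparison described above might be computed as in the following sketch. It assumes that each framework record holds numeric resource totals keyed by hypothetical names such as `cpus` and `mem_gb`, and that the two most recent records are the pre-expansion and post-expansion snapshots.

```python
from typing import Dict, List, Tuple

ConfigRecord = Tuple[float, Dict[str, float]]  # (acquisition time, resource totals)


def expansion_delta(framework_records: List[ConfigRecord]) -> Dict[str, float]:
    """Post-expansion minus pre-expansion resources for one framework, taken
    from the two most recent records in its framework record node."""
    if len(framework_records) < 2:
        raise ValueError("need a pre-expansion and a post-expansion record")
    (_, before), (_, after) = sorted(framework_records, key=lambda r: r[0])[-2:]
    keys = sorted(set(before) | set(after))
    # A resource missing on one side is treated as 0 (e.g. newly added GPUs).
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}
```

A series of such deltas across successive expansions is one simple input for the trend analysis of business growth against hardware growth.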

In this embodiment, applications can resume operation quickly once expansion is complete, because the pre-expansion backup of the application configuration is called. Recording the hardware changes of the container orchestration framework before and after expansion also gives the business analysis department a basis for its analysis.

In some embodiments, after the deployment status of the container orchestration frameworks under the container cloud platform is obtained and the framework list is generated, the method further includes:

Connecting to the management console of each container orchestration framework one by one, in the recording order of the framework list. For each framework connected successfully, a success mark is appended to its record serial number to generate a new record serial number. For each framework that fails to connect, a failure mark is appended to its record serial number to generate a new record serial number.

Specifically, to screen the container orchestration frameworks in the framework list and improve the accuracy of connection access, each framework can be connected and confirmed one by one in list order. A mark is generated according to the connection result and appended to the framework's record serial number to form a new record serial number. Subsequent steps can then read the current connection state directly from the serial number and skip the connection process. Because frameworks are marked rather than removed, the complete framework list is retained. After the expansion judgment has been performed on all successfully connected frameworks, the frameworks that failed to connect can be reconnected. This avoids distorted judgments caused by some container orchestration frameworks being expanded while the new record serial numbers are being generated.
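The marking scheme might be sketched as follows. The mark suffixes `-S` and `-F` are hypothetical, since the source does not fix a format. `try_connect` stands in for the actual console connection check, and a raised `OSError` is treated as a failed connection.

```python
from typing import Callable, List

SUCCESS_MARK = "-S"  # hypothetical suffixes; the source does not fix a format
FAILURE_MARK = "-F"


def mark_connections(framework_list: List[str],
                     try_connect: Callable[[str], bool]) -> List[str]:
    """Connect to each framework's console in list order and return new record
    serial numbers with a success or failure mark appended. Every framework is
    kept, so failed ones can be retried later."""
    marked = []
    for serial in framework_list:
        try:
            ok = try_connect(serial)
        except OSError:  # network errors count as a failed connection
            ok = False
        marked.append(serial + (SUCCESS_MARK if ok else FAILURE_MARK))
    return marked


def retry_failed(marked: List[str], try_connect: Callable[[str], bool]) -> List[str]:
    """After the expansion judgment on connected frameworks, retry the failed
    ones and re-mark them, leaving successful entries unchanged."""
    result = []
    for serial in marked:
        if serial.endswith(FAILURE_MARK):
            base = serial[: -len(FAILURE_MARK)]
            result.extend(mark_connections([base], try_connect))
        else:
            result.append(serial)
    return result
```

Because the mark is part of the serial number itself, later steps can tell the connection state from the serial number alone and need no separate lookup.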

In some embodiments, the running status information of each application in each container orchestration framework is obtained one by one from the framework list, in recording order and according to the preset acquisition cycle, and recorded in the preset storage unit. Before this step, the method includes:

Identifying the record serial number of the container orchestration framework. When the record serial number contains a success mark, the running status information of the applications in that framework is read; when it contains a failure mark, that read is skipped.

Specifically, identifying the new record serial numbers makes it possible to avoid problematic container orchestration frameworks, which improves the accuracy of connection targeting and the efficiency of the capacity expansion judgment.
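One acquisition cycle that honors these marks might be sketched as follows. The mark suffixes and the `read_status` and `record` callbacks are hypothetical stand-ins for the console request and the write to the application record node. The injectable `clock` only exists to make the timestamp testable.

```python
import time
from typing import Callable, Dict, List

SUCCESS_MARK = "-S"  # hypothetical suffix, as in the connection-marking step
FAILURE_MARK = "-F"


def frameworks_to_poll(marked_serials: List[str]) -> List[str]:
    """Keep only success-marked serials, stripped back to the base serial number."""
    return [s[: -len(SUCCESS_MARK)] for s in marked_serials if s.endswith(SUCCESS_MARK)]


def poll_once(marked_serials: List[str],
              read_status: Callable[[str], Dict[str, str]],
              record: Callable[[str, str, float, str], None],
              clock: Callable[[], float] = time.time) -> None:
    """One acquisition cycle: read application status only from frameworks
    whose serial carries the success mark; failure-marked ones are skipped."""
    for serial in frameworks_to_poll(marked_serials):
        statuses = read_status(serial)  # hypothetical console request
        ts = clock()                    # time the feedback was received
        for app_id, state in statuses.items():
            record(serial, app_id, ts, state)


def run(marked_serials: List[str], read_status, record,
        cycle_s: float, cycles: int) -> None:
    """Repeat the acquisition cycle a fixed number of times."""
    for _ in range(cycles):
        poll_once(marked_serials, read_status, record)
        time.sleep(cycle_s)
```

In a long-running monitor, `run` would typically loop indefinitely rather than for a fixed number of cycles. The bounded form is used here only to keep the sketch terminating.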

In some embodiments, the present application provides a container cloud-based system resource monitoring device, shown in Fig. 7. The device comprises a list generation module, an application status acquisition module, an alarm information push module, a data recording module, an application restart module, and a capacity expansion report generation module, wherein:

The list generation module 11 is configured to generate a framework list after obtaining the deployment status of the container orchestration frameworks under the container cloud platform;

The application status acquisition module 12 is configured to obtain the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and according to a preset acquisition cycle, and to record the acquired running status information in a preset storage unit;

The alarm information push module 13 is configured to act when the running status information of any application remains in a waiting state throughout a preset judgment time threshold: it marks the container orchestration framework as having insufficient resources, generates alarm information, and pushes it to the executor who performs the expansion operation;

The data recording module 14 is configured to obtain the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in it, and to record the two types of data in the corresponding record nodes;

The application restart module 15 is configured to act after receiving the expansion-completion signal fed back by the executor: it retrieves from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfigures and restarts the applications, and obtains the current physical machine resource configuration data of that framework and records it in the framework record node;

The capacity expansion report generating module 16 is configured to summarize the record data of the framework record node and the application record node to generate a capacity expansion report.
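A plain-text report in the spirit of module 16 might be produced as below. The report layout, the record shapes, and the helper names are illustrative assumptions, not the format used by the source.

```python
from typing import Any, Dict, List, Tuple

Record = Tuple[float, Dict[str, Any]]  # (record time, data)


def _fmt(data: Dict[str, Any]) -> str:
    """Render a record as sorted key=value pairs."""
    return ", ".join(f"{k}={v}" for k, v in sorted(data.items()))


def expansion_report(serial: str, framework_records: List[Record],
                     app_records: Dict[str, List[Record]]) -> str:
    """Summarize one framework's framework record node and application
    record node into a chronological plain-text report."""
    lines = [f"Capacity expansion report for framework {serial}",
             "Physical machine resource configuration (framework record node):"]
    for ts, cfg in sorted(framework_records, key=lambda r: r[0]):
        lines.append(f"  t={ts}: {_fmt(cfg)}")
    lines.append("Application resource occupancy (application record node):")
    for app_id in sorted(app_records):
        for ts, occ in sorted(app_records[app_id], key=lambda r: r[0]):
            lines.append(f"  {app_id} t={ts}: {_fmt(occ)}")
    return "\n".join(lines)
```

Sorting by timestamp places the pre-expansion configuration directly above the post-expansion one, so the hardware change can be read off the report.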

In some embodiments, the present application provides a computer device comprising a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the above container cloud-based system resource monitoring method.

In some embodiments, the present application provides a computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above container cloud-based system resource monitoring method. The storage medium may be non-volatile or volatile.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these technical features has been described. However, any combination of them that involves no contradiction should be considered within the scope of this specification.

Claims

1. A container cloud-based system resource monitoring method, comprising:

Generating a framework list after obtaining the deployment status of the container orchestration frameworks under the container cloud platform, the framework list recording all container orchestration frameworks deployed under the container cloud platform;

Obtaining the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and according to a preset acquisition cycle, and recording the acquired running status information in a preset storage unit; wherein the storage unit is provided with a framework record node for recording the physical machine resource configuration data of each container orchestration framework and an application record node for recording the running status information and physical machine resource occupancy data of each application, the running status information being the running status of the application in its container orchestration framework;

When the running status information of any application remains in a waiting state throughout a preset judgment time threshold, marking the container orchestration framework as having insufficient resources, generating alarm information, and pushing it to the executor who performs the expansion operation, so as to notify the executor to expand the container orchestration framework;

Obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in it, and recording the two types of data in the corresponding record nodes;

After receiving the expansion-completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of that framework and recording it in the framework record node;

Generating a capacity expansion report by summarizing the record data of the framework record node and the application record node.
2. The container cloud-based system resource monitoring method according to claim 1, wherein obtaining the deployment status of the container orchestration frameworks under the container cloud platform and generating the framework list comprises:

Connecting to the management console of the container cloud platform;

Sending the management console of the container cloud platform a data request for the status of the container orchestration frameworks running on the container cloud platform;

After receiving the feedback from the management console, generating the framework list, in which all container orchestration frameworks running on the container cloud platform are recorded in the time order of the feedback;

Generating a record serial number for each container orchestration framework in the framework list according to its recording time, the record serial number being the identification number of the container orchestration framework in the container cloud platform and serving to distinguish different container orchestration frameworks.
3. The container cloud-based system resource monitoring method according to claim 2, wherein obtaining the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and according to a preset acquisition cycle, and recording the acquired running status information in a preset storage unit comprises:

Generating a monitoring node for each container orchestration framework in the framework list, the monitoring node being used to connect to the management console of the container orchestration framework within a set period and obtain the running status information of each application running on it;

Generating in the storage unit, according to the record serial number of the container orchestration framework in the framework list, an application record node for each application on the container orchestration framework, the application record node being used to record the running status information, acquired by the monitoring node, of each application running on the container orchestration framework;

Connecting to the management console of the container orchestration framework through the monitoring node according to the set monitoring period, and requesting the running status information of all applications running on it;

After receiving the feedback from the management console of the container orchestration framework, recording the running status information of the applications in the application record nodes, stamped with the time the feedback was received.
4. The container cloud-based system resource monitoring method according to claim 1 or 3, wherein the step performed when the running status information of any application remains in a waiting state throughout a preset judgment time threshold comprises the sub-steps below. That step is marking the container orchestration framework as having insufficient resources, generating alarm information, and pushing it to the executor who performs the expansion operation so as to notify the executor to expand the container orchestration framework. The sub-steps are:

Reading the running status information of an application from the application record node;

Determining whether the running status information of the application remains in a waiting state throughout the judgment time threshold; if so, marking the state of the container orchestration framework as insufficient resources; if not, marking it as normal operation; the judgment time threshold being a preset period of time;

Traversing, following the above steps, all applications under all container orchestration frameworks in the framework list, and marking the status of every container orchestration framework;

Calling the email template to generate an alarm email, and recording in it the record serial number of each container orchestration framework marked as having insufficient resources, together with prompt information identifying the resource shortage;

Reading the executor's email address from the preset recipient address list, then pushing the alarm email to the executor.
5. The container cloud-based system resource monitoring method according to claim 1, wherein obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in it, and recording the two types of data in the corresponding record nodes, comprises:

Connecting to the management console of the container orchestration framework marked as having insufficient resources;

Sending the management console a data request for the physical machine resource configuration data of the container orchestration framework and the physical machine resource occupancy data of each application running in it;

After receiving the feedback from the management console, recording the physical machine resource configuration data in the framework record node and the physical machine resource occupancy data in the application record node, each stamped with the time the feedback was received.
6. The container cloud-based system resource monitoring method according to claim 1, wherein the step performed after receiving the expansion-completion signal fed back by the executor comprises the sub-steps below. That step is retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of that framework and recording it in the framework record node. The sub-steps are:

Receiving feedback information from the executor containing the signal that the expansion operation has ended;

After connecting to the storage unit, reading from the application record node the physical machine resource occupancy record closest to the current time for each application in the container orchestration framework marked as having insufficient resources;

Configuring each application according to its physical machine resource occupancy data, and restarting it once configuration is complete;

Connecting to the management console of the container orchestration framework, obtaining its current physical machine resource configuration data, and recording the data in the framework record node, stamped with the acquisition time.
7. The container cloud-based system resource monitoring method according to claim 2, wherein, after obtaining the deployment status of the container orchestration frameworks under the container cloud platform and generating the framework list, the method comprises:

Connecting to the management console of each container orchestration framework one by one, in the recording order of the framework list;

For each container orchestration framework connected successfully, appending a success mark to its record serial number to generate a new record serial number;

For each container orchestration framework that fails to connect, appending a failure mark to its record serial number to generate a new record serial number;

Wherein obtaining the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and according to a preset acquisition cycle, and recording it in a preset storage unit comprises identifying the record serial number of the container orchestration framework. When the record serial number contains a success mark, the running status information of the applications in that framework is read; when it contains a failure mark, that read is skipped.
8. A container cloud-based system resource monitoring device, comprising:

A list generation module, configured to generate a framework list after obtaining the deployment status of the container orchestration frameworks under the container cloud platform;

An application status acquisition module, configured to obtain the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and according to a preset acquisition cycle, and to record the acquired running status information in a preset storage unit;

An alarm information push module, configured to act when the running status information of any application remains in a waiting state throughout a preset judgment time threshold: it marks the container orchestration framework as having insufficient resources, generates alarm information, and pushes it to the executor who performs the expansion operation;

A data recording module, configured to obtain the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in it, and to record the two types of data in the corresponding record nodes;

An application restart module, configured to act after receiving the expansion-completion signal fed back by the executor: it retrieves from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfigures and restarts the applications, and obtains the current physical machine resource configuration data of that framework and records it in the framework record node;

A capacity expansion report generation module, configured to generate a capacity expansion report by summarizing the record data of the framework record node and the application record node.
9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, implement the following steps:

Generating a framework list after obtaining the deployment status of the container orchestration frameworks under the container cloud platform, the framework list recording all container orchestration frameworks deployed under the container cloud platform;

Obtaining the running status information of each application in each container orchestration framework one by one from the framework list, in recording order and according to a preset acquisition cycle, and recording the acquired running status information in a preset storage unit; wherein the storage unit is provided with a framework record node for recording the physical machine resource configuration data of each container orchestration framework and an application record node for recording the running status information and physical machine resource occupancy data of each application, the running status information being the running status of the application in its container orchestration framework;

When the running status information of any application remains in a waiting state throughout a preset judgment time threshold, marking the container orchestration framework as having insufficient resources, generating alarm information, and pushing it to the executor who performs the expansion operation, so as to notify the executor to expand the container orchestration framework;

Obtaining the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in it, and recording the two types of data in the corresponding record nodes;

After receiving the expansion-completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of that framework and recording it in the framework record node;

Generating a capacity expansion report by summarizing the record data of the framework record node and the application record node.
10. The computer device according to claim 9, wherein the computer-readable instructions, when executed by one or more of the processors, implement the step of generating the framework list after obtaining the deployment status of the container orchestration frameworks under the container cloud platform; in doing so, the one or more processors perform the following steps:

Connecting to the management console of the container cloud platform;

Sending the management console of the container cloud platform a data request for the status of the container orchestration frameworks running on the container cloud platform;

After receiving the feedback from the management console, generating the framework list, in which all container orchestration frameworks running on the container cloud platform are recorded in the time order of the feedback;

Generating a record serial number for each container orchestration framework in the framework list according to its recording time, the record serial number being the identification number of the container orchestration framework in the container cloud platform and serving to distinguish different container orchestration frameworks.
The computer device according to claim 10, wherein the computer-readable instructions are implemented by one or more of the processors to obtain each container arrangement frame one by one from the frame list according to a preset acquisition cycle according to the recording order When recording the running status information of each application in the running status information in the preset storage unit, perform the following steps:

Generate a monitoring node for each container orchestration framework in the framework list; the monitoring node connects to the management console of that container orchestration framework at a set period and obtains the running status information of each application running on it;

According to the record serial number of the container orchestration framework in the framework list, generate in the storage unit a corresponding application record node for each application on the container orchestration framework; the application record node records the running status information, acquired by the monitoring node, of each application running on the container orchestration framework;

At each set monitoring period, connect to the management console of the container orchestration framework through the monitoring node and request the running status information of all applications running on the container orchestration framework;

After receiving the feedback from the management console of the container orchestration framework, record the running status information of each application in its application record node, tagged with the time at which the feedback was received.
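A monitoring node as described can be pictured as a small polling loop. The Python sketch below rests on several assumptions: `MonitoringNode`, the `fetch_status` callable (a stand-in for the framework console's API, which the text does not specify), the status strings, and keying application record nodes by `(serial, app)` are all hypothetical.

```python
import time
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict

class MonitoringNode:
    """Hypothetical monitoring node bound to one container orchestration framework."""

    def __init__(self, serial: int, fetch_status: Callable[[], Dict[str, str]], store: dict):
        self.serial = serial              # record serial number from the framework list
        self.fetch_status = fetch_status  # stand-in for the framework console API
        self.store = store                # application record nodes keyed by (serial, app)

    def poll(self) -> None:
        statuses = self.fetch_status()
        received_at = datetime.now()      # timestamp = time the feedback arrived
        for app, status in statuses.items():
            self.store[(self.serial, app)].append((received_at, status))

    def run(self, period_s: float, cycles: int, sleep=time.sleep) -> None:
        """Poll `cycles` times, waiting `period_s` seconds between polls."""
        for i in range(cycles):
            self.poll()
            if i < cycles - 1:
                sleep(period_s)

store = defaultdict(list)
node = MonitoringNode(1, lambda: {"web": "Running", "worker": "Waiting"}, store)
node.run(period_s=30, cycles=3, sleep=lambda s: None)  # no real waiting in this demo
print(len(store[(1, "worker")]))  # 3
```

Injecting `sleep` keeps the loop testable; a real deployment would more likely run each node on its own scheduler or thread.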
The computer device according to claim 11, wherein, when the one or more processors execute the computer-readable instructions to implement the step of marking the container orchestration framework as having insufficient resources when the running status information of any application remains in the waiting state throughout a preset judgment time threshold, and then generating alarm information and pushing it to the executor that performs the expansion operation so as to notify the executor to expand the container orchestration framework, the following steps are performed:

Read the running status information of an application from the application record node;

Determine whether the running status information of the application remains in the waiting state throughout the judgment time threshold, where the judgment time threshold is a preset period of time; if so, mark the state of the container orchestration framework as insufficient resources; if not, mark the state of the container orchestration framework as normal operation;

Traverse all applications under all container orchestration frameworks in the framework list according to the above steps, and mark the state of every container orchestration framework;

Call the mail template to generate an alarm email, and record in the alarm email the record serial number of each container orchestration framework marked as having insufficient resources together with prompt information indicating the resource shortage;

After reading the email address of the executor from the preset recipient address list, push the alarm email to the executor.
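One way to read "remains in the waiting state throughout the judgment time threshold" is that the application's most recent records form an uninterrupted run of waiting states that began at least one threshold ago. The Python sketch below implements that interpretation and the traversal over all frameworks; the `WAITING` label, the record layout `{(serial, app): [(timestamp, status), ...]}` and the function names are assumptions, not taken from the source.

```python
from datetime import datetime, timedelta

WAITING = "Waiting"  # assumed status label for the waiting state

def waiting_since(records):
    """Start time of the current uninterrupted waiting streak, or None."""
    start = None
    for ts, status in sorted(records, key=lambda r: r[0], reverse=True):
        if status != WAITING:
            break
        start = ts
    return start

def mark_frameworks(store: dict, threshold: timedelta, now: datetime) -> dict:
    """Mark a framework 'insufficient' if any of its applications has been
    waiting for at least `threshold`; otherwise mark it 'normal'."""
    marks = {}
    for (serial, _app), records in store.items():
        since = waiting_since(records)
        if since is not None and now - since >= threshold:
            marks[serial] = "insufficient"   # one stuck application is enough
        else:
            marks.setdefault(serial, "normal")  # never downgrade an earlier mark
    return marks

now = datetime(2019, 6, 14, 10, 0)
store = {
    (1, "web"): [(now - timedelta(minutes=m), "Running") for m in (0, 5, 10)],
    (2, "api"): [(now - timedelta(minutes=m), "Running") for m in (0, 5)],
    (2, "worker"): [(now - timedelta(minutes=m), WAITING) for m in (0, 5, 10)],
}
marks = mark_frameworks(store, timedelta(minutes=10), now)
print(marks)  # {1: 'normal', 2: 'insufficient'}
```

Whether the boundary is inclusive (`>=`) is a design choice the source leaves open.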
The computer device according to claim 9, wherein, when the one or more processors execute the computer-readable instructions to implement the step of acquiring the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in the container orchestration framework, and recording the two types of data in the corresponding record nodes, the following steps are performed:

Connect to the management console of the container orchestration framework marked as having insufficient resources;

Send a data request to the management console to obtain the physical machine resource configuration data of the container orchestration framework and the physical machine resource occupancy data of each application running in it;

After receiving the feedback from the management console, record the physical machine resource configuration data in the framework record node and the physical machine resource occupancy data in the application record node, each tagged with the time at which the feedback was received.
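The recording step can be sketched as splitting one console reply into the two record nodes under a shared receipt timestamp. The reply shape `{"config": ..., "occupancy": {app: ...}}` and the function `record_resource_snapshot` are hypothetical; the source does not specify the console's data format.

```python
from collections import defaultdict
from datetime import datetime
from typing import Optional

def record_resource_snapshot(serial: int, response: dict, framework_nodes: dict,
                             app_nodes: dict, received_at: Optional[datetime] = None) -> datetime:
    """Split one hypothetical console reply into the two record nodes.

    response = {"config": {resource: amount},
                "occupancy": {app_name: {resource: amount}}}
    """
    ts = datetime.now() if received_at is None else received_at  # time of receipt
    framework_nodes[serial].append((ts, dict(response["config"])))
    for app, usage in response["occupancy"].items():
        app_nodes[(serial, app)].append((ts, dict(usage)))
    return ts

framework_nodes, app_nodes = defaultdict(list), defaultdict(list)
reply = {"config": {"cpu": 16, "mem_gb": 64},
         "occupancy": {"web": {"cpu": 6, "mem_gb": 20},
                       "worker": {"cpu": 9, "mem_gb": 40}}}
ts = record_resource_snapshot(2, reply, framework_nodes, app_nodes,
                              received_at=datetime(2019, 6, 14, 10, 0))
print(sorted(app for _, app in app_nodes))  # ['web', 'worker']
```

Using one timestamp for both nodes makes it straightforward to later match a framework's configuration with the application occupancy observed at the same moment.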
The computer device according to claim 9, wherein, when the one or more processors execute the computer-readable instructions to implement the step of, after receiving the expansion-completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of that container orchestration framework and recording it in the framework record node, the following steps are performed:

Receive feedback information from the executor containing the signal that the expansion operation has ended;

Connect to the storage unit and read, from the application record node, the record of physical machine resource occupancy data closest to the current time for each application in the container orchestration framework marked as having insufficient resources;

Configure each corresponding application according to its physical machine resource occupancy data, and restart the application once the configuration is complete;

Connect to the management console of the container orchestration framework, acquire the current physical machine resource configuration data of the container orchestration framework, and record the acquired data in the framework record node according to the acquisition time.
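A sketch of the post-expansion steps follows. It assumes, hypothetically, that the console operations for configuring an application, restarting it and reading the framework's configuration are injected as callables (`apply_config`, `restart`, `read_config`); the source names none of these. "Closest to the current time" is interpreted as the newest record not later than `now`, and `now` is also used as the acquisition time.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[datetime, Dict[str, float]]

def latest_record(records: List[Record], now: datetime) -> Optional[Record]:
    """Return the record closest to, and not later than, `now` (None if none)."""
    past = [r for r in records if r[0] <= now]
    return max(past, key=lambda r: r[0], default=None)

def reconfigure_after_expansion(serial: int, app_nodes: dict, framework_nodes: dict,
                                apply_config: Callable, restart: Callable,
                                read_config: Callable, now: datetime) -> List[str]:
    restarted = []
    for (fw, app), records in app_nodes.items():
        if fw != serial:
            continue                      # only the framework that was expanded
        record = latest_record(records, now)
        if record is None:
            continue
        apply_config(app, record[1])      # hypothetical console call
        restart(app)                      # restart only after configuration completes
        restarted.append(app)
    # Record the post-expansion configuration with its acquisition time.
    framework_nodes.setdefault(serial, []).append((now, read_config(serial)))
    return restarted

calls = []
app_nodes = {
    (2, "worker"): [(datetime(2019, 6, 14, 9, 0), {"cpu": 8}),
                    (datetime(2019, 6, 14, 10, 0), {"cpu": 9})],
    (1, "web"): [(datetime(2019, 6, 14, 10, 0), {"cpu": 2})],
}
framework_nodes = {}
done = reconfigure_after_expansion(
    2, app_nodes, framework_nodes,
    apply_config=lambda app, usage: calls.append(("config", app, usage["cpu"])),
    restart=lambda app: calls.append(("restart", app)),
    read_config=lambda s: {"cpu": 32},
    now=datetime(2019, 6, 14, 11, 0),
)
print(calls)  # [('config', 'worker', 9), ('restart', 'worker')]
```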
A computer-readable storage medium having computer-readable instructions stored thereon, and when the computer-readable instructions are executed by one or more processors, the following steps are implemented:

After obtaining the deployment status of the container orchestration framework under the container cloud platform, generate a framework list, in which all the container orchestration frameworks deployed under the container cloud platform are recorded;

Following the recording order of the framework list, acquire, at a preset acquisition cycle, the running status information of each application in each container orchestration framework one framework at a time, and record the acquired running status information in a preset storage unit. The storage unit is provided with a framework record node for recording the physical machine resource configuration data of each container orchestration framework and an application record node for recording the running status information and physical machine resource occupancy data of each application; the running status information indicates the running status of the application in its container orchestration framework;

When the running status information of any application remains in the waiting state throughout the preset judgment time threshold, mark the container orchestration framework as having insufficient resources, then generate alarm information and push it to the executor that performs the expansion operation, so as to notify the executor to expand the container orchestration framework;

Acquire the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in that container orchestration framework, and record the two types of data in the corresponding record nodes;

After receiving the expansion-completion signal fed back by the executor, retrieve from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfigure and restart each application accordingly, and then obtain the current physical machine resource configuration data of that container orchestration framework and record it in the framework record node;

After summarizing the record data of the framework record node and the application record node, a capacity expansion report is generated.
The computer-readable storage medium according to claim 15, wherein, when the one or more processors execute the computer-readable instructions to implement the step of acquiring the deployment status of the container orchestration frameworks on the container cloud platform and then generating the framework list, the following steps are performed:

Connect to the management console of the container cloud platform;

Send a data request to the management console of the container cloud platform to obtain the status of the container orchestration frameworks running on the platform;

After receiving the feedback from the management console, generate the framework list, in which all the container orchestration frameworks running on the container cloud platform are recorded in the order in which the feedback was received;

Generate a record serial number for each container orchestration framework in the framework list according to its recording time; the record serial number is the identification serial number of the container orchestration framework in the container cloud platform and is used to distinguish different container orchestration frameworks.
The computer-readable storage medium according to claim 16, wherein, when the one or more processors execute the computer-readable instructions to implement the step of acquiring, in the recording order and at a preset acquisition cycle, the running status information of each application in each container orchestration framework one by one from the framework list and recording it in the preset storage unit, the following steps are performed:

Generate a monitoring node for each container orchestration framework in the framework list; the monitoring node connects to the management console of that container orchestration framework at a set period and obtains the running status information of each application running on it;

According to the record serial number of the container orchestration framework in the framework list, generate in the storage unit a corresponding application record node for each application on the container orchestration framework; the application record node records the running status information, acquired by the monitoring node, of each application running on the container orchestration framework;

At each set monitoring period, connect to the management console of the container orchestration framework through the monitoring node and request the running status information of all applications running on the container orchestration framework;

After receiving the feedback from the management console of the container orchestration framework, record the running status information of each application in its application record node, tagged with the time at which the feedback was received.
The computer-readable storage medium according to claim 17, wherein, when the one or more processors execute the computer-readable instructions to implement the step of marking the container orchestration framework as having insufficient resources when the running status information of any application remains in the waiting state throughout a preset judgment time threshold, and then generating alarm information and pushing it to the executor that performs the expansion operation so as to notify the executor to expand the container orchestration framework, the following steps are performed:

Read the running status information of an application from the application record node;

Determine whether the running status information of the application remains in the waiting state throughout the judgment time threshold, where the judgment time threshold is a preset period of time; if so, mark the state of the container orchestration framework as insufficient resources; if not, mark the state of the container orchestration framework as normal operation;

Traverse all applications under all container orchestration frameworks in the framework list according to the above steps, and mark the state of every container orchestration framework;

Call the mail template to generate an alarm email, and record in the alarm email the record serial number of each container orchestration framework marked as having insufficient resources together with prompt information indicating the resource shortage;

After reading the email address of the executor from the preset recipient address list, push the alarm email to the executor.
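The alarm email can be built with Python's standard `string.Template` and `email.message.EmailMessage`. In this sketch the template text, the sender and recipient addresses, the `marks` input (a mapping from record serial number to framework state), and the function names are hypothetical. `push_alarm` shows delivery through `smtplib.SMTP.send_message` but is not called, since it needs a reachable SMTP server.

```python
import smtplib
from email.message import EmailMessage
from string import Template

# Hypothetical mail template; placeholders are filled for each alarm.
ALARM_TEMPLATE = Template(
    "Resource alarm: frameworks marked as having insufficient resources.\n"
    "Record serial numbers: $serials\n"
    "Hint: $hint\n"
)

def build_alarm_email(marks: dict, recipients: list, sender: str) -> EmailMessage:
    insufficient = sorted(s for s, state in marks.items() if state == "insufficient")
    msg = EmailMessage()
    msg["Subject"] = f"[Capacity alarm] {len(insufficient)} framework(s) short of resources"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)  # addresses read from the recipient list
    msg.set_content(ALARM_TEMPLATE.substitute(
        serials=", ".join(map(str, insufficient)),
        hint="applications stuck in the waiting state past the threshold",
    ))
    return msg

def push_alarm(msg: EmailMessage, smtp_host: str) -> None:
    # Not exercised here: requires a reachable SMTP server.
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)

msg = build_alarm_email({1: "normal", 2: "insufficient"},
                        ["ops-oncall@example.com"], "monitor@example.com")
print(msg["To"])  # ops-oncall@example.com
```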
The computer-readable storage medium according to claim 15, wherein, when the one or more processors execute the computer-readable instructions to implement the step of acquiring the physical machine resource configuration data of the container orchestration framework marked as having insufficient resources and the physical machine resource occupancy data of each application running in the container orchestration framework, and recording the two types of data in the corresponding record nodes, the following steps are performed:

Connect to the management console of the container orchestration framework marked as having insufficient resources;

Send a data request to the management console to obtain the physical machine resource configuration data of the container orchestration framework and the physical machine resource occupancy data of each application running in it;

After receiving the feedback from the management console, record the physical machine resource configuration data in the framework record node and the physical machine resource occupancy data in the application record node, each tagged with the time at which the feedback was received.
The computer-readable storage medium according to claim 15, wherein, when the one or more processors execute the computer-readable instructions to implement the step of, after receiving the expansion-completion signal fed back by the executor, retrieving from the storage unit the previously recorded physical machine resource occupancy data of each application in the container orchestration framework marked as having insufficient resources, reconfiguring and restarting the applications, and obtaining the current physical machine resource configuration data of that container orchestration framework and recording it in the framework record node, the following steps are performed:

Receive feedback information from the executor containing the signal that the expansion operation has ended;

Connect to the storage unit and read, from the application record node, the record of physical machine resource occupancy data closest to the current time for each application in the container orchestration framework marked as having insufficient resources;

Configure each corresponding application according to its physical machine resource occupancy data, and restart the application once the configuration is complete;

Connect to the management console of the container orchestration framework, acquire the current physical machine resource configuration data of the container orchestration framework, and record the acquired data in the framework record node according to the acquisition time.