WO2023071576A1

WO2023071576A1 - Container cluster construction method and system

Info

Publication number: WO2023071576A1
Application number: PCT/CN2022/118653
Authority: WO
Inventors: 杨巍巍
Original assignee: 中移(苏州)软件技术有限公司; 中国移动通信集团有限公司
Priority date: 2021-10-28
Filing date: 2022-09-14
Publication date: 2023-05-04
Also published as: CN116048825A

Abstract

Disclosed in the present invention are a container cluster construction method and system. The system comprises: an order system, which is configured to receive a container cluster construction request sent by a user by means of calling a cluster service logic interface, call an order generation process to generate a container cluster construction message on the basis of the container cluster construction request, and publish the container cluster construction message to a container message queue; a cluster service monitoring process, which is configured to acquire the container cluster construction message from the container message queue if it is detected that there is an unconsumed container cluster construction message in the container message queue, start working instances on the basis of the container cluster construction message, and distribute the working instances to corresponding working nodes; and the working nodes, which are configured to construct container clusters on the basis of the working instances. By means of the solution of the present invention, service and order systems are separated, such that the pressure of large-scale requests can be effectively relieved, and the stable execution of the requests is guaranteed, thereby guaranteeing the success rate of the deployment of container clusters in a large-scale scenario.

Description

Container cluster construction method and system

Cross References to Related Applications

The present application is based on Chinese patent application No. 202111264499.3, filed on October 28, 2021 by China Mobile (Suzhou) Software Technology Co., Ltd. and China Mobile Communications Group Co., Ltd. and entitled "Container cluster construction method and system", and claims priority to that Chinese patent application, the entire content of which is hereby incorporated by reference into the present application.

Technical Field

The present invention relates to the technical field of container clusters, and relates to, but is not limited to, a container cluster construction method and system.

Background

Cloud-native technology is inseparable from container clusters, but under the pressure of large-scale users, the deployment of container clusters becomes very slow. Analysis shows that the main causes are the insufficient processing capacity of the user-facing container business and frequent processing errors in the order system, which lead to substantial delays and retries.

Summary of the Invention

In view of the above problems, embodiments of the present invention are proposed to provide a container cluster construction system and a corresponding container cluster construction method that overcome the above problems or at least partially solve the above problems.

An embodiment of the present invention provides a container cluster construction system, including: an order system, a container message queue, multiple cluster service monitoring processes, and multiple working nodes;

The order system is configured to receive the container cluster construction request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster construction message according to the container cluster construction request, and publish the container cluster construction message to the container message queue;

The cluster service monitoring process is configured to, if it detects that there is an unconsumed container cluster construction message in the container message queue, obtain the container cluster construction message from the container message queue, start a working instance according to the container cluster construction message, and distribute the working instance to the corresponding working node;

The working nodes are configured to build container clusters according to the working instances.

An embodiment of the present invention provides a container cluster construction method, including:

The order system receives the container cluster construction request sent by the user by calling the cluster business logic interface, calls the order generation process to generate a container cluster construction message according to the container cluster construction request, and publishes the container cluster construction message to the container message queue;

If the cluster service monitoring process detects that there is an unconsumed container cluster construction message in the container message queue, it obtains the container cluster construction message from the container message queue, starts a working instance according to the container cluster construction message, and distributes the working instance to the corresponding working node;

The working nodes build container clusters according to the working instances.

An embodiment of the present invention provides a computing device, including: a processor, a memory, a communication interface, and a communication bus, and the processor, the memory, and the communication interface complete mutual communication through the communication bus;

The memory is configured to store at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the above container cluster construction method.

An embodiment of the present invention provides a computer storage medium in which at least one executable instruction is stored, and the executable instruction causes a processor to perform operations corresponding to the container cluster construction method described above.

According to the solutions provided by the above embodiments of the present invention, separating the business and order systems effectively relieves the pressure of large-scale requests and ensures the stable execution of requests and the success rate of container cluster deployment in large-scale scenarios. Introducing message caching alleviates the pressure when large-scale requests arrive, and scheduling the consumption logic across multiple instances in the cluster ensures fast processing of requests.

The above description is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the contents of the description, and so that the above and other purposes, features, and advantages are more obvious and understandable, specific implementations of the embodiments of the present invention are set out below.

Brief Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the embodiments of the present invention. Throughout the drawings, the same reference numerals designate the same parts. In the drawings:

FIG. 1 shows a schematic diagram of a container cluster construction system provided by an embodiment of the present invention;

FIG. 2 shows a flow chart of a container cluster construction method provided by an embodiment of the present invention;

FIG. 3 shows a schematic structural diagram of a computing device provided by an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

The inventors of the present invention found that, in the related art, a container cluster must be deployed on cloud hosts and relies on resources such as cloud hard disks when it is constructed. When large-scale user business arrives, in order to improve user satisfaction, mainstream vendors first create orders, then start the deployment process of the container service, and complete cluster construction step by step. To reduce the number of calls to the order system or the pressure on it, the resources that the cluster depends on are usually created in a customized way and managed separately by the container cluster, that is, no orders are created for these resources. As a result, users cannot operate on the orders of these hosts, and the business of some vendors is affected by order pressure, which causes errors and time-consuming retries, or is slowed by serial processing logic.

In addition, when large-scale business arrives, apart from cluster-internal management operations such as viewing, the existing cluster deployment process is completed in one go, from building the order to completing business construction. As a result, processing takes a long time, occupies more file handles, and consumes more memory and central processing unit (CPU) resources, so that multiple processes interfere with one another. In the end, an error in any one of them causes the resources created earlier to be rolled back, which wastes considerable resource overhead and reduces the success rate of cluster deployment.

In addition, in existing container cluster deployment schemes, the order and the operation process of the container cluster are basically synchronized, that is, as soon as a resource is created, the order is confirmed to take effect. For vendors who can guarantee the success rate of resource provisioning, this is very efficient, but it is usually difficult to achieve such a high interface success rate for every resource under high pressure. It follows that, under multi-user pressure, calling the interfaces judiciously is particularly important. High-frequency order validation calls affect the overall cluster deployment process, especially when errors occur.

Furthermore, existing container cluster construction schemes follow a fixed deployment process and usually execute order and business control sequentially. Therefore, when large-scale user requests arrive, preemption of CPU and memory resources on the back-end business server makes upper-layer cluster construction very slow.

In the public cloud scenario, the container cluster resources built by a user in principle belong to that user, and orders should not be customized merely because a container cluster is being built. That is, a single set of container cluster orders should not manage the billing for all resources, because this affects the user's ability to perform order operations on individual resources. In addition, such a setup only lets users operate on resources through the container interface by creating, deleting, modifying, and querying the cluster, which lacks fine-grained, flexible resource control.

In current container cluster construction schemes, the order system usually does not serve a single system only, so it has a theoretical threshold: if the threshold is exceeded, requests cannot be answered and a large number of errors are triggered. Therefore, when large-scale users build clusters, the pressure on the order system affects the success rate of cluster deployment.

In addition, the business interfaces of the container back end are "equal" from the system's point of view, and how they compete for resources is bounded by the CPU and other resources of the server on which they run. Therefore, when large-scale user requests arrive, the successful execution of the cluster construction process cannot be guaranteed. That is, the process may cause large-scale container construction failures once the server reaches the upper limit of its processing capacity, which in turn affects the reputation of the product or system.

Therefore, the container cluster construction scheme of the embodiments of the present invention is proposed. It meets the processing requirements when large-scale business arrives and can achieve the following goals:

1. Design a reasonable scheme for separating orders from business to ensure the container cluster deployment success rate.

2. Elastically schedule instances of back-end business functions to different working nodes to improve the success rate of container cluster construction.

3. Introduce message caching to alleviate the pressure when large-scale requests arrive, and ensure fast processing of requests by scheduling the consumption logic across multiple instances in the cluster.

4. Design a reasonable order operation process that can flexibly build an order for each of the user's resources, with the orders taking effect uniformly after the container is built, reducing the number of interactions with the order system.

FIG. 1 shows a schematic diagram of a system for building a container cluster provided by an embodiment of the present invention. As shown in Figure 1, the system includes: an order system 101, a container message queue 102, multiple cluster service monitoring processes 103, and multiple working nodes 104 (Figure 1 is only a schematic illustration).

The order system 101 is configured to receive the container cluster construction request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster construction message according to the container cluster construction request, and publish the container cluster construction message to the container message queue.

Exemplarily, this embodiment provides the user with a cluster business logic interface (Cluster Bll). When the user wants to build a container cluster, the user can call the cluster business logic interface to send a container cluster construction request to the order system. Once the request succeeds, the upper-layer business process ends, and the construction status of the container cluster is returned to the user. In some embodiments, the order system is further configured to return a status message to the user by invoking the cluster business logic interface, wherein the status message identifies that the container cluster is under construction. Because this request only involves order construction and does not compete for back-end business resources, it usually succeeds. If sporadic errors occur because of pressure on the order system, the cluster business logic interface (Cluster Bll) layer redistributes and retries the orders according to the existing node-preference strategy until all batches of orders are successfully constructed.

The order system can call the order generation process. The order generation process runs as multiple instances in the cluster, so it can quickly process the orders created by users and publish them, as message carriers, to the container message queue. That is, the order generation process is responsible for generating the container cluster construction messages, and the speed of message generation depends on the processing capacity of the order generation process. Here, the order generation process is deployed as multiple instances to improve the order processing capacity.
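The retry behaviour at the Cluster Bll layer can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the function name `submit_until_built`, the `submit` callback, and the `max_rounds` bound are hypothetical and do not appear in the embodiment, which only states that orders are retried until every batch is constructed.

```python
from typing import Callable, Iterable


def submit_until_built(order_ids: Iterable[str],
                       submit: Callable[[str], bool],
                       max_rounds: int = 5) -> int:
    """Resubmit failed orders round by round; return the number of rounds used.

    `submit` is a hypothetical callback that returns True when the order
    system accepted the order. `max_rounds` is an assumed safety bound.
    """
    pending = list(order_ids)
    for round_no in range(1, max_rounds + 1):
        # Keep only the orders whose submission failed in this round.
        pending = [oid for oid in pending if not submit(oid)]
        if not pending:
            return round_no
    raise RuntimeError(f"orders still failing after {max_rounds} rounds: {pending}")
```

Only the failed orders are resubmitted in each round, so a sporadic error in one order does not repeat the work already done for the rest of the batch.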

Exemplarily, the order system calls the order generation process to generate a container cluster construction message according to the container cluster construction request, and publishes the container cluster construction message to the container message queue. Container construction in a large-scale scenario generates a large number of container cluster construction messages.

The cluster service monitoring process 103 is configured to, if it detects that there is an unconsumed container cluster construction message in the container message queue, obtain the container cluster construction message from the container message queue, start a working instance according to the container cluster construction message, and distribute the working instance to the corresponding working node.

Exemplarily, the cluster service monitoring process, as a consumer of the back-end business logic, listens to the container cluster construction message sent by the order system, and only then starts the construction process of the back-end container cluster.

Exemplarily, the cluster service monitoring process monitors in real time whether there is an unconsumed container cluster construction message in the container message queue. If it detects one, it immediately obtains the container cluster construction message from the container message queue and starts the construction process of the container cluster. Exemplarily, the cluster service monitoring process can start a working instance according to the container cluster construction message and distribute the working instance to the corresponding working node.

Here, the cluster service monitoring process can start a new working instance (Pod instance) to process the construction request of the container cluster. Pod instances can be preferentially distributed to different working nodes according to the node-preference scheduling strategy. The working nodes are separate servers, so they do not compete with one another for resources. Even if Pod instances are scheduled to the same working node, reserved processing resources can be set to improve their processing performance. In addition, a Pod instance in Kubernetes is lightweight and starts very quickly. Therefore, by starting different Pod instances, the cluster service monitoring process can process all the messages on the queue at the highest batch processing speed. This realizes the separation of order processing and business processing and achieves efficient cluster construction.
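The flow from the order system through the container message queue to the working nodes can be illustrated with a minimal, single-process Python sketch. All names here (`WorkingNode`, `publish_orders`, `consume`) are hypothetical, the in-memory `queue.Queue` merely stands in for the container message queue, and round-robin placement is an assumption used in place of the node-preference scheduling strategy, which the embodiment does not specify in detail.

```python
import itertools
import queue
from dataclasses import dataclass, field


@dataclass
class WorkingNode:
    """Hypothetical stand-in for a server that runs working instances."""
    name: str
    built: list = field(default_factory=list)

    def build_cluster(self, message: dict) -> None:
        # A real node would run the deployment logic; here we only record it.
        self.built.append(message["orderId"])


def publish_orders(container_queue: queue.Queue, requests: list) -> None:
    """Order-system side: turn each request into a message and enqueue it."""
    for request in requests:
        container_queue.put(dict(request))


def consume(container_queue: queue.Queue, nodes: list) -> int:
    """Monitoring-process side: drain unconsumed messages and dispatch them."""
    node_cycle = itertools.cycle(nodes)  # assumption: simple round-robin placement
    handled = 0
    while True:
        try:
            message = container_queue.get_nowait()
        except queue.Empty:
            return handled
        next(node_cycle).build_cluster(message)  # one working instance per message
        container_queue.task_done()
        handled += 1
```

Because the publisher returns as soon as the message is enqueued, the order side never waits for cluster construction, which is the separation of order and business processing described above.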

It should be noted that the construction of the container cluster may rely on other resources, whose construction can be completed through the scheduling interfaces of those resources. The construction of these other resources also follows the principle of separating orders from business construction, which ensures that they are built stably and efficiently. After the dependent resources have been built, the cluster service monitoring process for container construction monitors all dependent orders of all resources and makes the orders take effect in the order system in a single batch. This greatly reduces the number of interactions between the other resource orders and the order system. At the same time, each resource is generated for the user as an item of the order, so users can flexibly operate on all the resources involved in the container cluster without being affected by the container cluster order business itself. In addition, to handle occasional failures in constructing the resources that the container cluster relies on, a retry mechanism is added to the pod business process to ensure the success rate of resource provisioning and thereby the success rate of cluster construction.

Flexibly scheduling working instances to different server nodes to improve the success rate of container cluster construction means that, in the management cluster, multiple working nodes are configured in addition to the nodes where the components are located, in order to respond to large-scale user requests. These working nodes can be scheduled by different pod instances to make full use of working node resources, and a working instance is automatically destroyed after it finishes so that its resources are reclaimed. In other words, the cluster service monitoring process is also configured to destroy the working instance after the container cluster is built.
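As an illustration of provisioning dependent resources with retries and then activating their orders in one call, consider the following Python sketch. The `OrderSystem` class, the `provision` callback, and the `max_attempts` limit are hypothetical; the embodiment does not define an interface for the order system or a retry count.

```python
from typing import Callable, Iterable, List, Tuple


class OrderSystem:
    """Hypothetical order system that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0
        self.active: set = set()

    def activate(self, order_ids: Iterable[str]) -> None:
        self.calls += 1
        self.active.update(order_ids)


def provision_then_activate(resources: List[Tuple[str, str]],
                            provision: Callable[[str], bool],
                            order_system: OrderSystem,
                            max_attempts: int = 3) -> List[str]:
    """Provision each (resource, order_id) pair with retries, then activate once."""
    ready = []
    for resource, order_id in resources:
        for _ in range(max_attempts):
            if provision(resource):
                ready.append(order_id)
                break
        else:
            # No attempt succeeded: report instead of activating a partial batch.
            raise RuntimeError(f"could not provision {resource}")
    order_system.activate(ready)  # single callback for the whole batch
    return ready
```

In this sketch the order system is contacted once per cluster rather than once per resource, while each resource still keeps its own order identifier.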
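The lifecycle in which a working instance is destroyed once its work is done can be sketched with a Python context manager. The names `working_instance` and `build_cluster` and the `running` registry are hypothetical. Destroying the instance when the build fails as well is an assumption of this sketch; the embodiment only states that the instance is destroyed after the container cluster is built.

```python
from contextlib import contextmanager


@contextmanager
def working_instance(name: str, running: set):
    """Register a (hypothetical) working instance and always destroy it afterwards."""
    running.add(name)          # instance started and scheduled
    try:
        yield name
    finally:
        running.discard(name)  # instance destroyed, resources reclaimed


def build_cluster(order_id: str, running: set) -> str:
    """Run one build inside a short-lived working instance."""
    with working_instance(f"pod-{order_id}", running) as pod:
        return f"{order_id} built by {pod}"
```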

The working node 104 is configured to build a container cluster according to the working instance.

The working nodes build container clusters according to the working instances and call back to the order system so that all related orders take effect together, which reduces frequent interaction with the order business and greatly improves the success rate and performance of container cluster construction.

As shown in Figure 1, the container cluster construction system of this embodiment has a three-layer architecture. The uppermost layer is the cluster business logic interface (Cluster Bll), which exposes the business logic interface to users. The middle layer includes the container message queue, the order system (order generation process), the cluster service monitoring process, and so on, and is used to separate the business and order systems, which effectively relieves the pressure of large-scale container construction requests and ensures their stable execution. The lowest layer consists of the nodes where the server instances are located, which are used to distribute the working instances (pods) that process business according to their functions. This layer is made up of internal servers that are not exposed externally. Scheduling policies can be set to schedule business processing at pod granularity, so that large-scale container construction requests are no longer limited by the processing capacity of a single server when they arrive.

The content data structure of the container cluster construction request is (requestId, orderType, userId, poolId, orderId, productType, openParams).

Among them, requestId is the identifier (id) of the request;

orderType indicates the type of operation requested, such as creation;

userId indicates the user requesting the resource;

poolId indicates the resource pool where the requested resource is located;

orderId indicates the requested order information and is used to bind resources; after the container cluster is built, the order status is called back uniformly;

productType indicates the requested resource type, such as a container cluster;

openParams indicates the parameter information of the request.

In this way, once a request is received, its relevant information is clear, including important information such as its source, request type, order information, and operating parameters. The container cluster construction message carries all the data in the container cluster construction request. The construction message is parsed and dispatched to different service nodes.
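The request structure above can be illustrated with a small Python data class. The field names are taken from the description; the class name `ClusterBuildRequest`, the JSON serialization, and the example keys inside `openParams` are assumptions made for this sketch only.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ClusterBuildRequest:
    requestId: str      # identifier of the request
    orderType: str      # requested operation type, such as "create"
    userId: str         # user requesting the resource
    poolId: str         # resource pool where the resource is located
    orderId: str        # order information used to bind resources
    productType: str    # requested resource type, such as a container cluster
    openParams: dict = field(default_factory=dict)  # request parameters

    def to_message(self) -> str:
        """Serialize every field so the queue message carries the full request."""
        return json.dumps(asdict(self), sort_keys=True)
```

A consumer can rebuild the request from the message with `ClusterBuildRequest(**json.loads(message))`, which mirrors the statement that the construction message carries all the data in the request.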

That is to say, the container cluster construction message includes at least one of the following: request identifier, request operation type, user identifier, resource pool identifier where the requested resource is located, requested order information, resource type, and container cluster parameter information. Wherein, the container cluster parameter information includes: a first container cluster scale value and a cluster type.

In an optional implementation of the present invention, the cluster service monitoring process is further configured to: determine the instance function according to the first container cluster scale value, start a working instance with the corresponding instance function, and distribute the working instance with the corresponding instance function to the corresponding working node;

The working node is further configured to: build a container cluster according to the working instance with the corresponding instance function, wherein the cluster function provided by the container cluster corresponds to the instance function.

Wherein, the instance function may include one or more of the following functions: creation function, capacity expansion function, capacity reduction function, viewing function, freezing function, recovery function, unsubscribe function.

Exemplarily, based on the user's container construction request I=(requestId, orderType, userId, poolId, orderId, productType, openParams), the request information Q corresponding to the user is calculated.

Q=(b*baseM*c*q*a, b*baseC*c*q*a), where b is the basic operating coefficient factor, base denotes the basic operating coefficient, with baseM being the basic operating coefficient for memory and baseC the basic operating coefficient for the CPU, c is the container type factor, q is the container scale factor, and a is the container operation method factor.
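The formula for Q can be evaluated directly, as in the following Python sketch. The function name `request_info` and the sample factor values are hypothetical; the embodiment gives no concrete values for b, c, or a, and treating (baseM, baseC) as the operating coefficient pair is an assumption.

```python
from typing import Tuple


def request_info(b: float, base_m: float, base_c: float,
                 c: float, q: float, a: float) -> Tuple[float, float]:
    """Return Q = (b*baseM*c*q*a, b*baseC*c*q*a) as (memory value, CPU value)."""
    shared = b * c * q * a  # factors common to both components
    return (base_m * shared, base_c * shared)
```

For example, with b = c = a = 1, q = 60, and coefficients (2.5, 5), the call `request_info(1.0, 2.5, 5.0, 1.0, 60, 1.0)` yields a memory value of 150.0 and a CPU value of 300.0.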

The instance function of the working instance is then determined, and the working instance is scheduled based on the value of Q. The following cases can be distinguished:

(1) When the cluster scale q<10:

The normal operating coefficient (1, 1) is used (an empirical value, given as (memory calculation value, CPU calculation value)). At the same time, based on the cluster type clusterType in the original request, combined with the container scale q, A=a*clusterType+b*q is calculated. When A is greater than or equal to X (X is a preset threshold), the working instance provides the creation, capacity expansion, and capacity reduction functions. If A is less than X, a working instance with only the creation function is provided; after the container cluster has been built, any other user needs are served directly in the process, without creating an additional instance to execute them. The advantage is that a small-scale cluster can still be given expansion and reduction functions, and because the scale is small, these run quickly without a separate instance having to be created.

When the orderType is creation, the dependent resource processing logic that it creates is more complex, covering multiple resources inside the container cluster and the business logic of container cluster deployment. Clusters of different types are scheduled to different types of servers and configured according to the corresponding cluster type factors, so as to obtain different resource allocation specifications. The resource value is calculated from the Q formula and the various factors, and the instance is then scheduled to the matching type of server. An ordinary small-scale cluster only needs to be scheduled to an ordinary server. A GPU cluster needs to be scheduled to a GPU host to provide faster response performance. An elastic bare metal cluster needs to be scheduled to elastic bare metal machines. (This is only a brief description of the scheduling scheme. The focus of the present invention is to distinguish clusters of different types and scales and, in large-scale scenarios, to use scheduling strategies to provide better performance.)

(2) For a larger cluster with q=10-50, the CPU and memory resources required increase proportionally within a controllable range, and the operating coefficient is set to (2.5, 2.5). At the same time, based on the cluster type and cluster scale q, B=clusterType*q is calculated. When B is less than X, the working instance provides the creation, expansion, and reduction functions. If B is greater than or equal to X, the working instance provides the creation, expansion, reduction, and viewing functions; once the cluster reaches the set threshold, serving the viewing function through the pod working instance improves its efficiency and stability.

When the orderType is creation, the dependent resource processing logic that it creates is more complex, covering multiple resources inside the container cluster and the business logic of container cluster deployment. Clusters of different types are scheduled to different types of servers and configured according to the corresponding cluster type factors to obtain different resource allocation specifications.

(3) For a medium-to-large cluster with q=50-100, the CPU resource requirement increases while the memory overhead is almost the same as for q=10-50, so the operating coefficient is set to (2.5, 5). In this case, the working instance provides the creation, expansion, reduction, freezing, and recovery functions.

(4) For a large-scale cluster with q=100-1000, the frequency of interactions inside the cluster increases exponentially, so the resource demand is adjusted to (10, 20) based on experience, and the cluster type is restricted to general cloud hosts, because the resource overhead of other types of container cluster is too large at this scale. In this case, the working instance provides the creation, viewing, freezing, and recovery functions.

When the orderType is creation, the dependent resource processing logic that it creates is more complex, covering multiple resources inside the container cluster and the business logic of container cluster deployment. Such clusters are scheduled to high-performance servers and configured according to the corresponding cluster type factors to obtain different resource allocation specifications.

(5) An ultra-large cluster with q>=1000 is rare in practice. At this scale, the cluster is affected not only by the deployment process itself; traffic within the cluster is also affected by bandwidth, latency, and storage performance, which lengthens the control process. This parameter is therefore set to the upper limit value of the container scale factor, which can be adjusted to 100/100 based on experience, and the cluster type is restricted to general cloud hosts. The working instance provides only the creation, viewing, unsubscribe, and reduction functions. Because the cluster is so large, small instance operations usually affect the stability of cluster construction, so freezing and recovery operations, which are unstable for clusters of this size, are not supported.
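The five cases above can be summarized in a single selection function, sketched here in Python. Several points are assumptions of this sketch rather than statements of the embodiment: the function name `plan_instance`, the English function labels, a numeric `cluster_type` factor, the default values of a, b, and X, the reading of "100/100" as the pair (100, 100), and the use of half-open ranges (10 <= q < 50, and so on) to resolve the overlapping boundaries in the text.

```python
from typing import FrozenSet, Tuple

Plan = Tuple[Tuple[float, float], FrozenSet[str]]


def plan_instance(q: int, cluster_type: float,
                  a: float = 1.0, b: float = 1.0, x: float = 50.0) -> Plan:
    """Return ((memory coeff, CPU coeff), functions) for container scale q."""
    if q < 10:
        score = a * cluster_type + b * q            # A in case (1)
        funcs = {"create", "expand", "shrink"} if score >= x else {"create"}
        return (1.0, 1.0), frozenset(funcs)
    if q < 50:
        score = cluster_type * q                    # B in case (2)
        funcs = {"create", "expand", "shrink"}
        if score >= x:
            funcs.add("view")
        return (2.5, 2.5), frozenset(funcs)
    if q < 100:                                     # case (3)
        return (2.5, 5.0), frozenset(
            {"create", "expand", "shrink", "freeze", "restore"})
    if q < 1000:                                    # case (4)
        return (10.0, 20.0), frozenset({"create", "view", "freeze", "restore"})
    # case (5): ultra-large clusters, freeze and restore not supported
    return (100.0, 100.0), frozenset({"create", "view", "unsubscribe", "shrink"})
```

The returned coefficient pair can then serve as (baseM, baseC) when evaluating Q, and the function set determines which capabilities the working instance is started with.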

Create matching containers and corresponding container functions based on user request information data, which can improve the adaptability and practicability of containers, and reduce the problems of slow container creation and low success rate caused by too many container functions.

The container cluster construction system in this embodiment can be applied to large-scale scenarios. According to different user requests, the processing logic is sent to multiple different servers, and the logic is processed in the form of pod instances. The processed processes are automatically destroyed. pod instance, resource recycling is completed by the container cluster. It can make full use of system server resources. In large-scale scenarios, the construction process of container clusters is smoother, which greatly reduces the time loss caused by resource preemption. Therefore, through such a process design, in large-scale user scenarios, the construction of container clusters can be scheduled to different nodes through multiple pod instances, which greatly improves the business processing performance and thus improves the success rate of cluster construction.

This embodiment introduces message caching to relieve the pressure when large-scale requests arrive, and ensures fast processing of requests by scheduling the consumption logic across multiple instances in the cluster. Specifically, multiple Cluster Consumers are started at the back end of the container management layer process (the cluster business monitoring process) to monitor the container message queue. The messages on the container message queue are generated by the order generation process: when a large-scale request arrives, the order generation process handles the order logic in batches according to a certain batch processing strategy and puts the order information on the container message queue in the form of messages, which the Cluster Consumers then process. The Cluster Consumer process does not complete the message processing flow itself; it distributes the pressure to different pod instances by scheduling instances. In this embodiment, the pod instances are scheduled to servers with sufficient resources, so performance is stable and sufficiently guaranteed. The container message queue introduced in this embodiment can therefore improve the stability and processing performance of the system. This embodiment can flexibly build an order for each resource of the user, and the orders take effect uniformly after the container is built, which reduces the number of interactions with the order system. This embodiment creates matching containers and corresponding container functions based on user requests, which improves the adaptability and practicability of containers and reduces the problems of slow container creation and low success rate caused by too many container functions.
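The queue-and-consumer arrangement described above can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the names `Node`, `generate_orders`, `cluster_consumer`, and `run`, the in-process `queue.Queue`, the thread-based consumers, and the "most free CPU" placement rule are all assumptions chosen to keep the example self-contained.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Node:
    """A server that can host work pod instances (hypothetical model)."""
    name: str
    free_cpu: float
    handled: list = field(default_factory=list)


def generate_orders(requests, batch_size, message_queue):
    """Order generation process: walk the requests batch by batch and
    publish one message per order to the container message queue."""
    for start in range(0, len(requests), batch_size):
        for request in requests[start:start + batch_size]:
            message_queue.put({"order_id": request, "type": "create"})


def cluster_consumer(message_queue, nodes, lock):
    """Cluster Consumer: take a message and hand the work to a pod
    placed on the node that currently has the most free CPU."""
    while True:
        message = message_queue.get()
        if message is None:              # sentinel: stop this consumer
            message_queue.task_done()
            return
        with lock:
            node = max(nodes, key=lambda n: n.free_cpu)
            node.free_cpu -= 1.0         # reserve resources for the work pod
            node.handled.append(message["order_id"])
        # ... the work pod would process the order here ...
        with lock:
            node.free_cpu += 1.0         # pod destroyed, resources recycled
        message_queue.task_done()


def run(requests, consumer_count=3, batch_size=2):
    """Start several consumers, publish all orders, wait until all are consumed."""
    message_queue = queue.Queue()
    nodes = [Node("node-a", 4.0), Node("node-b", 8.0)]
    lock = threading.Lock()
    workers = [
        threading.Thread(target=cluster_consumer, args=(message_queue, nodes, lock))
        for _ in range(consumer_count)
    ]
    for worker in workers:
        worker.start()
    generate_orders(requests, batch_size, message_queue)
    message_queue.join()                 # every published message was consumed
    for _ in workers:
        message_queue.put(None)
    for worker in workers:
        worker.join()
    return nodes
```

In this sketch the consumer only chooses a node and records the order; the actual construction work is left to the pod, which mirrors the statement that the Cluster Consumer does not complete the message processing flow itself.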

In an optional implementation of the present invention, the order system is further configured to: receive the container cluster processing request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster processing message according to the container cluster construction request, and publish the container cluster processing message to the container message queue;

The cluster business monitoring process is further configured to: if there is an unconsumed container cluster processing message in the container message queue, obtain the container cluster processing message from the container message queue, start a working instance with the corresponding instance function according to the container cluster processing message, and distribute the working instance with the corresponding instance function to the corresponding working node; or, process the container cluster according to the container cluster processing message;

The working nodes are further configured to process the container cluster according to the working instance having the corresponding instance function.

Exemplarily, after the construction of the container cluster is completed, the user can perform further processing on the container cluster, such as expanding, shrinking, viewing, freezing, or restoring it. Exemplarily, the user can call the cluster business logic interface to send a container cluster processing request, for example a capacity expansion request, capacity reduction request, viewing request, freezing request, or recovery request. The order system is further configured to: receive the container cluster processing request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster processing message according to the container cluster construction request, and publish the container cluster processing message to the container message queue.

For requests of different cluster sizes, the container clusters are processed in different ways:

(1) When the cluster size q < 10:

The design follows the normal operating coefficient (1, 1) (an empirical value, given as (memory calculation value, CPU calculation value)). At the same time, based on the requested cluster type clusterType combined with the container scale q, the value A = a*clusterType + b*q is computed. When A is greater than or equal to X (X is a preset threshold) and the request is an expansion request or a capacity reduction request, a working instance with the corresponding instance function must be started and scheduled to the corresponding working node, so that the working node processes the container cluster according to that working instance, for example by expanding or shrinking it. If A is less than X, the cluster is small, so when expansion or shrinking is required the cluster business monitoring process can provide the service directly without creating an additional instance. The advantage is that, because the scale is small, no instance needs to be created and execution is fast.

When the orderType is expanding or shrinking, such a request needs fewer resources than creating a cluster. Because the cluster is small, the working instance is scheduled to an ordinary host server (the scheduling process compares the two values (memory calculation value, CPU calculation value) to find a suitable server; the servers have been labeled, so this class of cluster request is scheduled to ordinary host servers according to the algorithm). Resources for the working pod instance are allocated under the influence of the cluster type factor using a base*c configuration, where base is the basic operating overhead of CPU or memory and c is the factor for the cluster type. When the cluster uses resources such as GPUs, the factor c needs to be configured larger so that expansion and shrinking of the container cluster complete quickly.

(2) For clusters with q = 10-50, the CPU and memory resources required increase proportionally within a controllable range, and the operating coefficient is set to (2.5, 2.5). At the same time, according to the cluster type and cluster size q, the value B = clusterType*q is computed. When B is less than X and the request is an expansion request or a capacity reduction request, a working instance with the corresponding instance function must be started; when the request is a viewing request, no working instance needs to be started and the cluster business monitoring process can provide the service directly. When B is greater than or equal to X and the request is an expansion request, a capacity reduction request, or a viewing request, a working instance with the corresponding instance function must be started. Once the cluster reaches the set threshold, a pod instance is used to improve the efficiency and stability of the viewing function. The specific implementation is as follows:

When the orderType is expanding or shrinking, such a request needs fewer resources than creating a cluster. Because the cluster is of medium scale, the working pod instance is scheduled to an idle ordinary host server. Resources for the working pod instance are allocated under the influence of the cluster type factor using a base*c configuration, where base is the basic operating overhead of CPU or memory and c is the factor for the cluster type. When the cluster uses resources such as GPUs, the factor c needs to be configured larger so that expansion and shrinking of the container cluster complete quickly.

When the orderType is view, the request only needs to view resource information related to the database. Therefore, regardless of the cluster type, such requests are scheduled to the cheapest server, that is, a general-purpose host, and the basic memory and CPU overhead is allocated. After the request completes, the working instance is released quickly. Such requests can be processed quickly and resources are recovered in time, which improves the processing efficiency of the server.

(3) For medium and large clusters with q = 50-100, the CPU resource requirement increases while the memory overhead is almost the same as for q = 10-50, so the operating coefficient is (2.5, 5). Clusters of this scale support expanding, shrinking, freezing, restoring, and so on through working pod instances; that is, when the request is for expanding, shrinking, freezing, restoring, and so on, a working instance with the corresponding instance function must be started to process the container cluster. The specific method is as follows:

When the orderType is expanding or shrinking, such a request needs fewer resources than creating a cluster. Because the cluster is large, the working pod instance is scheduled to a server of the corresponding host model. Resources for the working pod instance are allocated under the influence of the cluster type factor using a base*c configuration, where base is the basic operating overhead of CPU or memory and c is the factor for the cluster type. When the cluster uses resources such as GPUs, the factor c needs to be configured larger so that expansion and shrinking of the container cluster complete quickly.

When the orderType is view, the request only needs to view resource information related to the database. Because the cluster is large, such requests are scheduled to a server of the corresponding cluster model, and the basic memory and CPU overhead is allocated. After the request completes, the working instance is released quickly. Such requests can be processed quickly and resources are recovered in time, which improves the processing efficiency of the server.

When the orderType is freeze or restore, the request does not operate on resources and only needs to adjust state, so it requires very little resource overhead. Because the scale is large, however, such requests are scheduled to a server of the corresponding cluster type. After processing completes, the resources are automatically recovered by the system.

(4) For large-scale clusters with q = 100-1000, the frequency of interactions within the cluster increases exponentially, so the operating coefficient is adjusted to (10, 20) based on experience. The cluster type is restricted to general cloud hosts, because for other types of container cluster the resource overhead at this scale is too large. The container cluster only provides functions such as viewing, freezing, and restoring. When performing operations such as viewing, freezing, and restoring, a working instance with the corresponding instance function must be started. The specific method is as follows:

When the orderType is view, the request only needs to view resource information related to the database. Because the cluster is large, such requests are scheduled to high-performance servers, and a certain amount of memory and CPU overhead is set according to the operating coefficient. After the request completes, the working instance is released quickly. Such requests can be processed quickly and resources are recovered in time, which improves the processing efficiency of the server.

When the orderType is freeze or restore, the request does not operate on resources and only needs to adjust state, so it requires very little resource overhead. Because the scale is large, however, such requests are scheduled to high-performance servers. After processing completes, the resources are automatically recovered by the system.

(5) Ultra-large clusters with q >= 1000 are rare in practice. At this scale the cluster is affected not only by the deployment process itself: traffic within the cluster is also limited by bandwidth, latency, and storage performance, which increases the control overhead. This parameter is therefore set to the upper limit of the container scale factor, which can be adjusted to 100/100 based on experience, and the cluster type is restricted to general cloud hosts. Container clusters only provide viewing, unsubscribing, and shrinking functions. Because the cluster is so large, processing small instances usually affects the stability of cluster construction; at this cluster size, freezing and restoring such unstable clusters is not supported. When performing operations such as viewing, unsubscribing, and shrinking, a working instance with the corresponding instance function must be started.
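The five scale tiers above can be summarized as a lookup table. The following is a sketch under stated assumptions, not the scheduling algorithm of the embodiment: the ranges are treated as half-open (the text lets 50, 100 and so on fall into two tiers), "100/100" is read as the pair (100, 100), operation restrictions are recorded only for the tiers where the text says "only provides", clusterType is assumed to be encoded as a number, and all identifiers (`Tier`, `tier_for`, `is_supported`, `needs_instance_small`, `pod_resources`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    upper: float                      # exclusive upper bound on q (assumed half-open ranges)
    coefficient: tuple                # (memory calculation value, CPU calculation value)
    only: Optional[frozenset] = None  # set only where the text says "only provides"


TIERS = (
    Tier(10, (1, 1)),
    Tier(50, (2.5, 2.5)),
    Tier(100, (2.5, 5)),
    Tier(1000, (10, 20), frozenset({"view", "freeze", "restore"})),
    Tier(float("inf"), (100, 100), frozenset({"view", "unsubscribe", "shrink"})),
)


def tier_for(q):
    """Return the first tier whose upper bound exceeds the cluster scale q."""
    for tier in TIERS:
        if q < tier.upper:
            return tier
    raise ValueError("q must be a finite number")


def is_supported(q, operation):
    """True unless the tier restricts operations and this one is not listed."""
    only = tier_for(q).only
    return only is None or operation in only


def needs_instance_small(cluster_type, q, a, b, x):
    """Tier (1) rule: start a work instance when A = a*clusterType + b*q >= X."""
    return a * cluster_type + b * q >= x


def pod_resources(base_memory, base_cpu, c):
    """base*c configuration: scale the basic overhead by the cluster type factor c."""
    return (base_memory * c, base_cpu * c)
```

For example, `tier_for(30).coefficient` yields the (2.5, 2.5) coefficient of tier (2), and `is_supported(500, "expand")` is false because tier (4) lists only viewing, freezing, and restoring. The weights a and b and the threshold X are not given numeric values in the description, so they are left as parameters.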

The system provided by the embodiment of the present invention can ease the distribution of user requests in a large-scale scenario by introducing a container message queue to cache messages. At the same time, the order system handles the order business, while the cluster business monitoring process starts pod instances and schedules them to other servers, so that messages on the queue are processed quickly. The pod construction process depends on other resource interfaces, and the same principle of separating orders from back-end business is applied there, so that other resources are also processed quickly and stably, which ensures fast and stable construction of container clusters. Orders take effect uniformly after the container is built, which reduces frequent interaction with the order business and greatly improves the success rate and performance of container cluster construction.

FIG. 2 shows a flowchart of a method for constructing a container cluster provided by an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:

Step S201, the order system receives the container cluster construction request sent by the user by calling the cluster business logic interface, calls the order generation process to generate a container cluster construction message according to the container cluster construction request, and publishes the container cluster construction message to the container message queue.

Step S202, if the cluster business monitoring process detects that there is an unconsumed container cluster construction message in the container message queue, it obtains the container cluster construction message from the container message queue, starts the working instance according to the container cluster construction message, and distributes the working instance to the corresponding work node.

In step S203, the working node builds a container cluster according to the working instance.

Optionally, the method further includes: the cluster service monitoring process destroys the working instance after the construction of the container cluster is completed.

Optionally, the container cluster construction request includes: request identifier, request operation type, user identifier, resource pool identifier where the requested resource is located, requested order information, resource type, and container cluster parameter information, wherein the container cluster parameter information includes a first container cluster scale value.
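The request fields listed above could be carried in a structure like the following. This is only an illustrative sketch: the field names, their types, and the dictionary form produced by `to_message` are assumptions, since the description names the fields but does not define a wire format.

```python
from dataclasses import asdict, dataclass


@dataclass
class ClusterParameters:
    scale: int  # the "first container cluster scale value" (q)


@dataclass
class ClusterConstructionRequest:
    request_id: str         # request identifier
    operation_type: str     # request operation type, e.g. "create"
    user_id: str            # user identifier
    resource_pool_id: str   # resource pool where the requested resource is located
    order_info: dict        # requested order information
    resource_type: str      # resource type
    cluster_parameters: ClusterParameters


def to_message(request):
    """Flatten the request (including nested parameters) into a plain
    dictionary that could be published to the container message queue."""
    return asdict(request)
```

A request built with `ClusterParameters(scale=30)` yields a message whose `cluster_parameters` entry is `{"scale": 30}`, which is the value the cluster service monitoring process would need in order to choose the instance function.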

Optionally, the cluster service monitoring process starting the working instance according to the container cluster construction message and distributing the working instance to the corresponding working node further includes:

Determining the instance function according to the first container cluster scale value, starting the working instance with the corresponding instance function, and distributing the working instance with the corresponding instance function to the corresponding working node;

The working node constructing the container cluster according to the working instance further includes: constructing the container cluster according to the working instance having the corresponding instance function, wherein the cluster function provided by the container cluster corresponds to the instance function.

Optionally, the instance function includes one or more of the following functions: creation function, capacity expansion function, capacity reduction function, viewing function, freezing function, recovery function, unsubscribe function.

Optionally, the method further includes: the order system receives the container cluster processing request sent by the user by calling the cluster business logic interface, calls the order generation process to generate a container cluster processing message according to the container cluster construction request, and publishes the container cluster processing message to the container message queue;

If the cluster business monitoring process detects that there is an unconsumed container cluster processing message in the container message queue, it obtains the container cluster processing message from the container message queue, starts a working instance with the corresponding instance function according to the container cluster processing message, and distributes the working instance with the corresponding instance function to the corresponding working node; or it processes the container cluster according to the container cluster processing message;

Work nodes process container clusters according to work instances with corresponding instance functions.

Optionally, the method further includes: the order system returns a status message to the user by calling the cluster business logic interface, wherein the status message is used to identify that the container cluster is in a state of construction.
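The immediate status reply can be illustrated with a short sketch. The function name `submit_construction_request`, the message fields, and the literal status string are hypothetical; the description only states that the status message identifies the container cluster as being under construction.

```python
import queue


def submit_construction_request(request_id, message_queue):
    """Publish the construction message, then answer without waiting for the build."""
    message_queue.put({"request_id": request_id, "operation": "create"})
    # Hypothetical status payload: the source only says that the status
    # marks the container cluster as being under construction.
    return {"request_id": request_id, "status": "constructing"}
```

Because the reply is produced as soon as the message is on the queue, the caller is not blocked while the cluster service monitoring process and the working nodes carry out the construction.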

The method provided by the embodiment of the present invention can ease the distribution of user requests in a large-scale scenario by introducing a container message queue to cache messages. At the same time, the order system handles the order business, while the cluster business monitoring process starts pod instances and schedules them to other servers, so that messages on the queue are processed quickly. The pod construction process depends on other resource interfaces, and the same principle of separating orders from back-end business is applied there, so that other resources are also processed quickly and stably, which ensures fast and stable construction of container clusters. Orders take effect uniformly after the container is built, which reduces frequent interaction with the order business and greatly improves the success rate and performance of container cluster construction.

An embodiment of the present invention provides a non-volatile computer storage medium, where at least one executable instruction is stored on the computer storage medium, and the computer executable instruction can execute the method for constructing a container cluster in any of the above method embodiments.

FIG. 3 shows a schematic structural diagram of a computing device provided by an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computing device. As shown in FIG. 3, the computing device may include: a processor, a communication interface, a memory, and a communication bus. The processor, the communication interface, and the memory communicate with one another through the communication bus. The communication interface is used to communicate with network elements of other devices, such as clients or other servers. The processor is configured to execute a program and, specifically, may execute the relevant steps in the above embodiments of the container cluster construction method for a computing device.

Exemplarily, the program may include program code including computer operation instructions.

The processor may be a CPU, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or may be different types of processors, such as one or more CPUs and one or more ASICs.

The memory is used to store the program. The memory may include a high-speed random access memory (Random Access Memory, RAM), and may also include a non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory.

The program may specifically be used to cause the processor to execute the container cluster construction method in any of the above method embodiments. For the specific implementation of each step in the program, refer to the corresponding steps and the corresponding descriptions in the units in the above container cluster construction embodiment, and details are not repeated here. Those skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the above-described devices and modules can refer to the corresponding process description in the foregoing method embodiments, and details are not repeated here.

The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other device. Various generic systems can also be used with the teachings based on this. The structure required to construct such a system is apparent from the above description. Furthermore, embodiments of the present invention are not directed to any particular programming language. It should be understood that various programming languages can be used to implement the contents of the embodiments of the present invention described herein, and the above description of specific languages is for disclosing the best implementation mode of the embodiments of the present invention.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.

Similarly, it should be understood that in the above description of the exemplary embodiments of the present invention, various features of the embodiments of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to simplify the embodiments of the present invention and facilitate understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments of the invention require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art can understand that the modules in the device in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment. Modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore may be divided into a plurality of sub-modules or sub-units or sub-assemblies. All features disclosed in this specification (including accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except combinations in which at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will understand that although some embodiments herein include some features included in other embodiments but not others, combinations of features from different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components according to the embodiments of the present invention. Embodiments of the present invention can also be implemented as a device or apparatus program (e.g., a computer program and computer program product) for performing a part or all of the methods described herein. Such a program implementing an embodiment of the present invention may be stored on a computer-readable medium, or may be in the form of one or more signals. Such a signal may be downloaded from an Internet site, or provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third, etc. does not indicate any order. These words can be interpreted as names. The steps in the above embodiments, unless otherwise specified, should not be construed as limiting the execution order.

Claims

A container cluster construction system, including: an order system, a container message queue, multiple cluster business monitoring processes, and multiple working nodes;

The order system is configured to receive the container cluster construction request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster construction message according to the container cluster construction request, and publish the container cluster construction message to the container message queue;

The cluster service monitoring process is configured to obtain a container cluster construction message from the container message queue if it detects that there is an unconsumed container cluster construction message in the container message queue, start a working instance according to the container cluster construction message, and distribute the working instance to the corresponding working node;

The working nodes are configured to build a container cluster according to the working instance.
The system according to claim 1, wherein the cluster service monitoring process is further configured to: destroy the working instance after the construction of the container cluster is completed.
The system according to claim 1 or 2, wherein the container cluster construction request includes: request identifier, request operation type, user identifier, resource pool identifier where the requested resource is located, requested order information, resource type, and container cluster parameter information, wherein the container cluster parameter information includes: a first container cluster scale value.
The system according to claim 3, wherein the cluster service monitoring process is further configured to: determine the instance function according to the first container cluster scale value, start the working instance with the corresponding instance function, and distribute the working instance with the corresponding instance function to the corresponding working node;

The working node is further configured to: construct a container cluster according to a working instance having a corresponding instance function, wherein the cluster function provided by the container cluster corresponds to the instance function.
The system according to claim 4, wherein the instance function includes one or more of the following functions: creation function, capacity expansion function, capacity reduction function, viewing function, freezing function, recovery function, unsubscribe function.
The system according to claim 1 or 2, wherein the order system is further configured to: receive the container cluster processing request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster processing message according to the container cluster construction request, and publish the container cluster processing message to the container message queue;

The cluster service monitoring process is further configured to: if there is an unconsumed container cluster processing message in the container message queue, obtain the container cluster processing message from the container message queue, start a working instance with the corresponding instance function according to the container cluster processing message, and distribute the working instance with the corresponding instance function to the corresponding working node; or process the container cluster according to the container cluster processing message;

The working node is further configured to: process the container cluster according to the working instance having the corresponding instance function.
The system according to claim 1 or 2, wherein the order system is further configured to: return a status message to the user by calling the cluster business logic interface, wherein the status message is used to identify that the container cluster is in a state of construction.
A container cluster construction method, comprising:

The order system receives the container cluster construction request sent by the user by calling the cluster business logic interface, calls the order generation process to generate a container cluster construction message according to the container cluster construction request, and publishes the container cluster construction message to the container message queue;

If the cluster service monitoring process detects that there is an unconsumed container cluster construction message in the container message queue, it obtains a container cluster construction message from the container message queue, starts a working instance according to the container cluster construction message, and distributes the working instance to the corresponding working node;

The working nodes build container clusters according to the working instances.
A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface complete mutual communication through the communication bus;

The memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the method for constructing a container cluster according to claim 8.
A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the method for constructing a container cluster according to claim 8.