WO2023071576A1 - Container cluster construction method and system - Google Patents

Container cluster construction method and system

Info

Publication number
WO2023071576A1
WO2023071576A1 PCT/CN2022/118653 CN2022118653W WO2023071576A1 WO 2023071576 A1 WO2023071576 A1 WO 2023071576A1 CN 2022118653 W CN2022118653 W CN 2022118653W WO 2023071576 A1 WO2023071576 A1 WO 2023071576A1
Authority
WO
WIPO (PCT)
Prior art keywords
container
cluster
container cluster
instance
message
Prior art date
Application number
PCT/CN2022/118653
Other languages
French (fr)
Chinese (zh)
Inventor
杨巍巍
Original Assignee
中移(苏州)软件技术有限公司
中国移动通信集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中移(苏州)软件技术有限公司, 中国移动通信集团有限公司
Publication of WO2023071576A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication

Definitions

  • The present invention is based on, and claims priority to, Chinese patent application No. 202111264499.3, filed on October 28, 2021 by China Mobile (Suzhou) Software Technology Co., Ltd. and China Mobile Communications Group Co., Ltd. and entitled "Container cluster construction method and system". The entire content of that Chinese patent application is hereby incorporated by reference in the present invention.
  • The present invention relates to the technical field of container clusters, and relates in particular, but not exclusively, to a container cluster construction method and system.
  • Cloud-native technology is inseparable from container clusters, but under the load of large numbers of users the deployment of container clusters becomes very slow. The main causes are the insufficient processing capacity of the user-oriented container business and frequent processing errors in the order system, which lead to considerable lag and many retries.
  • Embodiments of the present invention provide a container cluster construction system and a corresponding container cluster construction method that overcome the above problems or at least partially solve them.
  • An embodiment of the present invention provides a container cluster construction system, including: an order system, a container message queue, multiple cluster service monitoring processes, and multiple working nodes;
  • the order system is configured to receive the container cluster construction request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster construction message according to the container cluster construction request, and publish the container cluster construction message to the container message queue;
  • the cluster business monitoring process is configured so that if there is an unconsumed container cluster construction message in the container message queue, it will obtain the container cluster construction message from the container message queue, start the working instance according to the container cluster construction message, and distribute the working instance to the corresponding work node;
  • the working nodes are configured to build container clusters according to the working instances.
  • An embodiment of the present invention provides a container cluster construction method, including:
  • the order system receives the container cluster construction request sent by the user by calling the cluster business logic interface, calls the order generation process to generate a container cluster construction message according to the container cluster construction request, and publishes the container cluster construction message to the container message queue;
  • when the cluster business monitoring process detects that there is an unconsumed container cluster construction message in the container message queue, it obtains the container cluster construction message from the container message queue, starts the working instance according to the container cluster construction message, and distributes the working instance to the corresponding working node;
  • Worker nodes build container clusters based on work instances.
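The three steps of the method can be illustrated with a minimal single-process sketch. It assumes an in-memory `queue.Queue` in place of the container message queue and a naive round-robin placement; `WorkerNode`, `order_system_receive`, `monitor_once`, and the `"constructing"` status string are hypothetical names chosen for illustration, not taken from the patent.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    """Hypothetical stand-in for a working node."""
    name: str
    built: list = field(default_factory=list)

    def build_cluster(self, instance: dict) -> None:
        # A real node would deploy the container cluster here.
        self.built.append(instance["message"]["orderId"])


def order_system_receive(request: dict, mq: queue.Queue) -> str:
    """Turn a build request into a message, publish it, and answer at once."""
    mq.put(dict(request))      # publish the construction message
    return "constructing"      # status returned to the user


def monitor_once(mq: queue.Queue, nodes: list) -> int:
    """Consume every pending message and hand a work instance to a node."""
    handled = 0
    while True:
        try:
            message = mq.get_nowait()
        except queue.Empty:
            return handled
        instance = {"message": message}       # the "working instance"
        node = nodes[handled % len(nodes)]    # naive round-robin placement
        node.build_cluster(instance)
        handled += 1


mq = queue.Queue()
nodes = [WorkerNode("node-a"), WorkerNode("node-b")]
statuses = [order_system_receive({"orderId": f"o{i}"}, mq) for i in range(3)]
handled = monitor_once(mq, nodes)
```

The point of the sketch is only the decoupling: the order side returns as soon as the message is queued, and the build work happens later on whichever node receives the instance.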
  • An embodiment of the present invention provides a computing device, including: a processor, a memory, a communication interface, and a communication bus, and the processor, the memory, and the communication interface complete mutual communication through the communication bus;
  • the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the above container cluster construction method.
  • An embodiment of the present invention provides a computer storage medium, at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the container cluster construction method described above.
  • Separating the business system from the order system effectively relieves the pressure of large-scale requests and ensures stable request execution and a high container cluster deployment success rate in large-scale scenarios. Introducing message caching alleviates the pressure when large-scale requests arrive, and scheduling the consumption logic across multiple instances in the cluster ensures that requests are processed quickly.
  • FIG. 1 shows a schematic diagram of a container cluster construction system provided by an embodiment of the present invention
  • FIG. 2 shows a flow chart of a container cluster construction method provided by an embodiment of the present invention
  • FIG. 3 shows a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • the inventors of the present invention found that when constructing a container cluster, related technologies need to be deployed on cloud hosts, relying on resources such as cloud hard disks.
  • mainstream manufacturers will first create orders, then start the deployment process of container services, and complete cluster construction step by step.
  • It is usually necessary to create the resources that the container cluster depends on in a customized way, and these resources are managed separately by the container cluster, that is, no orders are created for them. As a result, users cannot operate on the orders of these hosts. The business of some manufacturers is also affected by order pressure: errors occur, retries are time-consuming, or processing is delayed by serial processing logic.
  • the order and the operation process of the container cluster are basically synchronized, that is, the resource is created, and the order is confirmed to take effect.
  • the efficiency is very high, but it is usually difficult to achieve such a high interface success rate of each resource under high pressure. It can be seen that in the face of multi-user pressure, the rational call of the interface is particularly important. High-frequency order validation calls will affect the overall deployment process of the cluster, especially in the case of errors.
  • The existing container cluster construction scheme follows a fixed deployment process and usually executes order control and business control in sequence. Therefore, when large-scale user requests arrive, preemption of CPU and memory resources on the back-end business server makes cluster construction at the upper layer appear very slow.
  • The container cluster resources built by the user are in principle owned by the user, and orders should not be customized merely because a container cluster is being built. When a single set of container cluster orders manages the charges for all resources, the user's ability to perform order operations on individual resources is affected.
  • the settings only allow users to operate resources on the container interface through addition, deletion, modification, and query of the cluster, which lacks the granularity of free resource control.
  • The order system is usually not dedicated to a single system service, so it has a theoretical threshold: if the threshold is exceeded, requests cannot be answered and a large number of errors are triggered. Therefore, in the scenario where large numbers of users build clusters, the pressure on the order system affects the success rate of cluster deployment.
  • the business interface of the container backend is "equal" to the system, and the way they seize resources is affected by the CPU and other resources of the server where they are located. Therefore, when large-scale user requests arrive, the successful execution of the cluster construction process cannot be guaranteed. That is, the process may cause large-scale container construction failure due to the upper limit of the server's processing capacity, which in turn affects the reputation of the product or system.
  • The container cluster construction scheme of the embodiment of the present invention is therefore proposed. It meets the processing requirements when large-scale business requests arrive and can achieve the following goals: 1. Design a reasonable scheme for separating orders from business to ensure the deployment success rate of container clusters. 2. Design elastic scheduling of back-end business instances to different working nodes to improve the success rate of container cluster construction. 3. Introduce message caching to alleviate the pressure when large-scale requests arrive, and ensure fast processing of requests by scheduling the consumption logic of multiple instances in the cluster. 4. Design a reasonable order operation process that can flexibly build an order for each of the user's resources; the orders take effect uniformly after the container is built, reducing the number of interactions with the order system.
  • FIG. 1 shows a schematic diagram of a system for building a container cluster provided by an embodiment of the present invention.
  • the system includes: an order system 101, a container message queue 102, multiple cluster service monitoring processes 103, and multiple working nodes 104 (FIG. 1 is only a schematic illustration).
  • the order system 101 is configured to receive the container cluster construction request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster construction message according to the container cluster construction request, and publish the container cluster construction message to the container message queue.
  • this embodiment provides the user with a cluster business logic interface (Cluster Bll).
  • the order system is further configured to: return a status message to the user by invoking the cluster business logic interface, wherein the status message identifies that the container cluster is under construction. Because this request only involves order construction, and there is no resource preemption by the back-end business, it usually succeeds. If sporadic errors occur because of pressure on the order system, the cluster business logic interface (Cluster Bll) layer redistributes and retries according to the existing node-preference strategy until all batches of orders are successfully constructed.
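The retry behaviour at the Cluster Bll layer might look like the following sketch. It assumes that a sporadic order-system failure surfaces as a `RuntimeError` and that a bounded number of attempts is acceptable; the node-preference strategy mentioned above is not modelled. `submit_orders_with_retry`, `flaky_create_order`, and `max_attempts` are hypothetical names.

```python
def submit_orders_with_retry(batches, create_order, max_attempts=5):
    """Retry failed batches until all succeed or attempts run out.

    `create_order` is a hypothetical callable that raises RuntimeError
    on a sporadic order-system failure.
    """
    pending = list(batches)
    for _ in range(max_attempts):
        failed = []
        for batch in pending:
            try:
                create_order(batch)
            except RuntimeError:
                failed.append(batch)   # keep only the batches that failed
        if not failed:
            return True
        pending = failed
    return False


attempts = {}
created = []


def flaky_create_order(batch):
    """Simulated order system: 'batch-2' fails on its first attempt."""
    attempts[batch] = attempts.get(batch, 0) + 1
    if batch == "batch-2" and attempts[batch] == 1:
        raise RuntimeError("order system busy")
    created.append(batch)


ok = submit_orders_with_retry(["batch-1", "batch-2"], flaky_create_order)
```

Only the failed batches are resubmitted, so batches that already succeeded do not add load to the order system.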
  • the order system can call the order generation process.
  • The order generation process runs as multiple instances in the cluster. It can quickly process the orders created by users and publish them, carried as messages, to the container message queue. That is, the order generation process is responsible for generating the container cluster construction messages, and the speed of message generation depends on the processing capacity of the order generation process.
  • the order generation process is set to multi-instance to improve the order processing capacity.
  • the order system calls the order generation process to generate a container cluster construction message according to the container cluster construction request, and publishes the container cluster construction message to the container message queue.
  • Container construction in a large-scale scenario generates a large number of container cluster construction messages.
  • the cluster service monitoring process 103 is configured to, if there is an unconsumed container cluster construction message in the container message queue, obtain the container cluster construction message from the container message queue, start the working instance according to the container cluster construction message, and distribute the working instance to the corresponding working node.
  • the cluster service monitoring process as a consumer of the back-end business logic, listens to the container cluster construction message sent by the order system, and only then starts the construction process of the back-end container cluster.
  • the cluster service monitoring process monitors in real time whether there is an unconsumed container cluster construction message in the container message queue, and if it detects one, it immediately retrieves the container cluster construction message from the container message queue and starts the construction process of the container cluster.
  • the cluster service monitoring process can start the working instance according to the container cluster construction message, and distribute the working instance to the corresponding working node.
  • the cluster service monitoring process can start a new working instance (Pod instance) to process the construction request of the container cluster.
  • Pod instances can be preferentially distributed to different working nodes according to the preferred scheduling strategy of the node.
  • the working nodes are different servers, so they do not compete with each other for resources. Even if Pod instances are scheduled to the same working node, reserved processing resources can be set to improve the processing performance of the Pod instances.
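One way to picture preferential spreading with reserved resources is the sketch below. It is not the Kubernetes scheduler and not necessarily the policy used in the patent; it simply assumes that each work instance reserves a fixed amount of CPU and that the node with the fewest running instances is preferred. `Node`, `place_instance`, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free: float      # cores still available for reservation
    running: int = 0     # work instances currently placed here


def place_instance(nodes, cpu_reserved):
    """Prefer the node with the fewest instances that can reserve the CPU."""
    candidates = [n for n in nodes if n.cpu_free >= cpu_reserved]
    if not candidates:
        return None
    # Fewest running instances first; ties go to the node with more free CPU.
    best = min(candidates, key=lambda n: (n.running, -n.cpu_free))
    best.cpu_free -= cpu_reserved
    best.running += 1
    return best.name


nodes = [Node("node-a", 4.0), Node("node-b", 4.0), Node("node-c", 1.0)]
placements = [place_instance(nodes, 2.0) for _ in range(5)]
# placements == ["node-a", "node-b", "node-a", "node-b", None]
```

Instances alternate between the two nodes that can honour the reservation, and the fifth request is refused rather than overcommitting a node.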
  • the process of Pod instances in Kubernetes is lightweight, and its startup is very fast. Therefore, the cluster business monitoring process can achieve the fastest batch processing speed by starting different Pod instances, and process all the messages on the queue. This realizes the separation of order and business processing, and achieves efficient cluster construction.
  • Flexibly scheduling working instances to different server nodes improves the success rate of container cluster construction. In the management cluster, in addition to the nodes where the components are located, multiple working nodes are configured to respond to large-scale user requests. Different pod instances can be scheduled onto them to make full use of working node resources, and after a work instance completes, it is automatically destroyed so that its resources are recovered.
  • the cluster service monitoring process is also configured to: destroy the working instance after the container cluster is built.
  • the working node 104 is configured to build a container cluster according to the working instance.
  • Work nodes build container clusters based on work instances, and call back to the order system to make all related orders take effect, reducing frequent interaction with order business and greatly improving the success rate and performance of container cluster construction.
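The single call-back can be sketched as follows, assuming a hypothetical `OrderSystem` class that counts how often it is contacted. The status strings and resource names are illustrative only.

```python
class OrderSystem:
    """Hypothetical order system that counts how often it is called back."""

    def __init__(self):
        self.status = {}
        self.calls = 0

    def create(self, order_id):
        self.status[order_id] = "pending"

    def activate_all(self, order_ids):
        # One call-back makes every related order take effect together.
        self.calls += 1
        for order_id in order_ids:
            self.status[order_id] = "effective"


def build_cluster(resources, order_system):
    """Create each resource, then confirm all orders in a single call-back."""
    finished = []
    for order_id in resources:
        # Real code would create the resource (host, disk, ...) here.
        finished.append(order_id)
    order_system.activate_all(finished)
    return finished


orders = OrderSystem()
resources = {"o-host": "cloud host", "o-disk": "cloud hard disk"}
for order_id in resources:
    orders.create(order_id)
activated = build_cluster(resources, orders)
```

However many resources the cluster depends on, the order system is contacted once after the build rather than once per resource.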
  • the container cluster construction system of the present embodiment includes a three-layer architecture. The uppermost layer is the cluster business logic interface (Cluster Bll), which opens the business logic interface to users. The middle layer includes the container message queue, the order system (with its order generation process), the cluster business monitoring processes, etc., and is used to separate the business and order systems, which can effectively relieve the pressure of large-scale container construction requests and ensure the stable execution of container construction requests. The lowest layer consists of the nodes where the server instances are located; these nodes are used to distribute the working instances (pods) that process business according to their functions. This layer consists of internal servers and is not open to the outside world. Scheduling policies can be set to schedule business processing at pod granularity, so that large-scale container construction requests are no longer limited by the processing capacity of a single server when they arrive.
  • the content data structure of the container cluster construction request is (requestId, orderType, userId, poolId, orderId, productType, openParams).
  • requestId: the identifier (id) of the request;
  • orderType: the type of operation requested, such as creation;
  • userId: the identifier of the user;
  • poolId: the resource pool where the requested resource is located;
  • orderId: the requested order information, used to bind resources; after the container cluster is built, the order status is called back uniformly;
  • productType: the requested resource type, such as a container cluster;
  • openParams: the parameter information of the request.
  • the container cluster construction message carries all the data in the container cluster construction request. The construction message is parsed and dispatched to different service nodes.
  • the container cluster construction message includes at least one of the following: request identifier, request operation type, user identifier, resource pool identifier where the requested resource is located, requested order information, resource type, and container cluster parameter information.
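The request structure and the message that carries it can be written down as a small data class. The field names come from the structure above; the field types, the JSON encoding, and the keys inside `openParams` are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ClusterBuildRequest:
    requestId: str        # identifier of the request
    orderType: str        # requested operation, e.g. "create"
    userId: str           # identifier of the user
    poolId: str           # resource pool of the requested resource
    orderId: str          # order information used to bind resources
    productType: str      # requested resource type
    openParams: dict = field(default_factory=dict)   # request parameters


def to_message(request: ClusterBuildRequest) -> str:
    """Serialise the request; the message carries all of its data."""
    return json.dumps(asdict(request), sort_keys=True)


def from_message(payload: str) -> ClusterBuildRequest:
    """Parse a construction message back into a request object."""
    return ClusterBuildRequest(**json.loads(payload))


request = ClusterBuildRequest(
    requestId="r-1",
    orderType="create",
    userId="u-1",
    poolId="pool-1",
    orderId="o-1",
    productType="container_cluster",
    openParams={"scale": 10, "clusterType": "gpu"},   # hypothetical keys
)
round_tripped = from_message(to_message(request))
```

A round trip through the message form returns an equal object, which is the property the monitoring process relies on when it parses a message.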
  • the container cluster parameter information includes: a first container cluster scale value and a cluster type.
  • the cluster service monitoring process is further configured to: determine the instance function according to the scale value of the first container cluster, start the working instance with the corresponding instance function, and distribute the working instance with the corresponding instance function to the corresponding working node;
  • the working node is further configured to: build a container cluster according to the working instance with the corresponding instance function, wherein the cluster function provided by the container cluster corresponds to the instance function.
  • the instance function may include one or more of the following functions: creation function, capacity expansion function, capacity reduction function, viewing function, freezing function, recovery function, unsubscribe function.
  • the request information Q corresponding to the user is calculated.
  • Q = (b*baseM*c*q*a, b*baseC*c*q*a), where b is the basic operating coefficient factor, base is the basic operating coefficient (baseM for memory and baseC for the CPU), c is the container type factor, q is the container scale factor, and a is the container operation method factor.
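As a worked example of the formula, the sketch below evaluates Q for one set of factor values. The numbers are invented for illustration, and the source does not state units for either component.

```python
def request_info(b, base_m, base_c, c, q, a):
    """Return Q = (b*baseM*c*q*a, b*baseC*c*q*a) as (memory, cpu)."""
    scale = b * c * q * a          # factors shared by both components
    return (base_m * scale, base_c * scale)


# Hypothetical factor values: b=1, baseM=2, baseC=0.5, c=2, q=4, a=1.
q_value = request_info(b=1.0, base_m=2.0, base_c=0.5, c=2.0, q=4.0, a=1.0)
# q_value == (16.0, 4.0)
```

Both components share the product b*c*q*a, so doubling the type factor c or the scale factor q doubles the memory and CPU values together.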
  • the dependent processing resource logic it creates is more complex, including multiple resources inside the container cluster and the business logic of the container cluster deployment. According to different cluster types, they are scheduled to different types of servers and configured according to different types of cluster factors, so as to obtain different resource allocation specifications. The value of this resource is calculated from the Q formula and the various factors, and the request is then scheduled to the matching type of server. An ordinary small-scale cluster only needs to be scheduled to an ordinary server; a GPU cluster needs to be scheduled to a GPU host to provide faster response performance; an elastic bare metal cluster needs to be scheduled to elastic bare metal machines. (This is only a brief description of the scheduling scheme. The focus of the present invention is to distinguish different cluster types and scales and, in large-scale scenarios, to use scheduling strategies to provide better performance.)
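The mapping from cluster type to server type can be sketched as a label lookup. The label strings, `SERVER_TYPE_BY_CLUSTER`, and `pick_server` are hypothetical; the source only states which kind of server each cluster type should be scheduled to.

```python
# Hypothetical labels for the three cluster types named in the text.
SERVER_TYPE_BY_CLUSTER = {
    "ordinary": "ordinary-server",
    "gpu": "gpu-host",
    "elastic-bare-metal": "elastic-bare-metal-machine",
}


def pick_server(cluster_type, servers):
    """Return the first server whose label matches the cluster type."""
    wanted = SERVER_TYPE_BY_CLUSTER[cluster_type]
    for name, label in servers.items():
        if label == wanted:
            return name
    return None   # no server of the required type is available


servers = {
    "s1": "ordinary-server",
    "s2": "gpu-host",
    "s3": "elastic-bare-metal-machine",
}
gpu_target = pick_server("gpu", servers)
```

A fuller version would combine this type filter with the Q value to check that the chosen server has enough memory and CPU.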
  • the dependent processing resource logic it creates is more complex, including multiple resources inside the container cluster and the business logic of the container cluster deployment. According to different cluster types, they are scheduled to different types of servers and configured according to different types of cluster factors to obtain different resource allocation specifications.
  • the dependent processing resource logic it creates is more complex, including multiple resources inside the container cluster and the business logic of the container cluster deployment. Schedule them on high-performance servers and configure them according to different types of cluster factors to obtain different resource allocation specifications.
  • the container cluster construction system in this embodiment can be applied to large-scale scenarios.
  • The processing logic is sent to multiple different servers and handled in the form of pod instances. Processes that have finished are destroyed automatically, and the recycling of pod instance resources is completed by the container cluster, so system server resources are fully used.
  • the construction process of container clusters is smoother, which greatly reduces the time loss caused by resource preemption. Therefore, through such a process design, in large-scale user scenarios, the construction of container clusters can be scheduled to different nodes through multiple pod instances, which greatly improves the business processing performance and thus improves the success rate of cluster construction.
  • This embodiment introduces message caching to relieve the pressure when large-scale requests arrive, and ensures fast processing of requests by scheduling the consumption logic of multiple instances in the cluster. This mainly refers to starting multiple Cluster Consumer processes (cluster business monitoring processes) at the back end of the container management layer to monitor the container message queue.
  • The messages on the container message queue are generated by the order generation process. That is, when large-scale requests arrive, the order generation process handles the order logic in batches according to a certain batch processing strategy and puts the order information on the container message queue in the form of messages, and the Cluster Consumer processes the messages.
  • The message processing flow is not completed by the Cluster Consumer process itself. It distributes the pressure to different pod instances by scheduling instances, and the pod instances are scheduled to servers with sufficient resources, so performance is stable and sufficiently guaranteed. Therefore, the container message queue introduced in this embodiment can fully improve the stability and processing performance of the system.
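Several Cluster Consumers listening on one queue can be sketched with threads, assuming an in-memory `queue.Queue` instead of a real message broker and a `None` sentinel to stop each consumer. In this sketch a consumer only records the order id; in the scheme described above it would start a pod instance instead of doing the work itself.

```python
import queue
import threading


def cluster_consumer(mq, results, lock):
    """Consume messages until a None sentinel arrives."""
    while True:
        message = mq.get()
        if message is None:
            mq.task_done()
            return
        # A real consumer would start a pod instance here instead of
        # doing the work itself.
        with lock:
            results.append(message["orderId"])
        mq.task_done()


mq = queue.Queue()
results, lock = [], threading.Lock()
consumers = [
    threading.Thread(target=cluster_consumer, args=(mq, results, lock))
    for _ in range(3)
]
for t in consumers:
    t.start()
for i in range(10):
    mq.put({"orderId": f"o{i}"})
for _ in consumers:
    mq.put(None)          # one sentinel per consumer
mq.join()                 # wait until every message has been handled
for t in consumers:
    t.join()
```

Every message is handled exactly once, although the order in which the consumers pick them up is not deterministic.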
  • This embodiment can flexibly build an order for each resource of the user, and the effective logic of the order will take effect uniformly after the container is built, reducing the number of interactions with the order system.
  • This embodiment creates matching containers and corresponding container functions based on user requests, which can improve the adaptability and practicability of containers, and reduce the problems of slow container creation and low success rate caused by too many container functions.
  • the order system is further configured to: receive the container cluster processing request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster processing message according to the container cluster processing request, and publish the container cluster processing message to the container message queue;
  • the cluster business monitoring process is also configured as follows: if there is an unconsumed container cluster processing message in the container message queue, obtain the container cluster processing message from the container message queue, start a working instance with the corresponding instance function according to the container cluster processing message, and distribute the working instance with the corresponding instance function to the corresponding working node; or process the container cluster according to the container cluster processing message;
  • the worker nodes are also configured to process the cluster of containers according to the worker instances with corresponding instance capabilities.
  • the user can also perform corresponding processing on the container cluster, such as expanding, shrinking, viewing, freezing, or restoring it;
  • the user can call the cluster business logic interface to send a container cluster processing request, for example an expansion request, a capacity reduction request, a viewing request, a freezing request, or a recovery request;
  • the order system is also configured to: receive the container cluster processing request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster processing message according to the container cluster processing request, and publish the container cluster processing message to the container message queue;
  • the cluster business monitoring process is also configured as follows: if there is an unconsumed container cluster processing message in the container message queue, obtain the container cluster processing message from the container message queue, start a working instance with the corresponding instance function according to the container cluster processing message, and distribute the working instance with the corresponding instance function to the corresponding working node; or process the container cluster according to the container cluster processing message;
  • the container clusters are processed in different ways:
  • the work instance is scheduled to an ordinary host server. (The scheduling process compares the two calculated values, the memory value and the CPU value, to find a suitable server; the servers are labeled, so requests for this class of cluster are scheduled to ordinary host servers according to the algorithm.) The resources of the work pod instance are then allocated. Affected by the cluster type factor, they are configured as base*c, where base indicates the basic operating overhead of CPU or memory and c indicates the factor for the cluster type. When the cluster uses resources such as GPUs, the factor c needs to be configured larger so that expansion and contraction of the container cluster complete quickly.
  • the listening process can directly provide services; when B is greater than or equal to X, and the request is an expansion request, a shrinkage request, or a view request, a working instance with the corresponding instance function needs to be started.
  • To improve the efficiency and stability of the viewing function through the pod instance, the specific implementation is as follows:
  • the resources for this kind of request are not as many as creating a cluster.
  • the working pod instance is scheduled to an idle ordinary host server, and the resources of the working pod instance are allocated. Affected by the cluster type factor, they are configured as base*c, where base indicates the basic operating overhead of CPU or memory and c indicates the factor for the cluster type. When the cluster uses resources such as GPUs, the factor c needs to be configured larger so that expansion and contraction of the container cluster complete quickly.
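The base*c configuration can be made concrete with a few lines. The base overheads and the factor values below are invented for illustration; the source only says that c should be larger for resources such as GPUs.

```python
# Hypothetical base overhead: CPU in cores, memory in MiB.
BASE = {"cpu": 0.5, "memory": 512}
# Hypothetical cluster type factors c; the GPU factor is set larger.
CLUSTER_FACTOR = {"ordinary": 1, "gpu": 4}


def pod_resources(cluster_type):
    """Resources for a working pod instance, configured as base*c."""
    c = CLUSTER_FACTOR[cluster_type]
    return {name: base * c for name, base in BASE.items()}


gpu_pod = pod_resources("gpu")
ordinary_pod = pod_resources("ordinary")
# gpu_pod == {"cpu": 2.0, "memory": 2048}
```

With these values a working pod for a GPU cluster receives four times the CPU and memory of one serving an ordinary cluster.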
  • the request here only needs to view the resource information related to the database. Therefore, no matter what type of cluster it is, the request is scheduled to the cheapest server, that is, a general-purpose host, and only basic memory and CPU overhead is allocated. After the request is completed, the work instance is quickly released. Such a request can be processed quickly, and resources are recovered in time, which improves the processing efficiency of the server.
  • clusters of this scale support expansion, shrinkage, freezing, restoration, and similar operations through work pod instances; that is, when the request is an expansion, shrinkage, freeze, or restore request, a working instance with the corresponding instance function must be started to process the container cluster.
  • the specific method is as follows:
  • this kind of request needs fewer resources than creating a cluster.
  • the work pod instance is scheduled to a server of the corresponding host model and resources are allocated to it. The allocation is affected by the cluster type factor and uses a base*c configuration, where base denotes the basic CPU or memory operating overhead and c denotes the factor for the cluster type.
  • when the cluster uses resources such as GPUs, the factor c needs to be configured larger so that expansion and contraction of the container cluster complete quickly.
  • the request here only needs to view the database-related resource information. Because the cluster is large, these requests are scheduled to a server of the corresponding cluster model and only the basic memory and CPU overhead is allocated. After the request completes, the work instance is released quickly. Such requests are processed quickly and resources are reclaimed promptly, which improves the processing efficiency of the server.
  • the request here only needs to view the database-related resource information. Because the cluster is large, these requests are scheduled to high-performance servers, and memory and CPU overhead is set according to the operation factor. After the request completes, the work instance is released quickly. Such requests are processed quickly and resources are reclaimed promptly, which improves the processing efficiency of the server.
  • container clusters of this scale provide only the viewing, unsubscribing, and shrinking functions. Because the cluster is large, processing by small instances usually affects the stability of cluster construction, and once the cluster has reached this size, freezing and restoring such unstable clusters is not supported. To view, unsubscribe, or shrink the cluster, an instance with the corresponding instance function must be started.
  • the system provided by the embodiment of the present invention can ease the distribution of user requests in a large-scale scenario by introducing a container message queue to cache messages.
  • the order system is designed to process the order business and the cluster business monitoring process.
  • the pod construction process depends on other resource interfaces.
  • the principle of separation of orders and back-end business is also carried out to realize fast and stable processing of other resources, thus ensuring the fast and stable construction of container clusters.
  • the order takes effect, reducing the frequent interaction process with the order business, and greatly improving the success rate and performance of container cluster construction.
  • FIG. 2 shows a flowchart of a method for constructing a container cluster provided by an embodiment of the present invention. As shown in Figure 2, the method includes the following steps:
  • Step S201: the order system receives the container cluster construction request sent by the user by calling the cluster business logic interface, calls the order generation process to generate a container cluster construction message according to the container cluster construction request, and publishes the container cluster construction message to the container message queue.
  • Step S202: if the cluster service monitoring process detects that there is an unconsumed container cluster construction message in the container message queue, it obtains the container cluster construction message from the container message queue, starts a working instance according to the container cluster construction message, and distributes the working instance to the corresponding working node.
  • Step S203: the working node builds the container cluster according to the working instance.
  • the method further includes: the cluster service monitoring process destroys the working instance after the construction of the container cluster is completed.
  • the container cluster construction request includes: a request identifier, the requested operation type, a user identifier, the identifier of the resource pool where the requested resource is located, the requested order information, the resource type, and container cluster parameter information, wherein the container cluster parameter information includes a first container cluster scale value.
  • the cluster service monitoring process starting the working instance according to the container cluster construction message and distributing the working instance to the corresponding working node further includes:
  • the working node constructing the container cluster according to the working instance further includes: constructing the container cluster according to the working instance having the corresponding instance function, wherein the cluster function provided by the container cluster corresponds to the instance function.
  • the instance function includes one or more of the following functions: creation function, capacity expansion function, capacity reduction function, viewing function, freezing function, recovery function, unsubscribe function.
  • the method further includes: the order system receives the container cluster processing request sent by the user by calling the cluster business logic interface, calls the order generation process to generate a container cluster processing message according to the container cluster processing request, and publishes the container cluster processing message to the container message queue;
  • if the cluster service monitoring process detects that there is an unconsumed container cluster processing message in the container message queue, it obtains the container cluster processing message from the container message queue, and either starts a working instance with the corresponding instance function according to the container cluster processing message and distributes that working instance to the corresponding working node, or processes the container cluster directly according to the container cluster processing message;
  • the working node processes the container cluster according to the working instance with the corresponding instance function.
  • the method further includes: the order system returns a status message to the user by calling the cluster business logic interface, wherein the status message is used to identify that the container cluster is in a state of construction.
  • the method provided by the embodiment of the present invention can ease the distribution of user requests in a large-scale scenario by introducing a container message queue to cache messages.
  • the order system is designed to process the order business and the cluster business monitoring process.
  • the pod construction process depends on other resource interfaces.
  • the principle of separation of orders and back-end business is also carried out to realize fast and stable processing of other resources, thus ensuring the fast and stable construction of container clusters.
  • the order takes effect, which reduces the frequent interaction process with the order business, and greatly improves the success rate and performance of container cluster construction.
  • An embodiment of the present invention provides a non-volatile computer storage medium, where at least one executable instruction is stored on the computer storage medium, and the computer executable instruction can execute the method for constructing a container cluster in any of the above method embodiments.
  • FIG. 3 shows a schematic structural diagram of a computing device provided by an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
  • the computing device may include: a processor, a communication interface, a memory, and a communication bus.
  • the processor, the communication interface, and the memory communicate with one another through the communication bus.
  • the communication interface is used to communicate with network elements of other devices such as clients or other servers.
  • the processor is configured to execute the program, specifically, may execute the relevant steps in the above embodiment of the method for constructing a container cluster for a computing device.
  • the program may include program code including computer operation instructions.
  • the processor may be a CPU, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the one or more processors included in the computing device may be of the same type, such as one or more CPUs, or may be different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory may include high-speed random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk memory.
  • the program may specifically be used to cause the processor to execute the container cluster construction method in any of the above method embodiments.
  • for each step in the program, refer to the corresponding steps and unit descriptions in the above container cluster construction embodiments; details are not repeated here.
  • the specific working process of the above-described devices and modules can refer to the corresponding process description in the foregoing method embodiments, and details are not repeated here.
  • modules in the device in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment.
  • Modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore may be divided into a plurality of sub-modules or sub-units or sub-assemblies.
  • All features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except combinations in which at least some of such features and/or processes or units are mutually exclusive.
  • Each feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
  • the various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor may be used in practice to implement some or all functions of some or all components according to the embodiments of the present invention.
  • Embodiments of the present invention can also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein.
  • a program implementing an embodiment of the present invention may be stored on a computer-readable medium, or may be in the form of one or more signals.
  • Such a signal may be downloaded from an Internet site, or provided on a carrier signal, or provided in any other form.
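
The base*c sizing rule for work pod instances described earlier in this embodiment can be illustrated with a minimal sketch. The base values, the factor table, and every name below are hypothetical and chosen only for illustration; the specification states only that the allocation is base multiplied by a cluster type factor c, and that c should be larger for GPU-type clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_cores: float
    memory_mib: float


# Hypothetical basic operating overhead of one work pod instance ("base").
BASE = Resources(cpu_cores=0.5, memory_mib=256)

# Hypothetical cluster type factors ("c"); GPU clusters get a larger factor.
TYPE_FACTOR = {"general": 1.0, "gpu": 4.0}


def work_instance_resources(cluster_type, base=BASE, factors=TYPE_FACTOR):
    """Return the CPU and memory to allocate, computed as base * c."""
    c = factors[cluster_type]
    return Resources(base.cpu_cores * c, base.memory_mib * c)
```

With these assumed values, a GPU-type cluster's work instance receives four times the basic CPU and memory overhead, while a general-purpose cluster's work instance receives exactly the base.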

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present invention are a container cluster construction method and system. The system comprises: an order system, which is configured to receive a container cluster construction request sent by a user by means of calling a cluster service logic interface, call an order generation process to generate a container cluster construction message on the basis of the container cluster construction request, and publish the container cluster construction message to a container message queue; a cluster service monitoring process, which is configured to acquire the container cluster construction message from the container message queue if it is detected that there is an unconsumed container cluster construction message in the container message queue, start working instances on the basis of the container cluster construction message, and distribute the working instances to corresponding working nodes; and the working nodes, which are configured to construct container clusters on the basis of the working instances. By means of the solution of the present invention, service and order systems are separated, such that the pressure of large-scale requests can be effectively relieved, and the stable execution of the requests is guaranteed, thereby guaranteeing the success rate of the deployment of container clusters in a large-scale scenario.

Description

Container cluster construction method and system
Cross-Reference to Related Applications
The present application is based on, and claims priority to, Chinese patent application No. 202111264499.3, filed on October 28, 2021 by China Mobile (Suzhou) Software Technology Co., Ltd. and China Mobile Communications Group Co., Ltd. and entitled "Container cluster construction method and system", the entire content of which is hereby incorporated by reference.
Technical Field
The present invention relates to the technical field of container clusters, and relates to, but is not limited to, a container cluster construction method and system.
Background
Cloud-native technology is inseparable from container clusters. Under large-scale user load, however, container cluster deployment becomes very slow. The main causes are insufficient processing capacity in the user-facing container service and frequent errors in the order system's processing, which lead to a large number of delays and retries.
Summary of the Invention
In view of the above problems, embodiments of the present invention are proposed to provide a container cluster construction system and a corresponding container cluster construction method that overcome the above problems or at least partially solve them.
An embodiment of the present invention provides a container cluster construction system, including: an order system, a container message queue, multiple cluster service monitoring processes, and multiple working nodes;
the order system is configured to receive a container cluster construction request sent by a user by calling the cluster business logic interface, call an order generation process to generate a container cluster construction message according to the container cluster construction request, and publish the container cluster construction message to the container message queue;
the cluster service monitoring process is configured to, upon detecting an unconsumed container cluster construction message in the container message queue, obtain the container cluster construction message from the container message queue, start a working instance according to the container cluster construction message, and distribute the working instance to the corresponding working node;
the working node is configured to construct the container cluster according to the working instance.
An embodiment of the present invention provides a container cluster construction method, including:
the order system receives a container cluster construction request sent by a user by calling the cluster business logic interface, calls the order generation process to generate a container cluster construction message according to the container cluster construction request, and publishes the container cluster construction message to the container message queue;
the cluster service monitoring process, upon detecting an unconsumed container cluster construction message in the container message queue, obtains the container cluster construction message from the container message queue, starts a working instance according to the container cluster construction message, and distributes the working instance to the corresponding working node;
the working node constructs the container cluster according to the working instance.
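The three-stage flow above (order system, container message queue, monitoring processes and working nodes) can be sketched as a small producer/consumer program. This is an illustrative sketch only: an in-process `queue.Queue` and threads stand in for the container message queue and the cluster service monitoring processes, and all class and function names are assumptions rather than anything defined in the specification.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class BuildMessage:
    request_id: str
    cluster_size: int


def order_system(requests, mq):
    """Turn each user request into a build message and publish it."""
    for request_id, cluster_size in requests:
        mq.put(BuildMessage(request_id, cluster_size))


def worker_node(message, built, lock):
    """Stand-in for a working node constructing the container cluster."""
    with lock:
        built.append(message.request_id)


def listener(mq, built, lock):
    """Consume unconsumed messages until the queue is empty."""
    while True:
        try:
            message = mq.get_nowait()
        except queue.Empty:
            return
        # One working instance per message, handed to a working node.
        worker_node(message, built, lock)
        mq.task_done()


def run_pipeline(requests, listener_count=2):
    mq = queue.Queue()
    built, lock = [], threading.Lock()
    order_system(requests, mq)
    threads = [
        threading.Thread(target=listener, args=(mq, built, lock))
        for _ in range(listener_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(built)
```

The order side returns as soon as messages are published, and the listeners drain the queue independently, which is the decoupling the embodiment relies on.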
An embodiment of the present invention provides a computing device, including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above container cluster construction method.
An embodiment of the present invention provides a computer storage medium in which at least one executable instruction is stored, and the executable instruction causes a processor to perform the operations corresponding to the above container cluster construction method.
According to the solutions provided by the above embodiments of the present invention, separating the service from the order system effectively relieves the pressure of large-scale requests, ensures stable execution of requests, and safeguards the success rate of container cluster deployment in large-scale scenarios; introducing message caching relieves the pressure when large-scale requests arrive, and scheduling the consumption logic of multiple instances in the cluster ensures that requests are processed quickly.
The above description is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features and advantages of the embodiments are more readily apparent, specific implementations of the embodiments are set out below.
Brief Description of the Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the embodiments of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
FIG. 1 shows a schematic diagram of the container cluster construction system provided by an embodiment of the present invention;
FIG. 2 shows a flowchart of the container cluster construction method provided by an embodiment of the present invention;
FIG. 3 shows a schematic structural diagram of the computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the invention may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present invention will be understood more thoroughly and so that its scope is fully conveyed to those skilled in the art.
The inventors of the present invention found that, in the related art, a container cluster must be deployed on cloud hosts and depends on resources such as cloud disks. When large-scale user traffic arrives, mainstream vendors, in order to improve user satisfaction, first create the order and then start deploying the container service, completing cluster construction step by step. To reduce the number of calls to, or the pressure on, the order system, the resources the cluster depends on are usually created in a customized way and managed separately by the container cluster; that is, no orders are created for these resources. As a result, the user cannot operate on orders for these hosts, and some vendors' services suffer errors under order pressure, spend time on retries, or are slowed by serial processing logic.
In addition, when large-scale traffic arrives, apart from cluster-internal management operations such as viewing, the existing cluster deployment flow runs in one pass from order creation to completion of service construction. The processes handling the service therefore run for a long time, hold many file handles, and may consume considerable memory and central processing unit (CPU) resources, so that multiple processes interfere with one another. The end result is that an error in any one step causes resources that were already created to be rolled back, which wastes resources and lowers the success rate of cluster deployment.
Furthermore, in existing container cluster deployment schemes, order handling and container cluster operations are essentially synchronous: as soon as a resource is created, its order is confirmed and takes effect. For vendors that can guarantee a high resource provisioning success rate this is very efficient, but it is usually hard to keep the interface success rate of every resource that high under heavy load. Under multi-user pressure, calling interfaces sensibly is therefore particularly important; high-frequency order-activation calls affect the overall cluster deployment process, especially when errors occur.
Moreover, existing container cluster construction schemes follow a fixed deployment process in which order handling and service control are usually executed sequentially. When large-scale user requests arrive, contention for CPU and memory on the back-end service servers therefore makes upper-layer cluster construction very slow.
In a public cloud scenario, the container cluster resources a user builds belong, in principle, to the user. Orders should not be customized merely because a container cluster is being built (that is, a single container cluster order covering the charges for all resources), since this affects the user's order operations on individual resources. In addition, allowing users to operate on resources only through cluster create, delete, update, and query operations in the container interface lacks fine-grained, free control over resources.
In current container cluster construction schemes, the order system usually does not serve a single system, so it has a theoretical threshold: beyond this threshold, requests cannot be answered and a large number of errors are triggered. Therefore, when large numbers of users build clusters, pressure on the order system affects the success rate of cluster deployment.
In addition, the service interfaces of the container back end are "equal" from the system's point of view, and the way they compete for resources depends on the CPU and other resources of the server they run on. Successful execution of the cluster construction process therefore cannot be guaranteed when large-scale user requests arrive: processes may hit the upper limit of the server's processing capacity, causing large-scale container construction failures and, in turn, damaging the reputation of the product or system.
Therefore, the container cluster construction scheme of the embodiments of the present invention is proposed. The scheme meets the processing requirements of large-scale traffic and achieves the following objectives: 1. Design a sound scheme for separating orders from services, safeguarding the success rate of container cluster deployment in large-scale scenarios. 2. Design the back-end service to elastically schedule function instances to different working nodes, improving the success rate of container cluster construction. 3. Introduce message caching to relieve the pressure when large-scale requests arrive, and ensure fast request processing by scheduling the consumption logic of multiple instances in the cluster. 4. Design a sound order operation flow that can flexibly create an order for each of the user's resources, with all orders taking effect together after container construction completes, reducing the number of interactions with the order system.
图1示出了本发明实施例提供的容器集群构建系统的示意图。如图1所示,该系统包括:订单系统101、容器消息队列102、多个集群业务监听进程103、多个工作节点104(图1仅是示意性说明)。FIG. 1 shows a schematic diagram of a system for building a container cluster provided by an embodiment of the present invention. As shown in Figure 1, the system includes: an order system 101, a container message queue 102, multiple cluster service monitoring processes 103, and multiple working nodes 104 (Figure 1 is only a schematic illustration).
订单系统101,配置为接收用户通过调用集群业务逻辑接口而发送的容器集群构建请求,调用订单生成进程根据容器集群构建请求生成容器集群构建消息,将容器集群构建消息发布至容器消息队列。The order system 101 is configured to receive the container cluster construction request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster construction message according to the container cluster construction request, and publish the container cluster construction message to the container message queue.
示例性地,本实施例向用户提供有集群业务逻辑接口(Cluster Bll),当用户希望构建容器集群时,可以调用集群业务逻辑接口向订单系统发送容器订单构建请求,一旦请求成功,则上层业务的流程结束,向用户返回容器集群构建中的状态。在一些实施例中,订单系统还配置为:通过调用集群业务逻辑接口向用户返回状态消息,其中,状态消息用于标识容器集群处于构建中的状态。由于这个请求,只涉及订单构建,并没有后端业务的资源抢占,通常不会出现问题。如果因为订单系统的压力,导致出现偶发错误,则在集群业务逻辑接口(Cluster Bll)这一层按照现有节点优选的策略分发重试,直到所有批次的订单构建成功。Exemplarily, this embodiment provides the user with a cluster business logic interface (Cluster Bll). When the user wants to build a container cluster, he can call the cluster business logic interface to send a container order construction request to the order system. Once the request is successful, the upper layer business The process ends, and the status of the container cluster construction is returned to the user. In some embodiments, the order system is further configured to: return a status message to the user by invoking the cluster business logic interface, wherein the status message is used to identify that the container cluster is in a state of construction. Because this request only involves order construction, and there is no resource preemption of the back-end business, there is usually no problem. If sporadic errors occur due to the pressure of the order system, at the layer of the cluster business logic interface (Cluster Bll), distribute and retry according to the strategy preferred by the existing nodes until all batches of orders are successfully constructed.
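The retry behaviour at the Cluster Bll layer can be sketched as follows. The specification does not define the node-preference strategy, so this sketch substitutes plain round-robin over order nodes and adds an upper bound on retry rounds; both are assumptions, as are `OrderNodeError`, `submit_with_retry`, and the `submit` callback.

```python
import itertools


class OrderNodeError(Exception):
    """Raised by a hypothetical order node that failed under load."""


def submit_with_retry(order_batches, nodes, submit, max_rounds=5):
    """Resubmit failed batches, rotating across nodes, until all succeed.

    Returns True when every batch was accepted, False when max_rounds
    (an assumed safety bound) is exhausted first.
    """
    pending = list(order_batches)
    node_cycle = itertools.cycle(nodes)  # round-robin stands in for node preference
    for _ in range(max_rounds):
        failed = []
        for batch in pending:
            node = next(node_cycle)
            try:
                submit(node, batch)
            except OrderNodeError:
                failed.append(batch)
        if not failed:
            return True
        pending = failed
    return False
```

Only the batches that failed are resubmitted in the next round, so orders that were already accepted are not sent to the order system again.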
The order system can call the order generation process, which runs as multiple instances within the cluster. It quickly processes the orders created by users and publishes them, carried as messages, to the container message queue. In other words, the order generation process is responsible for generating container cluster construction messages; because message generation depends on the processing capacity of the order generation process, the process is deployed as multiple instances here to increase order processing capacity.
Exemplarily, the order system calls the order generation process to generate a container cluster construction message according to the container cluster construction request and publishes the message to the container message queue; container construction in a large-scale scenario produces a large number of container cluster construction messages.
The cluster service monitoring process 103 is configured to, upon detecting an unconsumed container cluster construction message in the container message queue, obtain the container cluster construction message from the container message queue, start a working instance according to the container cluster construction message, and distribute the working instance to the corresponding working node.
Exemplarily, the cluster service monitoring process acts as the consumer for the back-end business logic and listens for the container cluster construction messages sent by the order system; only at this point does construction of the back-end container cluster begin.
Exemplarily, the cluster service monitoring process listens in real time for unconsumed container cluster construction messages in the container message queue. If it detects an unconsumed container cluster construction message, it immediately takes the message from the container message queue and starts the container cluster construction flow. Exemplarily, the cluster service monitoring process can start a working instance according to the container cluster construction message and distribute the working instance to the corresponding working node.
Here, the cluster service monitoring process can start a new working instance (Pod instance) to handle the container cluster construction request. According to the node-preference scheduling policy, Pod instances are preferentially distributed to different working nodes. The working nodes are separate servers, so they do not compete with one another for resources; even when instances are scheduled to the same working node, reserved processing resources can be configured to improve the processing performance of the Pod instances. In addition, a Pod instance is lightweight in Kubernetes and starts very quickly, so the cluster service monitoring process can reach the highest batch-processing speed by starting different Pod instances and work through all the messages on the queue. This separates order handling from business processing and makes cluster construction efficient.
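The consumer side described above can be sketched as follows. Threads stand in for Pod instances and round-robin selection stands in for the node-preference scheduling policy; both are simplifications chosen for illustration, and the names `drain_and_dispatch` and `build_cluster` are hypothetical. A real deployment would create Pods through the Kubernetes API rather than threads.

```python
import queue
import threading


def drain_and_dispatch(mq: queue.Queue, nodes, build_cluster) -> int:
    """Take every pending message off the queue and start one worker per message.

    `build_cluster(message, node)` is a hypothetical callable that performs the
    actual construction on the chosen node.
    """
    workers = []
    while True:
        try:
            message = mq.get_nowait()
        except queue.Empty:
            break  # no unconsumed messages remain
        # Round-robin placement: an assumed stand-in for node-preference scheduling.
        node = nodes[len(workers) % len(nodes)]
        worker = threading.Thread(target=build_cluster, args=(message, node))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()  # a finished thread is released, like a destroyed instance
    return len(workers)
```

The listener itself does no construction work; it only hands each message to a short-lived worker, which is what keeps the queue draining quickly under load.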
It should be noted that the container cluster construction flow may depend on other resources, which can be obtained through the scheduling interfaces of those resources. The construction of these other resources also follows the principle of separating orders from business construction, which ensures their stability and efficiency. Therefore, once the dependent resources have been constructed, the cluster service monitoring process for container construction submits all dependent orders of all resources involved in the flow to the order system in a single batch to set their effective times. This greatly reduces the number of interactions between the other resources' orders and the order system, while each resource is still generated for the user as an order item. Users can flexibly operate on all resources involved in the container cluster without being affected by the container cluster order business itself. In the occasional case where construction of a resource that the container cluster depends on fails, a retry mechanism added to the Pod's business flow ensures the success rate of resource provisioning and thereby the success rate of cluster construction.
Elastically scheduling working instances to different server nodes, which improves the success rate of container cluster construction, means the following: in the management cluster, in addition to the nodes where the components run, multiple working nodes are configured to cope with large-scale user requests. Different Pod instances can be scheduled onto these nodes so that working-node resources are fully used, and each working instance is automatically destroyed when it finishes so that its resources are reclaimed. In other words, the cluster service monitoring process is further configured to destroy the working instance after the container cluster has been built.
The working node 104 is configured to build the container cluster according to the working instance.
The working node builds the container cluster according to the working instance and calls back the order system so that all related orders take effect. This avoids frequent interaction with the order business and greatly improves the success rate and performance of container cluster construction.
As shown in Figure 1, the container cluster construction system of this embodiment has a three-layer architecture. The top layer is the cluster business logic interface (Cluster Bll), which exposes the business logic interface to users. The middle layer includes the container message queue, the order system, the order generation process, the cluster service monitoring process, and so on; it separates the business from the order system, which effectively relieves the pressure of large-scale container construction requests and ensures that they are executed stably. The bottom layer consists of the working nodes where the server instances reside; it distributes the working instances (Pods) that process the business according to their functions. This layer is made up of internal servers that are not exposed externally. A scheduling policy can be set so that business processing is scheduled at Pod granularity; as a result, when large-scale container construction requests arrive, they are no longer limited by the processing capacity of a single server.
The content data structure of the container cluster construction request is (requestId, orderType, userId, poolId, orderId, productType, openParams), where:
requestId is the identifier (id) of the request and is used to identify the request;
orderType is the operation type of the request, such as creation;
userId is the user requesting the resource;
poolId is the resource pool in which the requested resource is located;
orderId is the order information of the request; it is bound to the resource, and the status is called back in a unified manner after the container cluster has been built;
productType is the requested resource type, such as a container cluster;
openParams is the parameter information of the request.
In this way, once a request is received, its relevant information is clear, including important information such as the source, request type, order information, and operation parameters. The container cluster construction message carries all the data in the container cluster construction request, and the request is scheduled to different service nodes by parsing the container cluster construction message.
That is, the container cluster construction message includes at least one of the following: a request identifier, a request operation type, a user identifier, an identifier of the resource pool in which the requested resource is located, the requested order information, a resource type, and container cluster parameter information. The container cluster parameter information includes a first container cluster scale value and a cluster type.
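The request structure listed above maps naturally onto a small record type. The sketch below is one possible representation, assuming JSON as the message encoding and assuming that `openParams` is a free-form dictionary; the keys `clusterSize` and `clusterType` used in the usage note are hypothetical names for the scale value and cluster type.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContainerClusterRequest:
    requestId: str      # identifier of the request
    orderType: str      # operation type, e.g. "create"
    userId: str         # user requesting the resource
    poolId: str         # resource pool holding the requested resource
    orderId: str        # order bound to the resource, called back when built
    productType: str    # requested resource type, e.g. a container cluster
    openParams: dict = field(default_factory=dict)  # request parameters


def to_message(request: ContainerClusterRequest) -> str:
    """Serialize the request so the message carries all of its data."""
    return json.dumps(asdict(request))


def from_message(message: str) -> ContainerClusterRequest:
    """Parse a construction message back into a request record."""
    return ContainerClusterRequest(**json.loads(message))
```

For example, a request built with `openParams={"clusterSize": 20, "clusterType": "general"}` survives a `to_message` / `from_message` round trip unchanged, so the consumer sees exactly what the user submitted.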
In an optional implementation of the present invention, the cluster service monitoring process is further configured to: determine an instance function according to the first container cluster scale value, start a working instance having the corresponding instance function, and distribute the working instance having the corresponding instance function to the corresponding working node;
the working node is further configured to: build the container cluster according to the working instance having the corresponding instance function, where the cluster function provided by the container cluster corresponds to the instance function.
The instance function may include one or more of the following: a creation function, a capacity expansion function, a capacity reduction function, a viewing function, a freezing function, a recovery function, and an unsubscribe function.
Exemplarily, the request information Q corresponding to the user is calculated from the user's container construction request I=(requestId, orderType, userId, poolId, orderId, productType, openParams).
Q=(b*baseM*c*q*a, b*baseC*c*q*a), where b is the basic operating coefficient factor; base denotes a basic operating coefficient, with baseM being the basic operating coefficient for memory and baseC the basic operating coefficient for CPU; c is the container type factor; q is the container scale factor; and a is the container operation method factor. The first component of Q is the memory value and the second is the CPU value.
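The formula for Q can be written directly as a small function. This is only a transcription of the expression above; the function and parameter names are chosen here for readability, and how the individual factors are obtained from the request is not specified in the text.

```python
def request_info(b, base_m, base_c, c, q, a):
    """Compute Q = (b*baseM*c*q*a, b*baseC*c*q*a) as (memory value, CPU value).

    b: basic operating coefficient factor
    base_m, base_c: basic operating coefficients for memory and CPU
    c: container type factor, q: container scale factor,
    a: container operation method factor
    """
    shared = b * c * q * a  # the part common to both components
    return (base_m * shared, base_c * shared)
```

Since both components share the factor b*c*q*a, the ratio between the memory and CPU values is fixed by baseM and baseC alone.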
The instance function of the working instance is then determined, and the working instance is scheduled based on the value of Q. The following cases may arise:
(1) Cluster scale q<10.
The design uses the normal operating coefficient (1, 1) (an empirical value, given as (memory value, CPU value)). In addition, the originally requested cluster type clusterType is combined with the container scale q as A=a*clusterType+b*q. When A is greater than or equal to X (X being a preset threshold), the working instance provides the creation, capacity expansion, and capacity reduction functions. When A is less than X, a working instance with only the creation function is provided; after the container cluster has been built, any further user needs are served directly in the process, without creating an additional instance. The benefit is that, for a small cluster, the expansion and reduction functions can be provided without creating another instance, and because the scale is small this is actually faster.
When orderType is creation, the dependent resource-handling logic is fairly complex and includes multiple resources inside the container cluster as well as the business logic of deploying the container cluster. Depending on the cluster type, the working instances are scheduled to different server models and configured according to the cluster factor of each type, which yields different resource allocation specifications. The scheduling itself uses scheduling methods from the related art; the difference is that the resource value is first calculated from the Q formula and the individual factors, and scheduling to the different server models is then performed on that basis. An ordinary small cluster only needs to be scheduled to ordinary servers; a GPU cluster needs to be scheduled to GPU hosts to give faster response; an elastic bare-metal cluster is scheduled to elastic bare-metal machines. (This is only a brief description of the scheduling scheme. The focus of the present invention is to distinguish clusters by type and scale and, in large-scale scenarios, to provide better performance through the scheduling policy.)
(2) Enlarged clusters with q=10-50. The CPU and memory resources required grow proportionally within a controllable range, and the operating coefficient is set to (2.5, 2.5). In addition, the cluster type and cluster scale q are combined as B=clusterType*q. When B is less than X, the working instance provides the creation, capacity expansion, and capacity reduction functions. When B is greater than or equal to X, the working instance provides the creation, capacity expansion, capacity reduction, and viewing functions; once the cluster reaches the set threshold, serving the viewing function through a Pod working instance improves its efficiency and stability.
When orderType is creation, the dependent resource-handling logic is fairly complex and includes multiple resources inside the container cluster as well as the business logic of deploying the container cluster. Depending on the cluster type, the working instances are scheduled to different server models and configured according to the cluster factor of each type, which yields different resource allocation specifications.
(3) Medium-to-large clusters with q=50-100. The CPU resource requirement increases while the memory overhead is almost the same as for q=10-50, so the operating coefficient is set to (2.5, 5). In this case the working instance provides the creation, capacity expansion, capacity reduction, freezing, and recovery functions.
When orderType is creation, the dependent resource-handling logic is fairly complex and includes multiple resources inside the container cluster as well as the business logic of deploying the container cluster. Depending on the cluster type, the working instances are scheduled to different server models and configured according to the cluster factor of each type, which yields different resource allocation specifications.
(4) Large clusters with q=100-1000. The frequency of interactions inside the cluster grows exponentially, so the resource requirement is adjusted empirically to (10, 20). The cluster type is restricted to general-purpose cloud hosts, because other types of container cluster incur excessive resource overhead at this scale. In this case the working instance provides the creation, viewing, freezing, and recovery functions.
When orderType is creation, the dependent resource-handling logic is fairly complex and includes multiple resources inside the container cluster as well as the business logic of deploying the container cluster. The working instances are scheduled to high-performance servers and configured according to the cluster factor of each type, which yields different resource allocation specifications.
(5) Very large clusters with q>1000. Clusters of this size are rarely built in practice. At this scale the cluster is affected not only by the currently deployed processes; traffic inside the cluster is also affected by bandwidth, latency, and storage performance, which increases the load on the control flow. This parameter is therefore set to the upper-limit value of the container scale factor, which can be adjusted empirically to 100/100, and the cluster type is restricted to general-purpose cloud hosts. The working instance provides only the creation, viewing, unsubscribe, and capacity reduction functions. Because the cluster is so large, handling by small instances usually affects the stability of cluster construction, and at this scale operations that destabilize the cluster, such as freezing and recovery, are not supported.
Creating a matching container and the corresponding container functions based on the user's request information improves the fit and practicality of the container and reduces the slow creation and low success rate caused by a container carrying too many functions.
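The five cases above amount to a lookup from cluster scale to an operating coefficient and a set of instance functions. The sketch below encodes that lookup under several assumptions that the text leaves open: the ranges are treated as half-open intervals (10 and 50 fall into the next tier, 1000 into the fourth), the same threshold X is used for both A and B, the cluster type is represented by a numeric factor, and the value 100/100 for the largest tier is read as the pair (100, 100).

```python
def select_tier(q, cluster_type, a=1.0, b=1.0, threshold=100.0):
    """Return ((memory coeff, CPU coeff), instance functions) for cluster scale q.

    `cluster_type` is a numeric cluster type factor and `threshold` is X;
    both representations are assumptions made for this illustration.
    """
    if q < 10:
        score = a * cluster_type + b * q          # A = a*clusterType + b*q
        if score >= threshold:
            return (1, 1), ["create", "expand", "shrink"]
        return (1, 1), ["create"]
    if q < 50:
        score = cluster_type * q                  # B = clusterType*q
        functions = ["create", "expand", "shrink"]
        if score >= threshold:
            functions.append("view")
        return (2.5, 2.5), functions
    if q < 100:
        return (2.5, 5), ["create", "expand", "shrink", "freeze", "restore"]
    if q <= 1000:
        return (10, 20), ["create", "view", "freeze", "restore"]
    return (100, 100), ["create", "view", "unsubscribe", "shrink"]
```

Keeping the tiers in a single function makes the boundary choices explicit, which matters because the stated ranges share their endpoints.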
The container cluster construction system of this embodiment can be applied to large-scale scenarios. According to the different requests of users, the processing logic is dispatched to multiple different servers and executed as Pod instances; once processing finishes, the Pod instance is automatically destroyed and the container cluster reclaims its resources. This makes full use of the system's server resources. In large-scale scenarios the container cluster construction flow runs more smoothly, and the time lost to resource preemption is greatly reduced. With this flow design, container cluster construction in large-scale user scenarios can be scheduled to different nodes through multiple Pod instances, which greatly improves business processing performance and thus the success rate of cluster construction.
This embodiment introduces message buffering to relieve the pressure when large-scale requests arrive, and ensures that requests are processed quickly by scheduling multi-instance consumption logic within the cluster. Specifically, multiple Cluster Consumer processes (cluster service monitoring processes) are started at the back end of the container management layer to listen to the container message queue. The messages on the container message queue are produced by the order generation process: when large-scale requests arrive, the order generation process handles the order logic in batches according to a batch-processing policy and places the order information on the container message queue as messages for the Cluster Consumer to process. The Cluster Consumer process does not handle the messages itself; instead, it schedules instances to spread the load across different Pod instances. In this embodiment the Pod instances are scheduled to servers with ample resources, so performance is stable and well guaranteed. The container message queue introduced in this embodiment can therefore substantially improve the stability and processing performance of the system. This embodiment can flexibly construct an order for each of the user's resources, and the orders take effect together after the container has been built, which reduces the number of interactions with the order system. This embodiment creates a matching container and the corresponding container functions based on the user's request, which improves the fit and practicality of the container and reduces the slow creation and low success rate caused by a container carrying too many functions.
In an optional implementation of the present invention, the order system is further configured to: receive a container cluster processing request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster processing message according to the container cluster processing request, and publish the container cluster processing message to the container message queue;
the cluster service monitoring process is further configured to: if it detects an unconsumed container cluster processing message in the container message queue, obtain the container cluster processing message from the container message queue, start a working instance having the corresponding instance function according to the container cluster processing message, and distribute the working instance having the corresponding instance function to the corresponding working node; or process the container cluster according to the container cluster processing message;
the working node is further configured to: process the container cluster according to the working instance having the corresponding instance function.
Exemplarily, after the container cluster has been built, the user can also perform corresponding processing on it, such as capacity expansion, capacity reduction, viewing, freezing, and recovery. Exemplarily, the user can send a container cluster processing request by calling the cluster business logic interface; the request may be any of a capacity expansion request, a capacity reduction request, a viewing request, a freezing request, a recovery request, and so on. The order system is further configured to: receive the container cluster processing request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster processing message according to the container cluster processing request, and publish the container cluster processing message to the container message queue;
the cluster service monitoring process is further configured to: if it detects an unconsumed container cluster processing message in the container message queue, obtain the container cluster processing message from the container message queue, start a working instance having the corresponding instance function according to the container cluster processing message, and distribute the working instance having the corresponding instance function to the corresponding working node; or process the container cluster according to the container cluster processing message;
The way the container cluster is processed differs according to the cluster scale of the request:
(1) Cluster scale q<10.
The design uses the normal operating coefficient (1, 1) (an empirical value, given as (memory value, CPU value)). In addition, the originally requested cluster type clusterType is combined with the container scale q as A=a*clusterType+b*q. When A is greater than or equal to X (X being a preset threshold) and the request is a capacity expansion or capacity reduction request, a working instance having the corresponding instance function must be started; the working instance is scheduled to the corresponding working node so that the working node processes the container cluster according to that working instance, for example by expanding or reducing capacity. When A is less than X, the cluster is small, so when an expansion or reduction is needed the cluster service monitoring process can provide the service directly without creating an additional instance. The benefit is that, because the scale is small, no further instance needs to be created and execution is actually faster.
When orderType is capacity expansion or capacity reduction, the request operates on fewer resources than cluster creation does. Because the cluster is small, the working instance is scheduled to an ordinary host server. (Scheduling compares the two values (memory value, CPU value) to find a suitable match; the servers have already been labeled, and requests for this class of cluster are scheduled to ordinary host servers by the algorithm.) The resources allocated to the working Pod instance are affected by the cluster type factor and are configured as base*c, where base is the basic running overhead of CPU or memory and c is the type factor of the cluster. For clusters of resources such as GPU, the factor c needs to be set larger so that expansion and reduction of the container cluster can complete quickly.
(2) Enlarged clusters with q=10-50. The CPU and memory resources required grow proportionally within a controllable range, and the operating coefficient is set to (2.5, 2.5). In addition, the cluster type and cluster scale q are combined as B=clusterType*q. When B is less than X, a working instance having the corresponding instance function must be started for a capacity expansion or capacity reduction request, whereas for a viewing request no working instance is needed and the cluster service monitoring process can provide the service directly. When B is greater than or equal to X, a working instance having the corresponding instance function must be started for a capacity expansion, capacity reduction, or viewing request; once the cluster reaches the set threshold, serving the viewing function through a Pod instance improves its efficiency and stability. The details are as follows:
When orderType is capacity expansion or capacity reduction, the request operates on fewer resources than cluster creation does. Because the cluster is of medium size, the working Pod instance is scheduled to an idle ordinary host server. The resources allocated to the working Pod instance are affected by the cluster type factor and are configured as base*c, where base is the basic running overhead of CPU or memory and c is the type factor of the cluster. For clusters of resources such as GPU, the factor c needs to be set larger so that expansion and reduction of the container cluster can complete quickly.
When orderType is viewing, the request only needs to read resource information from the database. Therefore, whatever the cluster type, such requests are scheduled to the cheapest servers, namely general-purpose hosts, with basic memory and CPU allocations. When the request completes, the working instance is released quickly. Such requests can be processed quickly and their resources reclaimed promptly, which improves the processing efficiency of the servers.
(3)q=50-100的中大型集群规模,设置的cpu资源需求增大,内存开销几乎同10-50的开销,操作系数为(2.5,5),这种规模下的集群支持通过工作pod实例形式进行扩容、缩容,冻结、恢复等处理,即,请求为扩容、缩容,冻结、恢复等请求时,需要启动具有相应实例功能的工作实例来对容器集群进行处理,具体方式如下:(3) For medium and large scale clusters with q=50-100, the set cpu resource requirements increase, the memory overhead is almost the same as that of 10-50, and the operating coefficient is (2.5, 5). Clusters of this scale support through work Expand, shrink, freeze, restore, etc. in the form of a pod instance, that is, when the request is for expanding, shrinking, freezing, restoring, etc., you need to start a working instance with the corresponding instance function to process the container cluster. The specific method is as follows :
当orderType为扩容、缩容时,这种请求,操作的资源不像创建集群那么多。但是由于集群规模较大,将工作pod实例调度到对应主机型号的服 务器上,并且分配工作pod资源的实例,受到集群类型因子影响,采用base*c的模式配置,base表明cpu或者内存基本的运行开销,c表明集群的不同类型因子,当集群为GPU等这类资源的因子c,需要配置更大,这样才能快速完成容器集群的扩、缩容。When the orderType is expanding or shrinking, the resources for this kind of request are not as many as creating a cluster. However, due to the large scale of the cluster, the working pod instance is scheduled to the server of the corresponding host model, and the instance of the working pod resource is allocated, affected by the cluster type factor, the mode configuration of base*c is adopted, and base indicates the basic operation of cpu or memory Overhead, c indicates the different type factors of the cluster. When the cluster is a resource such as GPU, the factor c needs to be configured to be larger, so as to quickly complete the expansion and contraction of the container cluster.
When orderType is view, the request only needs to read the resource information held in the database. Because the cluster is large, the work instance is scheduled onto a server of the corresponding cluster model and is allocated baseline memory and CPU. Once the request completes, the work instance is released promptly. Such requests are handled quickly and their resources are reclaimed in time, which improves the processing efficiency of the servers.
When orderType is freeze or restore, the request does not operate on resources and only changes state, so it needs very little resource overhead. Because the cluster is large, however, the work instance is scheduled onto a server of the corresponding cluster type; after processing completes, the resources are reclaimed automatically by the system.
(4) Large clusters, q = 100-1000. The frequency of interaction inside the cluster rises exponentially, so the resource requirement is adjusted, based on experience, to (10, 20). The cluster type is limited to general-purpose cloud hosts; for other types of container cluster the resource overhead at this scale is too high. The container cluster provides only viewing, freezing, restoring and similar functions, and performing these operations requires starting an instance with the corresponding instance function, as follows:
When orderType is view, the request only needs to read the resource information held in the database. Because the cluster is large, the work instance is scheduled onto a high-performance server, and its memory and CPU allocation is set according to the operating factor. Once the request completes, the work instance is released promptly. Such requests are handled quickly and their resources are reclaimed in time, which improves the processing efficiency of the servers.
When orderType is freeze or restore, the request does not operate on resources and only changes state, so it needs very little resource overhead. Because the cluster is large, however, the work instance is scheduled onto a high-performance server; after processing completes, the resources are reclaimed automatically by the system.
(5) Very large clusters, q > 1000. Building clusters of this scale is rare in practice. Such a cluster is affected not only by the processes currently being deployed; traffic inside the cluster is also affected by bandwidth, latency and storage performance, which increases the cost of the control flow. This parameter is therefore set to the upper limit of the container scale factor, which can be adjusted to 100/100 based on experience, and the cluster type is limited to general-purpose cloud hosts. The container cluster provides only viewing, unsubscribing and shrinking functions. Because the cluster is so large, processing by small instances tends to affect the stability of cluster construction, and at this scale operations that destabilize the cluster, such as freezing and restoring, are not supported. Performing viewing, unsubscribing or shrinking requires starting an instance with the corresponding instance function.
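The scale tiers can be read as a lookup from the cluster scale value q to an operating coefficient and a set of permitted operations. The sketch below covers only the three tiers whose rules are spelled out here (q = 50-100, 100-1000, and above 1000). The names `Tier`, `TIERS`, `tier_for` and `is_supported`, the English operation labels, and the treatment of boundary values (lower bound inclusive, upper bound exclusive) are assumptions; the coefficient pairs are copied as given without interpreting their components.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    low: int               # inclusive lower bound on q (assumption)
    high: float            # exclusive upper bound on q (assumption)
    coefficient: tuple     # operating coefficient pair as stated in the text
    operations: frozenset  # operations the cluster offers at this scale


TIERS = (
    Tier(50, 100, (2.5, 5),
         frozenset({"expand", "shrink", "view", "freeze", "restore"})),
    Tier(100, 1000, (10, 20),
         frozenset({"view", "freeze", "restore"})),
    Tier(1000, float("inf"), (100, 100),
         frozenset({"view", "unsubscribe", "shrink"})),
)


def tier_for(q: int) -> Tier:
    """Return the tier whose range contains the cluster scale value q."""
    for tier in TIERS:
        if tier.low <= q < tier.high:
            return tier
    raise ValueError(f"no tier defined in this sketch for q={q}")


def is_supported(q: int, order_type: str) -> bool:
    """Check whether a cluster of scale q offers the requested operation."""
    return order_type in tier_for(q).operations
```

A listener could call `is_supported` before starting a work instance, rejecting, for example, a freeze request against a cluster with q above 1000.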
The system provided by this embodiment of the present invention introduces a container message queue to buffer messages, which eases the distribution of user requests in large-scale scenarios. An order system is designed to handle order business. The cluster service listening process starts pod instances and schedules them onto other servers, so that messages on the queue are processed quickly. The other resource interfaces on which the pod construction flow depends also follow the principle of separating orders from back-end business, so that other resources are processed quickly and stably, which in turn ensures fast and stable construction of the container cluster. After all resources of the cluster have been built successfully, the order system is called back in a batch so that all related orders take effect. This reduces frequent interaction with the order business and improves the success rate and performance of container cluster construction.
FIG. 2 shows a flowchart of the container cluster construction method provided by an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:
Step S201: the order system receives a container cluster construction request sent by a user by calling the cluster business logic interface, calls the order generation process to generate a container cluster construction message from the request, and publishes the message to the container message queue.
Step S202: if the cluster service listening process detects an unconsumed container cluster construction message in the container message queue, it fetches the message from the queue, starts a work instance according to the message, and distributes the work instance to the corresponding worker node.
Step S203: the worker node builds the container cluster according to the work instance.
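Steps S201 to S203 can be sketched with an in-process queue. This is an illustration under stated assumptions, not the patented implementation: `queue.Queue` stands in for the container message queue, the dictionary message layout is invented, and choosing the least-loaded node is a placeholder for whatever scheduling policy maps a work instance to its "corresponding" worker node.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    name: str
    clusters: list = field(default_factory=list)

    def build(self, instance: dict) -> None:
        # S203: build the container cluster described by the work instance.
        self.clusters.append(instance["cluster_id"])


class OrderSystem:
    def __init__(self, mq: queue.Queue) -> None:
        self.mq = mq

    def submit(self, request: dict) -> dict:
        # S201: turn the request into a construction message and publish it.
        self.mq.put({"kind": "build", **request})
        # Reply immediately; the cluster itself is built asynchronously.
        return {"cluster_id": request["cluster_id"], "status": "building"}


def listen_once(mq: queue.Queue, nodes: list) -> bool:
    """S202: consume one pending message, if any, and dispatch a work instance."""
    try:
        message = mq.get_nowait()
    except queue.Empty:
        return False
    instance = {"cluster_id": message["cluster_id"], "function": "create"}
    # Placeholder policy (assumption): pick the node with the fewest clusters.
    node = min(nodes, key=lambda n: len(n.clusters))
    node.build(instance)
    return True
```

The order system returns as soon as the message is queued, so the caller is never blocked on cluster construction; the listener can be run in as many processes as the queue backlog requires.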
Optionally, the method further includes: the cluster service listening process destroys the work instance after construction of the container cluster is complete.
Optionally, the container cluster construction request includes: a request identifier, a requested operation type, a user identifier, an identifier of the resource pool in which the requested resource is located, the requested order information, a resource type, and container cluster parameter information, where the container cluster parameter information includes a first container cluster scale value.
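The request fields listed above map naturally onto a small record type. The sketch below is one possible shape: the Python field names, the use of JSON as the message encoding, and the helper `to_message` are all assumptions, since the text names the fields but not their wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ClusterParams:
    scale: int  # the first container cluster scale value


@dataclass
class ClusterBuildRequest:
    request_id: str
    order_type: str          # requested operation type
    user_id: str
    resource_pool_id: str    # pool in which the requested resource is located
    order_info: dict         # requested order information
    resource_type: str
    cluster_params: ClusterParams


def to_message(request: ClusterBuildRequest) -> str:
    """Serialize a request into a message body for the container message queue."""
    # asdict converts nested dataclasses recursively, so cluster_params
    # becomes a plain dictionary before JSON encoding.
    return json.dumps(asdict(request), sort_keys=True)
```

A consumer can recover the scale value with `json.loads(body)["cluster_params"]["scale"]` and use it to decide which instance function to start.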
Optionally, the step in which the cluster service listening process starts a work instance according to the container cluster construction message and distributes the work instance to the corresponding worker node further includes:
determining an instance function according to the first container cluster scale value, starting a work instance with the corresponding instance function, and distributing that work instance to the corresponding worker node;
the step in which the worker node builds the container cluster according to the work instance further includes: building the container cluster according to the work instance with the corresponding instance function, where the cluster functions provided by the container cluster correspond to the instance function.
Optionally, the instance function includes one or more of the following: a create function, an expand function, a shrink function, a view function, a freeze function, a restore function, and an unsubscribe function.
Optionally, the method further includes: the order system receives a container cluster processing request sent by the user by calling the cluster business logic interface, calls the order generation process to generate a container cluster processing message according to the container cluster construction request, and publishes the container cluster processing message to the container message queue;
if the cluster service listening process detects an unconsumed container cluster processing message in the container message queue, it fetches the message from the queue, starts a work instance with the corresponding instance function according to the message, and distributes that work instance to the corresponding worker node; or, it processes the container cluster according to the container cluster processing message;
the worker node processes the container cluster according to the work instance with the corresponding instance function.
Optionally, the method further includes: the order system returns a status message to the user by calling the cluster business logic interface, where the status message indicates that the container cluster is under construction.
The method provided by this embodiment of the present invention introduces a container message queue to buffer messages, which eases the distribution of user requests in large-scale scenarios. An order system is designed to handle order business. The cluster service listening process starts pod instances and schedules them onto other servers, so that messages on the queue are processed quickly. The other resource interfaces on which the pod construction flow depends also follow the principle of separating orders from back-end business, so that other resources are processed quickly and stably, which in turn ensures fast and stable construction of the container cluster. After all resources of the cluster have been built successfully, the order system is called back in a batch so that all related orders take effect. This reduces frequent interaction with the order business and improves the success rate and performance of container cluster construction.
An embodiment of the present invention provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction, and the computer-executable instruction can execute the container cluster construction method of any of the above method embodiments.
FIG. 3 shows a schematic structural diagram of a computing device provided by an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computing device. As shown in FIG. 3, the computing device may include a processor, a communications interface, a memory, and a communication bus. The processor, the communications interface, and the memory communicate with one another over the communication bus. The communications interface is used to communicate with network elements of other devices, such as clients or other servers. The processor is used to execute a program and, specifically, may execute the relevant steps of the above embodiments of the container cluster construction method for a computing device.
By way of example, the program may include program code, and the program code includes computer operation instructions.
The processor may be a CPU, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs together with one or more ASICs.
The memory is used to store the program. The memory may include high-speed random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk memory.
The program may specifically be used to cause the processor to execute the container cluster construction method of any of the above method embodiments. For the specific implementation of each step in the program, refer to the corresponding descriptions of the steps and units in the above container cluster construction embodiments; they are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the devices and modules described above can be found in the corresponding process descriptions of the foregoing method embodiments and are not repeated here.
The algorithms and displays presented here are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings given here. From the above description, the structure required to construct such systems is apparent. Furthermore, the embodiments of the present invention are not directed to any particular programming language. It should be understood that the content of the embodiments described here may be implemented in a variety of programming languages, and that any description above of a specific language is given to disclose the best mode of the embodiments of the present invention.
Numerous specific details are set forth in the description provided here. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the inventive aspects, individual features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single previously disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules of a device in an embodiment may be adaptively changed and placed in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described here include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components according to embodiments of the present invention. Embodiments of the present invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described here. Such programs implementing embodiments of the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet site, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be understood as limiting the order of execution.

Claims (10)

  1. A container cluster construction system, comprising: an order system, a container message queue, a plurality of cluster service listening processes, and a plurality of worker nodes;
    the order system is configured to receive a container cluster construction request sent by a user by calling a cluster business logic interface, call an order generation process to generate a container cluster construction message according to the container cluster construction request, and publish the container cluster construction message to the container message queue;
    the cluster service listening process is configured to, upon detecting an unconsumed container cluster construction message in the container message queue, fetch the container cluster construction message from the container message queue, start a work instance according to the container cluster construction message, and distribute the work instance to a corresponding worker node;
    the worker node is configured to build a container cluster according to the work instance.
  2. The system according to claim 1, wherein the cluster service listening process is further configured to destroy the work instance after construction of the container cluster is complete.
  3. The system according to claim 1 or 2, wherein the container cluster construction request comprises: a request identifier, a requested operation type, a user identifier, an identifier of the resource pool in which the requested resource is located, requested order information, a resource type, and container cluster parameter information, wherein the container cluster parameter information comprises a first container cluster scale value.
  4. The system according to claim 3, wherein the cluster service listening process is further configured to: determine an instance function according to the first container cluster scale value, start a work instance with the corresponding instance function, and distribute the work instance with the corresponding instance function to the corresponding worker node;
    the worker node is further configured to: build the container cluster according to the work instance with the corresponding instance function, wherein the cluster functions provided by the container cluster correspond to the instance function.
  5. The system according to claim 4, wherein the instance function comprises one or more of the following: a create function, an expand function, a shrink function, a view function, a freeze function, a restore function, and an unsubscribe function.
  6. The system according to claim 1 or 2, wherein the order system is further configured to: receive a container cluster processing request sent by the user by calling the cluster business logic interface, call the order generation process to generate a container cluster processing message according to the container cluster construction request, and publish the container cluster processing message to the container message queue;
    the cluster service listening process is further configured to: upon detecting an unconsumed container cluster processing message in the container message queue, fetch the container cluster processing message from the container message queue, start a work instance with the corresponding instance function according to the container cluster processing message, and distribute the work instance with the corresponding instance function to the corresponding worker node; or, process the container cluster according to the container cluster processing message;
    the worker node is further configured to: process the container cluster according to the work instance with the corresponding instance function.
  7. The system according to claim 1 or 2, wherein the order system is further configured to: return a status message to the user by calling the cluster business logic interface, wherein the status message indicates that the container cluster is under construction.
  8. A container cluster construction method, comprising:
    an order system receiving a container cluster construction request sent by a user by calling a cluster business logic interface, calling an order generation process to generate a container cluster construction message according to the container cluster construction request, and publishing the container cluster construction message to a container message queue;
    a cluster service listening process, upon detecting an unconsumed container cluster construction message in the container message queue, fetching the container cluster construction message from the container message queue, starting a work instance according to the container cluster construction message, and distributing the work instance to a corresponding worker node;
    the worker node building a container cluster according to the work instance.
  9. A computing device, comprising: a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another over the communication bus;
    the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the container cluster construction method according to claim 8.
  10. A computer storage medium, wherein the storage medium stores at least one executable instruction, and the executable instruction causes a processor to perform operations corresponding to the container cluster construction method according to claim 8.
PCT/CN2022/118653 2021-10-28 2022-09-14 Container cluster construction method and system WO2023071576A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111264499.3A CN116048825A (en) 2021-10-28 2021-10-28 Container cluster construction method and system
CN202111264499.3 2021-10-28

Publications (1)

Publication Number Publication Date
WO2023071576A1 true WO2023071576A1 (en) 2023-05-04

Family

ID=86131860

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118653 WO2023071576A1 (en) 2021-10-28 2022-09-14 Container cluster construction method and system

Country Status (2)

Country Link
CN (1) CN116048825A (en)
WO (1) WO2023071576A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116909780A (en) * 2023-09-12 2023-10-20 天津卓朗昆仑云软件技术有限公司 Memory-based local distributed queue plug-in, system and queue processing method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117707794A (en) * 2024-02-05 2024-03-15 之江实验室 Heterogeneous federation-oriented multi-class job distribution management method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106453564A (en) * 2016-10-18 2017-02-22 北京京东尚科信息技术有限公司 Elastic cloud distributed massive request processing method, device and system
US9621643B1 (en) * 2015-07-31 2017-04-11 Parallels IP Holdings GmbH System and method for joining containers running on multiple nodes of a cluster
US9760400B1 (en) * 2015-07-31 2017-09-12 Parallels International Gmbh System and method for joining containers running on multiple nodes of a cluster
CN109992415A (en) * 2019-03-15 2019-07-09 上海拍拍贷金融信息服务有限公司 A kind of container dispatching method and scheduling system
CN113239118A (en) * 2021-05-31 2021-08-10 广州宏算信息科技有限公司 Block chain training system and method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116909780A (en) * 2023-09-12 2023-10-20 天津卓朗昆仑云软件技术有限公司 Memory-based local distributed queue plug-in, system and queue processing method
CN116909780B (en) * 2023-09-12 2023-11-17 天津卓朗昆仑云软件技术有限公司 Memory-based local distributed queue plug-in, system and queue processing method

Also Published As

Publication number Publication date
CN116048825A (en) 2023-05-02

Similar Documents

Publication Publication Date Title
WO2023071576A1 (en) Container cluster construction method and system
EP3073374B1 (en) Thread creation method, service request processing method and related device
JP6669682B2 (en) Cloud server scheduling method and apparatus
CN108449410B (en) Message management method, system and related device in cloud platform
CN107087019B (en) Task scheduling method and device based on end cloud cooperative computing architecture
CN111338774B (en) Distributed timing task scheduling system and computing device
CN111338773B (en) Distributed timing task scheduling method, scheduling system and server cluster
JP2019522293A (en) Acceleration resource processing method and apparatus
US9104488B2 (en) Support server for redirecting task results to a wake-up server
WO2011124077A1 (en) Method and system for virtual machine management, virtual machine management server
WO2013107012A1 (en) Task processing system and task processing method for distributed computation
CN106656525B (en) Data broadcasting system, data broadcasting method and equipment
CN105187327A (en) Distributed message queue middleware
CN107623731B (en) Task scheduling method, client, service cluster and system
WO2023169175A1 (en) Request processing method and device, computer equipment, and storage device
WO2021120633A1 (en) Load balancing method and related device
CN112925607A (en) System capacity expansion and contraction method and device and electronic equipment
CN111427675A (en) Data processing method and device and computer readable storage medium
CN110806928A (en) Job submitting method and system
CN111200606A (en) Deep learning model task processing method, system, server and storage medium
CN106911741B (en) Method for balancing virtual network management file downloading load and network management server
CN111586140A (en) Data interaction method and server
WO2022257247A1 (en) Data processing method and apparatus, and computer-readable storage medium
CN111371848A (en) Request processing method, device, equipment and storage medium
CN109388501B (en) Communication matching method, device, equipment and medium based on face recognition request

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22885463

Country of ref document: EP

Kind code of ref document: A1