Detailed Description
To explain in detail the possible application scenarios, technical principles, practical embodiments, and the like of the present application, the following detailed description is given with reference to the accompanying drawings in conjunction with the listed embodiments. The embodiments described herein merely illustrate the technical solutions of the present application more clearly; they are therefore used only as examples, and the scope of the present application is not limited thereby.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase "an embodiment" in various places in the specification do not necessarily all refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments. In principle, the technical features mentioned in the embodiments of the present application can be combined in any manner to form a corresponding implementable technical solution, as long as there is no technical contradiction or conflict.
Unless defined otherwise, technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the use of relational terms herein is intended to describe specific embodiments only and is not intended to limit the present application.
In the description of the present application, the term "and/or" is an expression describing a logical relationship between objects and means that three relationships may exist; for example, "A and/or B" covers three cases: A alone, B alone, and both A and B. In addition, the character "/" herein generally indicates that the former and latter associated objects are in a logical "or" relationship.
In this application, terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Without further limitation, the use in this application of "including," "comprising," "having," or other similar open-ended expressions is intended to cover a non-exclusive inclusion; such expressions do not exclude the presence of additional elements in a process, method, or article that includes the recited elements, so that a process, method, or article that includes a list of elements may include not only those elements but also other elements not expressly listed or inherent to such process, method, or article.
Consistent with examination guidelines, in this application the terms "greater than," "less than," "more than," and the like are understood to exclude the stated number, while the expressions "above," "below," "within," and the like are understood to include the stated number. In addition, in the description of the embodiments of the present application, "a plurality" means two or more (including two), and related expressions such as "a plurality of groups" and "a plurality of times" are understood in the same way, unless specifically defined otherwise.
In a first aspect, a resource scheduling system (as shown in fig. 2) is provided. The scheduling system includes a platform adapter (platform adapter) and a scheduler (scheduler) deployed in a central machine room, and a distributed state machine (distributed state machine) and an instance (instance) deployed in an IDC machine room, where the platform adapter manages the scheduler and the distributed state machine manages the corresponding instance. In one embodiment, a cloud computing cluster has a plurality of server devices, and the servers may be deployed in different places, for example in a central machine room and an IDC machine room respectively; generally, the central machine room and the IDC machine room are not in the same geographical region. At least one distributed scheduler is deployed in a server of the central machine room. There may be a plurality of central machine rooms, each of which may deploy schedulers, and the central machine rooms, like the IDC machine rooms, may be located in different geographical regions. The schedulers may be located in two central machine rooms respectively, or at least two of the schedulers may be located in the same central machine room. The schedulers work independently: when selecting scheduling resources, a scheduler does not depend on the decisions of other schedulers, and each scheduler stores data of the full set of resource objects of the cloud computing cluster system. When the state of a resource object changes, the distributed state machine synchronizes the state of the resource object to the scheduler for storage; for example, as shown in fig. 3, the acquisition module (fetch) of the scheduler synchronizes the state of the resource object from the distributed state machine to the scheduler, and the loading module (load) loads the resource state into the resource state database accessed by the filter module.
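The fetch/load synchronization described above can be sketched as follows. This is a minimal illustrative model under assumed names (DistributedStateMachine, Scheduler, fetch_and_load, snapshot), not the actual implementation:

```python
class DistributedStateMachine:
    """Holds authoritative state for the host nodes of one IDC room."""
    def __init__(self, room):
        self.room = room
        self._states = {}          # resource object id -> state dict

    def update(self, obj_id, state):
        # called when the state of a resource object changes
        self._states[obj_id] = state

    def snapshot(self):
        # full copy handed to a fetching scheduler
        return dict(self._states)


class Scheduler:
    """Each scheduler keeps a full, independent copy of cluster state."""
    def __init__(self):
        self.resource_db = {}      # local resource-state database

    def fetch_and_load(self, state_machines):
        # fetch module: pull states from every distributed state machine;
        # load module: merge them into the local resource-state database
        for sm in state_machines:
            self.resource_db.update(sm.snapshot())
```

Because every scheduler merges the snapshots of all state machines, each one independently holds data for the full set of resource objects, matching the independence property described above.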
In one embodiment, the distributed state machines are located in IDC machine rooms, and each distributed state machine maintains all host nodes belonging to the cloud computing cluster in its machine room. Different distributed state machines are independent of each other; "independent of each other" between entities (distributed state machines or schedulers) in this context means that entities of the same kind can execute concurrently. The distributed state machine manages the state of the host nodes, the state of the virtualized resources allocated to users (resource objects in use by a user), and the like.
In an embodiment, as shown in fig. 3, scheduling requests are stored in a scheduling request queue (scheduler request). A filter module (filter) of a scheduler obtains scheduling requests from the scheduling request queue (as will be understood by those skilled in the art, the figure shows only one scheduler obtaining scheduling requests, but in the solution of the present invention multiple schedulers may obtain scheduling requests from queues with high concurrency). The filter module filters resource objects according to the scheduling request to obtain a candidate result set, a score module (score) scores the candidate result set, a binding module (bind) binds the candidate result with the optimal score and sends a verification condition to a distributed state machine, and the distributed state machine verifies and determines the result. For example, in one embodiment, the scheduling process includes the scheduler receiving a scheduling request and filtering resource objects according to the scheduling request to obtain a candidate result set. Resource objects in a cloud computing cluster include a Node (host), a Pod (a virtualized resource object), a Service, an RC (replication controller), and the like. In some embodiments, a candidate result set refers to a set of hosts that meet a preselected policy. A host that meets the preselected policy may, in some embodiments, be a host that meets the conditions (e.g., CPU, memory, storage conditions) for deploying the virtualized hardware resources corresponding to the scheduling request. In some embodiments, the filtering process includes scoring each host and its resource conditions, thereby screening out a candidate result set that meets the conditions.
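The filter, score, and tie-breaking steps above can be sketched as follows. The function names and the CPU/memory conditions are illustrative assumptions; a real filter would evaluate many more parameters:

```python
import random

def filter_hosts(hosts, request):
    """Pre-selection: keep hosts whose free resources satisfy the request."""
    return [h for h in hosts
            if h["free_cpu"] >= request["cpu"]
            and h["free_mem"] >= request["mem"]]

def score_host(host):
    """Toy scoring rule: more free capacity gives a higher score."""
    return host["free_cpu"] + host["free_mem"]

def schedule(hosts, request):
    """Filter -> score -> pick the best candidate (bind would follow)."""
    candidates = filter_hosts(hosts, request)      # candidate result set
    if not candidates:
        return None
    best = max(score_host(h) for h in candidates)
    # hosts tied for first place: pick one at random
    winners = [h for h in candidates if score_host(h) == best]
    return random.choice(winners)["name"]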
For example, the resource objects are filtered to obtain a screening result set according to at least one parameter specified in the scheduling request, such as CPU, GPU, memory, storage volume, labels, hostname, namespace, image download speed, and data transmission speed. In some embodiments, the screening may be divided into a pre-selection step and a preference step: a set of resource objects is first obtained by pre-screening, and the resource objects from that step are then scored (score) to screen out the optimal resource objects that meet the resource scheduling request.
Different scoring modes can select different optimal resource objects. The screening conditions in some embodiments include port values, the amount of host resources (CPU, storage, etc.), volume (storage volume), label values, hostname (hostname), namespace (namespace), image download speed, data transfer speed, and the like. If the scores of a plurality of host machines are tied for first place, one host machine node can be randomly selected for scheduling.
In some embodiments, the score of a host is evaluated according to the virtualized hardware resources the host is already running and the virtualized hardware resources being applied for: for example, if the remaining resources of the host can satisfy the virtualized hardware resources being applied for, the evaluation proceeds to the next step; otherwise, the score is 0. In the next step of the calculation, the score is inversely related to the virtualized hardware resources already running, so that resource requests are dispersed as much as possible and the load of the host machines is balanced.
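The two-step scoring rule can be written as a single function. The normalization by capacity is an assumption for illustration; any monotonically decreasing function of the used resources would give the same load-spreading behavior:

```python
def load_balance_score(used, requested, capacity):
    """Two-step host score for load balancing.

    used      -- virtualized hardware resources already running on the host
    requested -- virtualized hardware resources being applied for
    capacity  -- total resources of the host
    """
    # Step 1: if the remaining resources cannot satisfy the request,
    # the host scores 0 and is excluded.
    if capacity - used < requested:
        return 0
    # Step 2: the score is inversely related to the resources already
    # in use, so new requests spread toward lightly loaded hosts.
    return (capacity - used - requested) / capacity
```

A heavily loaded host thus scores strictly lower than a lightly loaded one, so ranking candidates by this score balances load across hosts.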
In some embodiments, resources in the cloud computing cluster each have a metadata field. When a scheduler needs to apply for virtualized hardware resources in a host, it attempts to schedule an appropriate resource object, and the attempt must be verified before it is considered successful. Because the schedulers are independent, a plurality of schedulers may try to schedule the same resource object at the same time, and it is uncertain which scheduler actually schedules the resource object successfully. The scheduler therefore modifies the metadata field during scheduling and sends the modified field value as a verification condition to the distributed state machine for comparison and determination: if the value of the resource object's metadata field matches, the determination is consistent and the scheduling succeeds; otherwise it fails. After a scheduling failure, the scheduler tries the resource object with the next-ranked score, traversing the determinations respectively obtained from the distributed state machines corresponding to the resource objects in the screening result set, until a determination is consistent during the traversal or the distributed state machines corresponding to all the resource objects have been traversed. In some embodiments, after the scheduling succeeds, a corresponding virtualized hardware resource is created in the host and allocated to the user. In other embodiments, the scheduling may be a create, delete, update, or query operation on the resource object.
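The verification step is essentially an optimistic compare-and-update on the metadata field. In this sketch the field is called "version" purely as an assumption, since the text does not name it:

```python
def try_schedule(resource, scheduler_id, seen_version):
    """State-machine-side verification: the attempt succeeds only if the
    version the scheduler read is still current (optimistic concurrency)."""
    if resource["version"] != seen_version:
        return False                 # another scheduler committed first
    resource["version"] += 1         # commit: bump the metadata field
    resource["owner"] = scheduler_id
    return True
```

If two independent schedulers read the same version and both attempt to schedule, only the first commit passes the comparison; the loser then moves on to the candidate with the next-ranked score, exactly the retry traversal described above.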
By comparison, some other cloud computing cluster systems first acquire a lock on a resource object during scheduling and schedule the resource object only after the lock is obtained, thereby preventing other schedulers from scheduling it. Such methods are prone to operation failure or deadlock because of the additional resources required for locking and unlocking operations, especially when scheduling requests are dense. Still other cloud computing cluster systems schedule resources through a unique central node (such as a master node), so scheduling capacity is limited by the processing capacity of the master node, the scale of the cloud computing cluster cannot be enlarged, and scheduling efficiency cannot be guaranteed. By acquiring scheduling resources in an optimistic mode, the present method avoids wasting CPU resources on continually retrying to obtain a lock or continually retrying to obtain a resource object after a failure; meanwhile, by deploying schedulers in two centers respectively, the schedulers can each perform scheduling independently, which improves scheduling efficiency and solves the problem of cloud computing cluster scheduling failures caused by high concurrency and contention among scheduling requests.
In particular, to enable users to access the cloud computing cluster more conveniently, the central machine rooms are usually distributed in different geographical regions, and the plurality of IDC machine rooms are also distributed in different geographical regions. This layout helps reduce the spatial distance between the servers providing specific processing in the cloud computing cluster and the users, thereby reducing the latency of users accessing the cloud computing cluster, but it places higher demands on scheduling.
In addition, because the cloud computing cluster system herein adopts a structure of a plurality of central machine rooms, when one central machine room stops working due to an accident, the whole system can rely on another central machine room to maintain the functions of the cloud computing cluster system; meanwhile, scheduling requests can be processed concurrently by the plurality of central machine rooms, which improves the horizontal scaling capability of the cloud computing cluster system and increases the scale to which the cloud computing cluster can expand.
In one embodiment of the present invention, the distributed state machines may be deployed in central machine rooms, and the state machines communicate through Link bus services deployed in each machine room. Each distributed state machine can select which central machine room to deploy in according to the distance of the actual physical network. This deployment turns the communication between the scheduler and the distributed state machine into intranet communication, so that if schedulers schedule the same resource and a conflict or race occurs, the retry cost is low.
In an embodiment of the invention, the data of the resource objects in the cloud computing cluster is compressed, so that the metadata of resources to be scheduled can be compressed and optimized in the scheduler's memory. This reduces memory occupation, improves the screening and scheduling efficiency of the scheduler, and achieves the effect of supporting container resource scales above the order of one million. The compression may represent the data of a resource object with short characters, or build short codes for the data representation of the resource object.
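One simple way to build such short codes is dictionary encoding: each distinct metadata string is stored once and replaced everywhere by a small integer. This sketch (class and method names assumed) illustrates the idea:

```python
class ShortCodeTable:
    """Dictionary encoder: maps repeated metadata strings to short
    integer codes, shrinking the scheduler's in-memory resource state."""
    def __init__(self):
        self._code = {}   # string -> code
        self._text = []   # code -> string

    def encode(self, s):
        # assign a new code on first sight, reuse it afterwards
        if s not in self._code:
            self._code[s] = len(self._text)
            self._text.append(s)
        return self._code[s]

    def decode(self, c):
        return self._text[c]
```

Because long, highly repetitive strings (labels, hostnames, namespace names) are stored once each, millions of resource objects can reference them through compact integers instead of full copies.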
In an embodiment of the method, the method further includes using a snapshot to rapidly restart the scheduler. This specifically includes configuring a resource object for creating the snapshot; a CSI snapshot controller (CSI-snapshot-controller) monitors modification of the configuration and calls a CSI plug-in (CSI-plugin) through a gRPC interface, and the CSI plug-in performs the actual snapshot-storage action through an OpenAPI interface. When the scheduler is restarted, data is quickly restored through the snapshot, which specifically includes: when the cloud computing cluster scheduler restarts, the state of the resource objects needs to be restored; snapshot data obtained through the associated snapshot ID is restored to newly created storage, and the newly created storage is used to construct the user server environment.
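The save/restore cycle keyed by snapshot ID can be modeled as follows. This is an in-memory stand-in for the storage reached through the CSI plug-in; the class name, JSON serialization, and ID format are all illustrative assumptions:

```python
import json

class SnapshotStore:
    """Stand-in for snapshot storage keyed by snapshot ID."""
    def __init__(self):
        self._snaps = {}

    def save(self, snap_id, state):
        # persist a serialized copy of the resource-object state
        self._snaps[snap_id] = json.dumps(state)
        return snap_id

    def restore(self, snap_id):
        # on scheduler restart: rebuild the resource-object state into
        # newly created storage from the data behind the snapshot ID
        return json.loads(self._snaps[snap_id])
```

The restored dictionary plays the role of the newly created storage from which the scheduler's resource state, and hence the user server environment, is reconstructed.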
In one embodiment of the method, the method includes allowing a user's scheduling request to contain user-defined resource scheduling policy information, and performing screening and scheduling decisions according to the user-defined resource scheduling policy. The method specifically includes the following steps:
receiving a scheduling request and parsing it to obtain a resource scheduling requirement description;
generating a scheduler plug-in and a plug-in configuration file according to the scheduling requirement description;
registering the plug-in with the scheduler by declaring the plug-in configuration file to the cloud computing cluster;
the scheduler using the plug-in to perform resource scheduling for the scheduling request.
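The steps above can be sketched as follows. Only a namespace constraint is shown, and all names (make_filter_plugin, PluggableScheduler, register) are illustrative assumptions rather than a real plug-in API:

```python
def make_filter_plugin(requirement):
    """Generate a Filter-style plug-in from a parsed requirement
    description (here only a namespace constraint)."""
    def plugin(host):
        return requirement.get("namespace") in host.get("namespaces", [])
    return plugin


class PluggableScheduler:
    """Toy scheduler that invokes registered plug-ins while filtering."""
    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        # corresponds to declaring the plug-in configuration file
        # to the cluster so the scheduler picks the plug-in up
        self._plugins.append(plugin)

    def schedule(self, hosts):
        # filtering: a host survives only if every plug-in accepts it
        return [h["name"] for h in hosts
                if all(p(h) for p in self._plugins)]
```

Registering additional plug-ins narrows the filter further, which is how a user-defined policy reshapes the scheduling decision without modifying the scheduler itself.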
The resource scheduling requirement description may include a topology constraint description, a namespace description, a network state description, and the like.
The function of the scheduling plug-in is to call an interface of the scheduler's filtering and screening process to implement user-defined scheduling logic. The interfaces of the screening process are extension points for plug-ins in the scheduler. The plug-in can declare an implementation of at least one of a QueueSort interface, a Pre-Filter interface, a Filter interface, a Post-Filter interface, a Pre-Score interface, a score-normalization interface, a Reserve interface, a Permit interface, a Pre-Bind interface, a Bind interface, and an Unreserve interface. After the plug-in is registered, the scheduler calls the specific interface implementations declared by the plug-in during the filtering and screening process of executing the scheduling request, thereby allowing the user to customize the resource scheduling policy, so that resource scheduling of the cloud computing cluster can adapt more flexibly to user requirements in terms of deployment and performance.
As known to those skilled in the art, the resource objects in the cloud computing cluster include Node, Pod, Service, RC, and the like. The scheduling process of the cloud computing cluster is described herein only by taking as an example how the cloud computing cluster screens out nodes meeting the conditions to run a Pod, but those skilled in the art can also schedule resources in other scenarios after learning the relevant knowledge of cloud computing clusters.
In this document, where technical terms are referred to, for the avoidance of doubt, their meanings are explained below:
A volume (storage volume) is defined on a Pod and is part of the computing resources, whereas network storage is in fact a physical resource that exists relatively independently of the computing resources. For example, when using a virtual machine, we usually first define a network storage, then carve out a "network disk" from it and attach it to the virtual machine.
namespace is another very important concept in cloud computing cluster systems, and in many cases namespaces are used to achieve multi-tenant resource isolation. By "allocating" the resource objects in the cloud computing cluster to different namespaces, the namespaces form logically grouped projects, groups, or user groups, so that different groups can be managed separately while sharing the resources of the whole cloud computing cluster.
label is another core concept in cloud computing clusters. A label is a key-value pair whose key and value are specified by the user. Labels can be attached to various resource objects; a resource object can define any number of labels, and the same label can be attached to any number of resource objects. A label is usually determined when the resource object is defined, and can also be dynamically added or deleted after the object is created.
A Service defines an access entry address for a service: a front-end application (Pod) accesses, through this entry address, a cluster instance composed of a set of Pod replicas behind it, and the Service is seamlessly connected to the back-end set of Pod replicas through a Label Selector.
The role of the Replication Controller (RC) is to ensure that the service capability and service quality of a Service always meet the expected standards.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
Finally, it should be noted that, although the above embodiments have been described in the text and drawings of the present application, the scope of the patent protection of the present application is not limited thereby. All technical solutions which are generated by replacing or modifying the equivalent structure or the equivalent flow according to the contents described in the text and the drawings of the present application, and which are directly or indirectly implemented in other related technical fields, are included in the scope of protection of the present application.