CN115686802B - Cloud computing cluster scheduling system - Google Patents

Cloud computing cluster scheduling system

Info

Publication number
CN115686802B
CN115686802B (application CN202310000528.8A)
Authority
CN
China
Prior art keywords
resource
scheduler
scheduling
interface
cloud computing
Prior art date
Legal status
Active
Application number
CN202310000528.8A
Other languages
Chinese (zh)
Other versions
CN115686802A (en)
Inventor
夏之斌
Current Assignee
Anhui Haima Cloud Technology Co ltd
Original Assignee
Haima Cloud Tianjin Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Haima Cloud Tianjin Information Technology Co Ltd
Priority to CN202310000528.8A
Publication of CN115686802A
Application granted
Publication of CN115686802B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a cloud computing cluster scheduling system comprising schedulers deployed in at least two central machine rooms, with at least one scheduler in each central machine room, and a distributed state machine. The distributed state machine synchronizes resource object state to the schedulers in real time. The schedulers are mutually independent; each scheduler stores the full state data of the resource objects and synchronizes resource object state changes in real time. After a scheduler receives a resource scheduling request, it screens scheduling objects according to the request to obtain a screening result set and generates a verification condition from the request. The distributed state machine corresponding to resource object A makes a judgment based on the screening result set and the verification condition; if the judgment is consistent, the scheduling succeeds. The system meets the higher requirements placed on cloud computing clusters in terms of scale growth, ease of expansion, deployment efficiency, and stability.

Description

Cloud computing cluster scheduling system
Technical Field
The application relates to the technical field of cloud platform management, in particular to a cloud computing cluster scheduling system.
Background
With the development of cloud computing technology, higher requirements are placed on the scheduling system of a large-scale cloud computing cluster. On the one hand, higher cluster resource utilization is needed: cloud platforms generally virtualize hardware resources so that different tasks can be deployed together, which improves utilization. However, as the scale grows larger, a more suitable resource scheduling system needs to be designed.
A cloud computing cluster system is a parallel or distributed system of interconnected computers. For a cloud computing cluster system, it is most important to manage, schedule and allocate computing, storage and network resources in the system according to requirements.
As a concrete example of cloud computing cluster scheduling, a developer (i.e., a cloud computing cluster user) applies for a cloud host, cloud storage, and the like. This involves submitting a resource request to the cloud computing cluster, which, besides allocating a computing resource access channel, a host, storage space, a network address, and so on to the developer, must automatically select a specific host and storage location inside the cluster. In some clusters it is also necessary to automatically download an image to the target cloud host. Through these automation steps, a series of automated processes such as deploying and initializing a server for the developer can be realized. All of the above requires the cloud computing cluster to manage and allocate host, storage, and network resources effectively so that cluster resources are utilized to the maximum, and these processes depend on automated management and scheduling by the cloud computing cluster.
In the prior art, a scheduling system of a cloud computing cluster is shown in fig. 1-1 or fig. 1-2. In the cloud computing cluster structure shown in fig. 1-1, the Master node is responsible for managing and controlling the entire cluster; essentially all control commands of the cluster are sent to the Master node, which is responsible for the specific execution process. The Master node runs several key control programs, such as a discovery module and a scheduler: the discovery module communicates with the Node nodes and acquires, in real time, the state of each Node and of the docker containers running on it, while the Docker daemon manages the running docker containers. In the cloud computing cluster structure shown in fig. 1-2, a scheduling framework replaces the scheduler of fig. 1-1, the master node maintains the resource state information of the cluster, and the Slave nodes and tasks correspond, respectively, to the Node nodes and docker instances of fig. 1-1.
The scheduling mode shown in fig. 1-1 is also referred to as centralized scheduling: the cluster stores information about cluster resources on a single scheduler, so when the cluster grows explosively, the burden on that scheduler becomes too heavy and the throughput of the cluster drops. The scheduling mode shown in fig. 1-2 is also referred to as two-layer scheduling: the master provides resource status information to a scheduling framework, which makes the scheduling decisions; because the master is unique, the scale of resource management in the cluster is limited by the processing capacity of the master node.
In order to use the resources of the cloud computing cluster reasonably and effectively and to meet the resource requests of differentiated data services and various tasks, a cloud computing cluster resource scheduling system is needed to manage and schedule the container cloud computing cluster.
Existing resource scheduling systems are mainly designed for batch jobs, but with the development of the internet, higher requirements are placed on cloud computing clusters in terms of scale growth, ease of expansion, deployment efficiency, and stability.
Disclosure of Invention
The above is only an overview of the technical solutions of the present application. To make the technical solutions more clearly understood by those skilled in the art, the invention may be further implemented according to the content described in the text and drawings of the present application; and to make the above objects and the other objects, features, and advantages of the present application easier to understand, the following description is given in conjunction with the detailed embodiments and the drawings.
In one aspect, the invention provides a cloud computing cluster scheduling system, which comprises schedulers respectively deployed in at least two central machine rooms, each central machine room being provided with at least one scheduler;
the system further comprises a distributed state machine;
the distributed state machine synchronizes the resource object state to the scheduler in real time;
the schedulers are independent of each other; each scheduler stores the full state data of the resource objects and synchronizes resource object state changes in real time;
after a scheduler receives a resource scheduling request, it screens scheduling objects according to the scheduling request to obtain a screening result set and generates a verification condition according to the scheduling request;
the distributed state machine corresponding to resource object A makes a judgment according to the screening result set and the verification condition, and if the judgment is consistent, the scheduling succeeds;
resource object A is one of the resource objects in the screening result set.
In other cloud computing cluster systems, the lock of a resource object is obtained first during scheduling, and the resource object is scheduled only after its lock has been acquired, so that other schedulers are prevented from scheduling that resource object. Such methods are prone to operation failure or lock contention because of the additional resources required for locking and unlocking, especially when scheduling requests are intensive. Still other cloud computing cluster systems schedule resources through a unique central node (such as a master node), so scheduling capacity is limited by the processing capacity of that node, the cluster scale cannot grow, and scheduling efficiency cannot be guaranteed. By acquiring scheduling resources in an optimistic manner, the present scheme avoids wasting CPU resources on continuously retrying to obtain a lock, or on continuously retrying a resource object after a failed attempt; at the same time, by deploying schedulers separately in dual centers, the schedulers can each perform scheduling independently, which improves scheduling efficiency and solves the problem of cluster scheduling failures caused by highly concurrent, competing scheduling requests.
In addition, because the cloud computing cluster system described herein adopts a structure with multiple central machine rooms, when one central machine room stops working due to an accident, the system as a whole can keep functioning by relying on another central machine room. At the same time, scheduling requests can be processed concurrently by the multiple central machine rooms, which improves the system's ability to scale horizontally and enlarges the scale to which the cluster can be expanded.
Drawings
The drawings are only for purposes of illustrating the principles, implementations, applications, features, and effects of particular embodiments of the present application, as well as others related thereto, and are not to be construed as limiting the application.
In the drawings of the specification:
FIG. 1-1 is a prior art cloud computing cluster scheduling system;
FIGS. 1-2 illustrate another prior art cloud computing cluster scheduling system;
fig. 2 is a schematic structural diagram of a cloud computing cluster scheduling system according to the present application;
fig. 3 is a schematic diagram of a scheduler according to the present application.
Detailed Description
In order to explain in detail possible application scenarios, technical principles, practical embodiments, and the like of the present application, the following detailed description is given with reference to the accompanying drawings in conjunction with the listed embodiments. The embodiments described herein are merely for more clearly illustrating the technical solutions of the present application, and therefore, the embodiments are only used as examples, and the scope of the present application is not limited thereby.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or related to other embodiments specifically defined. In principle, in the present application, the technical features mentioned in the embodiments can be combined in any manner to form a corresponding implementable technical solution as long as there is no technical contradiction or conflict.
Unless defined otherwise, technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the use of relational terms herein is intended to describe specific embodiments only and is not intended to limit the present application.
In the description of the present application, the term "and/or" describes a logical relationship between objects and means that three relationships may exist; for example, "A and/or B" covers three cases: A alone, B alone, and both A and B. In addition, the character "/" herein generally indicates that the former and latter associated objects are in a logical "or" relationship.
In this application, terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Unless otherwise restricted, the terms "including," "comprising," "having," and similar open-ended expressions used in this application are intended to cover a non-exclusive inclusion: a process, method, or article that includes a list of elements is not limited to those elements but may include other elements not expressly listed or inherent to such process, method, or article.
As understood with reference to the Examination Guidelines, in this application the terms "greater than," "less than," "more than," and the like are understood to exclude the stated number, while "above," "below," "within," and the like are understood to include it. In addition, in the description of the embodiments of the present application, "a plurality" means two or more (including two), and related expressions such as "a plurality of groups" and "a plurality of times" are understood likewise, unless specifically defined otherwise.
In a first aspect, a resource scheduling system is provided (as shown in fig. 2). The scheduling system includes a platform adapter (platform adapter) and a scheduler (scheduler) deployed in a central machine room, and a distributed state machine (distributed state machine) and instances (instance) deployed in an IDC machine room; the platform adapter manages the scheduler, and the distributed state machine manages the corresponding instances. In one embodiment, a cloud computing cluster has a plurality of server devices, and the servers may be deployed in different places, for example in a central machine room and an IDC machine room respectively; generally the central machine room and the IDC machine room are not in the same geographical region. At least one distributed scheduler is deployed on a server in the central machine room. There can be multiple central machine rooms, each with its own schedulers, and the central machine rooms, like the IDC machine rooms, can be located in different geographical regions. The schedulers can be located in two central machine rooms respectively, or at least two of the schedulers can be located in the same central machine room. The schedulers work independently: when selecting scheduling resources, a scheduler does not depend on the decisions of other schedulers, and each scheduler stores data for the full set of resource objects of the cloud computing cluster system. When the state of a resource object changes, the distributed state machine synchronizes the state of that resource object to the schedulers for storage; for example, as shown in fig. 3, the acquisition module (fetch) of the scheduler is used to synchronize resource object state from the distributed state machine to the scheduler, and the loading module (load) loads the resource state into the resource state database accessed by the filter module.
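For ease of understanding, the synchronization just described can be pictured as a local store that the fetch module updates and the load/filter path reads. The following Go sketch is purely illustrative; the type names, fields, and versioning rule are assumptions made for the example and are not part of the claimed solution.

```go
package main

import (
	"fmt"
	"sync"
)

// ResourceState is a hypothetical snapshot of one resource object's state
// as reported by a distributed state machine.
type ResourceState struct {
	ID              string
	ResourceVersion int64
	CPUFree         int // free vCPUs
	MemFreeMiB      int // free memory in MiB
}

// LocalStore holds the scheduler's full local copy of resource-object state.
type LocalStore struct {
	mu      sync.RWMutex
	objects map[string]ResourceState
}

func NewLocalStore() *LocalStore {
	return &LocalStore{objects: make(map[string]ResourceState)}
}

// Apply merges a state change pushed by a distributed state machine
// (the "fetch" step), keeping only the newest version of each object.
func (s *LocalStore) Apply(update ResourceState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.objects[update.ID]
	if !ok || update.ResourceVersion > cur.ResourceVersion {
		s.objects[update.ID] = update
	}
}

// Snapshot returns a copy of all objects for the filter module to read
// (the "load" step into the resource state database).
func (s *LocalStore) Snapshot() []ResourceState {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]ResourceState, 0, len(s.objects))
	for _, o := range s.objects {
		out = append(out, o)
	}
	return out
}

func main() {
	store := NewLocalStore()
	store.Apply(ResourceState{ID: "node-1", ResourceVersion: 7, CPUFree: 16, MemFreeMiB: 32768})
	store.Apply(ResourceState{ID: "node-1", ResourceVersion: 6, CPUFree: 20, MemFreeMiB: 40960}) // stale update, ignored
	fmt.Println(store.Snapshot())
}
```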
In one embodiment, the distributed state machines are located in IDC machine rooms, and each distributed state machine maintains all host nodes in its machine room that belong to the cloud computing cluster. Different distributed state machines are independent of each other; "independent of each other" between entities (distributed state machines or schedulers) in this text means that entities of the same kind can execute concurrently. A distributed state machine manages the state of the host nodes, the state of the virtualized resources allocated to users (resource objects in use by users), and so on.
In an embodiment, as shown in fig. 3, scheduling requests are stored in a scheduling request queue (scheduler request). A filter module (filter) of a scheduler obtains scheduling requests from the queue (as those skilled in the art will understand, the figure shows only one scheduler obtaining requests, but in the solution of the invention multiple schedulers may obtain requests from queues with high concurrency). The filter module filters resource objects according to the scheduling request to obtain a candidate result set, a scoring module (score) scores the candidate result set, a binding module (bind) binds the candidate result with the best score and sends a verification condition to the distributed state machine, and the distributed state machine verifies and returns the judgment result. For example, in one embodiment, the scheduling process includes the scheduler accepting a scheduling request and filtering resource objects according to it to obtain a candidate result set. Resource objects in a cloud computing cluster include Nodes (hosts), Pods (virtualized resource objects), Services, RCs (replication controllers), and the like. In some embodiments, a candidate result set is the set of hosts that meet a preselection policy; a host meeting the preselection policy may, in some embodiments, be a host that satisfies the conditions (e.g., CPU, memory, storage) for deploying the virtualized hardware resources corresponding to the scheduling request. In some embodiments, the filtering process includes scoring each host and its resource conditions, thereby filtering out the candidate result set that meets the conditions. For example, the resource objects are filtered into a screening result set according to at least one of the following parameters included in the scheduling request's expectations: CPU, GPU, memory, storage volume, labels, hostname, namespace, image download speed, and data transmission speed. In some embodiments, the screening is divided into preselection and preference steps: a set of resource objects is first obtained by pre-screening, and the resource objects from that step are then scored (score) to select the optimal resource objects that satisfy the resource scheduling request.
Different scoring methods can select different optimal resource objects. The screening conditions in some embodiments include the port value, the amount of host resources (CPU, storage, etc.), the volume (storage volume), the labels value, the hostname, the namespace, the image download speed, the data transmission speed, and so on. If several hosts are tied for the highest score, one of the host nodes can be selected at random for scheduling.
In some embodiments, a host's score is evaluated from the virtualized hardware resources it is already running and the virtualized hardware resources being applied for: if the host's remaining resources can satisfy the requested virtualized hardware resources, the calculation proceeds to the next step, otherwise the score is 0. In the next step, the score is inversely related to the virtualized hardware resources already running, so that resource requests are spread out as much as possible and the load on the hosts is balanced.
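As an illustrative sketch of the scoring rule just described (a host that cannot fit the request scores 0, otherwise the score falls as the host's existing load rises), the following Go example uses hypothetical struct fields and an assumed weighting; it is not the patent's exact formula.

```go
package main

import "fmt"

// hostInfo describes a candidate host's capacity and current usage
// (hypothetical fields used only for this illustration).
type hostInfo struct {
	Name              string
	CPUTotal, CPUUsed int
	MemTotal, MemUsed int // MiB
}

// request is the virtualized hardware resource being applied for.
type request struct {
	CPU, Mem int
}

// score returns 0 when the host cannot fit the request; otherwise the
// score decreases as the host's already-used share grows, which spreads
// new requests toward lightly loaded hosts and balances the load.
func score(h hostInfo, r request) int {
	if h.CPUTotal-h.CPUUsed < r.CPU || h.MemTotal-h.MemUsed < r.Mem {
		return 0 // pre-selection fails: insufficient remaining resources
	}
	cpuFree := 100 * (h.CPUTotal - h.CPUUsed - r.CPU) / h.CPUTotal
	memFree := 100 * (h.MemTotal - h.MemUsed - r.Mem) / h.MemTotal
	return (cpuFree + memFree) / 2 // higher score = more headroom left
}

func main() {
	req := request{CPU: 4, Mem: 8192}
	hosts := []hostInfo{
		{"node-1", 32, 28, 65536, 60000},
		{"node-2", 32, 8, 65536, 16384},
	}
	for _, h := range hosts {
		fmt.Printf("%s score=%d\n", h.Name, score(h, req))
	}
}
```

With the sample data, node-1 cannot fit the request and scores 0, while the lightly loaded node-2 receives the higher score, so new requests spread toward idle hosts.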
In some embodiments, every resource in the cloud computing cluster has a metadata.ResourceVersion field. When a scheduler needs to apply for virtualized hardware resources on a host, it simply attempts to schedule a suitable resource object and treats the attempt as successful. Because the schedulers are independent, several schedulers may try to schedule the same resource object at the same time, and it is uncertain which scheduler actually succeeds. The scheduler therefore modifies this field during scheduling and sends the field value as the verification condition to the distributed state machine for comparison and determination: if the metadata.ResourceVersion value processed by the scheduler is consistent with the metadata.ResourceVersion value of the resource object queried by the distributed state machine, the scheduling is judged successful. After a scheduling failure, the resource object with the next-highest score is tried according to the resource objects' scores, traversing the judgments obtained from the distributed state machines corresponding to all resource objects in the screening result set, until a consistent judgment is reached during the traversal or the distributed state machines corresponding to all resource objects have been traversed. In some embodiments, after scheduling succeeds, the corresponding virtualized hardware resource is created on the host and allocated to the user. In other embodiments, the scheduling operation may be an addition, deletion, modification, or query of the resource object.
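The optimistic check described above behaves like a compare-and-set on the version field. The Go sketch below is a deliberately simplified, single-process analogue (the in-memory state machine, field names, and error value are assumptions for illustration), not the distributed implementation itself.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// stateMachine is a hypothetical stand-in for the distributed state machine
// that owns the authoritative copy of a resource object.
type stateMachine struct {
	mu              sync.Mutex
	resourceVersion int64
	boundTo         string // which scheduling request currently holds the object
}

var errConflict = errors.New("verification failed: resourceVersion mismatch")

// verifyAndBind implements the optimistic check: the scheduler sends the
// metadata.ResourceVersion it observed as the verification condition; the
// state machine accepts the binding only if that value still matches.
func (s *stateMachine) verifyAndBind(observedVersion int64, requestID string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if observedVersion != s.resourceVersion {
		return errConflict // another scheduler won the race
	}
	s.resourceVersion++ // the mutation bumps the version so later stale attempts fail
	s.boundTo = requestID
	return nil
}

func main() {
	sm := &stateMachine{resourceVersion: 41}
	fmt.Println(sm.verifyAndBind(41, "req-A")) // <nil>: scheduling succeeds
	fmt.Println(sm.verifyAndBind(41, "req-B")) // conflict: req-B tries the next candidate
}
```

A scheduler that receives the conflict result simply moves on to the resource object with the next-highest score, exactly as described in the traversal step above.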
In other cloud computing cluster systems, the lock of a resource object is obtained first during scheduling, and the resource object is scheduled only after its lock has been acquired, so that other schedulers are prevented from scheduling that resource object. Such methods are prone to operation failure or lock contention because of the additional resources required for locking and unlocking, especially when scheduling requests are intensive. Still other cloud computing cluster systems schedule resources through a unique central node (such as a master node), so scheduling capacity is limited by the processing capacity of that node, the cluster scale cannot grow, and scheduling efficiency cannot be guaranteed. By acquiring scheduling resources in an optimistic manner, the present scheme avoids wasting CPU resources on continuously retrying to obtain a lock, or on continuously retrying a resource object after a failed attempt; at the same time, by deploying schedulers separately in dual centers, the schedulers can each perform scheduling independently, which improves scheduling efficiency and solves the problem of cluster scheduling failures caused by highly concurrent, competing scheduling requests. In particular, to let users access the cloud computing cluster more conveniently, the central machine rooms are usually distributed across different geographical regions, and the IDC machine rooms are likewise distributed across different geographical regions. This layout helps shorten the spatial distance between a user and the server in the cluster that performs the specific processing, and thus reduces the user's access latency, but it places higher demands on scheduling.
In addition, because the cloud computing cluster system described herein adopts a structure with multiple central machine rooms, when one central machine room stops working due to an accident, the system as a whole can keep functioning by relying on another central machine room. At the same time, scheduling requests can be processed concurrently by the multiple central machine rooms, which improves the system's ability to scale horizontally and enlarges the scale to which the cluster can be expanded.
In one embodiment of the present invention, the distributed state machines may be deployed in the central machine rooms, and the state machines in the central machine rooms communicate through the Link bus service deployed in each machine room. Each distributed state machine can choose which central machine room to be deployed in according to the distance over the actual physical network. With this deployment, communication between the scheduler and the distributed state machine becomes intranet communication, so if schedulers scheduling the same resource conflict or compete, the cost of retrying is low.
In an embodiment of the invention, the data of the resource objects in the cloud computing cluster is compressed, so that the metadata of the resources to be scheduled can be compressed and optimized in the scheduler's memory. This reduces memory occupation, improves the scheduler's screening and scheduling efficiency, and makes it possible to support container resources at a scale above the order of one million. The compression may represent the data of a resource object with short characters, or encode the data representation of the resource object as short codes.
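One plausible way to realize such compression, given only the description above, is to intern recurring strings (image names, namespaces, label values) into short integer codes; the Go sketch below illustrates that idea under these assumptions and is not the patent's specific encoding.

```go
package main

import "fmt"

// codeTable interns recurring strings so each resource object can store a
// small integer code instead of the full string; this is one plausible
// reading of the "short code" compression described above.
type codeTable struct {
	toCode map[string]uint32
	toStr  []string
}

func newCodeTable() *codeTable {
	return &codeTable{toCode: make(map[string]uint32)}
}

// encode returns the existing code for s, or assigns a new one.
func (t *codeTable) encode(s string) uint32 {
	if c, ok := t.toCode[s]; ok {
		return c
	}
	c := uint32(len(t.toStr))
	t.toCode[s] = c
	t.toStr = append(t.toStr, s)
	return c
}

// decode recovers the original string from its short code.
func (t *codeTable) decode(c uint32) string { return t.toStr[c] }

func main() {
	t := newCodeTable()
	a := t.encode("registry.example.com/game/image:v1") // hypothetical image name
	b := t.encode("registry.example.com/game/image:v1") // same string, same code
	fmt.Println(a == b, t.decode(a))
}
```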
In an embodiment of the method, a snapshot is further used to restart the scheduler quickly. Specifically, a resource object for creating the snapshot is configured; a CSI snapshot controller (CSI-snapshot controller) monitors modifications to this configuration and calls a CSI plug-in (CSI-plugin) through a gRPC interface, and the CSI plug-in performs the actual snapshot-storing action through an OpenAPI interface. When the scheduler is restarted, the data is restored quickly from the snapshot, which specifically includes: when the cloud computing cluster scheduler restarts, the state of the resource objects needs to be restored, the snapshot data obtained through the associated snapshot ID is restored into newly created storage, and the newly created storage is used to build the user server environment.
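The restart path can be summarized as: look up the snapshot by its associated ID, restore the data into newly created storage, and rebuild the environment from that storage. The Go sketch below illustrates only this flow; the snapshotStore and blockStore interfaces and the in-memory fakes are assumptions for the example and do not represent the CSI gRPC or OpenAPI interfaces themselves.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshotStore and blockStore are hypothetical interfaces standing in for
// the CSI-backed snapshot storage and the newly created volume.
type snapshotStore interface {
	Fetch(snapshotID string) ([]byte, error)
}

type blockStore interface {
	Write(data []byte) error
}

// restoreOnRestart sketches the restart path described above: look up the
// snapshot by its associated ID, restore it into newly created storage, and
// hand that storage to the code that rebuilds the user server environment.
func restoreOnRestart(snapshotID string, snaps snapshotStore, fresh blockStore) error {
	data, err := snaps.Fetch(snapshotID)
	if err != nil {
		return fmt.Errorf("fetch snapshot %s: %w", snapshotID, err)
	}
	if len(data) == 0 {
		return errors.New("snapshot is empty")
	}
	return fresh.Write(data) // the new storage now backs the rebuilt environment
}

// In-memory fakes so the sketch runs end to end.
type fakeSnaps map[string][]byte

func (f fakeSnaps) Fetch(id string) ([]byte, error) { return f[id], nil }

type fakeBlock struct{ data []byte }

func (f *fakeBlock) Write(d []byte) error { f.data = d; return nil }

func main() {
	snaps := fakeSnaps{"snap-123": []byte("serialized resource-object state")}
	var vol fakeBlock
	fmt.Println(restoreOnRestart("snap-123", snaps, &vol), len(vol.data))
}
```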
In one embodiment, the method allows the user's scheduling request to contain user-defined resource scheduling policy information, and the screening and scheduling decisions are made according to that user-defined resource scheduling policy. The method specifically comprises the following steps:
receiving the scheduling request and parsing it to obtain a resource scheduling requirement description,
generating a scheduler plug-in and a plug-in configuration file according to the scheduling requirement description,
registering the plug-in with the scheduler by declaring the plug-in configuration file to the cloud computing cluster,
the scheduler using the plug-in to perform resource scheduling for the scheduling request.
The resource scheduling requirement description can include a topology constraint description, a namespace description, a network state description, and the like.
The function of the scheduling plug-in is to hook into the interfaces of the scheduler's filtering and screening process so as to carry out user-defined scheduling logic; the interfaces of the screening process are the extension points for plug-ins in the scheduler. A plug-in can declare an implementation of at least one of the QueueSort interface, Pre-Filter interface, Filter interface, Post-Filter interface, Pre-Score interface, normalized scoring interface, Reserve interface, Permit interface, Pre-Bind interface, Bind interface, and Unreserve interface. After the plug-in is registered, the scheduler calls the specific interface implementations declared by the plug-in while executing the filtering and screening process for a scheduling request, so that the user-defined resource scheduling policy takes effect and the resource scheduling of the cloud computing cluster adapts more flexibly to the user's requirements in terms of deployment and performance.
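To make the plug-in mechanism concrete, the sketch below defines trimmed-down Filter/Score interfaces and a hypothetical topology-constraint plug-in that is looked up from a registry; the extension points listed above mirror those of the Kubernetes scheduling framework, but everything named here is an illustrative assumption rather than that framework's actual API.

```go
package main

import "fmt"

// FilterPlugin and ScorePlugin are simplified stand-ins for the extension
// points a user-generated plug-in can implement.
type FilterPlugin interface {
	Filter(req Request, node Node) bool
}

type ScorePlugin interface {
	Score(req Request, node Node) int
}

type Request struct{ Namespace string }
type Node struct{ Name, Zone string }

// topologyPlugin is a hypothetical plug-in generated from a scheduling
// requirement description that contains a topology constraint.
type topologyPlugin struct{ requiredZone string }

func (p topologyPlugin) Filter(_ Request, n Node) bool { return n.Zone == p.requiredZone }
func (p topologyPlugin) Score(_ Request, n Node) int   { return 10 }

// registry maps plug-in names to instances, standing in for "declaring the
// plug-in configuration file to the cloud computing cluster".
var registry = map[string]FilterPlugin{}

func main() {
	registry["topology"] = topologyPlugin{requiredZone: "idc-east"}
	nodes := []Node{{"node-1", "idc-east"}, {"node-2", "idc-west"}}
	for _, n := range nodes {
		// The scheduler consults the registered plug-in during its filtering pass.
		fmt.Println(n.Name, registry["topology"].Filter(Request{Namespace: "game"}, n))
	}
}
```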
As known to those skilled in the art, the resource objects in a cloud computing cluster include Nodes, Pods, Services, RCs, and the like. The scheduling process of the cloud computing cluster is described herein only through the example of how the cluster screens out nodes that meet the conditions for running a Pod, but those skilled in the art, once familiar with the relevant cloud computing cluster concepts, can also schedule resources in other scenarios.
In this document, where technical terms are referred to, for the avoidance of doubt, their meanings are explained below:
A Volume (storage volume) is defined on a Pod and is part of the computing resources, whereas network storage is a physical resource that exists relatively independently of the computing resources. For example, when using virtual machines, we usually first define network storage, then carve a "network disk" out of it and attach it to a virtual machine.
A Namespace is another very important concept in cloud computing cluster systems; in many cases namespaces are used to achieve multi-tenant resource isolation. By "allocating" the resource objects inside the cluster to different namespaces, the namespaces form logically grouped projects, groups, or user groups, so that different groups can be managed separately while sharing the resources of the whole cloud computing cluster.
A Label is another core concept in cloud computing clusters. A label is a key-value pair whose key and value are specified by the user. Labels can be attached to various resource objects; a resource object can define any number of labels, and the same label can be attached to any number of resource objects. Labels are usually determined when a resource object is defined, and can also be added or removed dynamically after the object is created.
A Service defines an access entry address for a service: a front-end application (Pod) accesses, through this entry address, the cluster of instances formed by the set of Pod replicas behind it, and the Service is seamlessly connected to that back-end set of Pod replicas through a Label Selector.
The role of the Replication Controller (RC) is to ensure that the Service capability and the Service quality of the Service always meet the expected standards.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the method embodiments and are not described in detail again here. In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative: for example, the division into modules is only a logical division, and there may be other divisions in actual implementation; a plurality of modules or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection between devices or modules through communication interfaces, and may be electrical, mechanical, or in other forms.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application, or the portion of it that substantially contributes over the prior art, may be embodied as a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
Finally, it should be noted that, although the above embodiments have been described in the text and drawings of the present application, the scope of patent protection of the present application is not limited thereby. All technical solutions produced by replacing or modifying equivalent structures or equivalent flows based on the contents described in the text and drawings of this application, and all direct or indirect applications in other related technical fields, fall within the scope of protection of this application.

Claims (8)

1. A cloud computing cluster scheduling system is characterized in that,
comprises at least 2 schedulers;
comprises at least 2 distributed state machines;
the distributed state machine synchronizes the resource object state to the scheduler in real time;
the schedulers are mutually independent, each scheduler stores the state data of the resource object in full quantity and synchronizes the state change of the resource object in real time;
after the scheduler receives the resource scheduling request, screening a scheduling object according to the scheduling request to obtain a screening result set, and generating a verification condition according to the scheduling request;
the distributed state machine corresponding to the resource object A judges according to the screening result set and the verification conditions, and if the judgment is consistent, the scheduling is successful;
resource object a is one of the resource objects in the screening result set.
2. The system of claim 1, wherein the distributed state machine is deployed in an IDC room.
3. The system of claim 1, wherein the distributed state machines are deployed in a central room, and wherein the distributed state machines and the scheduler communicate with each other via Link buses deployed in the rooms.
4. The system of claim 1, further comprising, before the step of the distributed state machine making the judgment based on the screening result set and the verification condition: the scheduler selects, according to the screening result, the distributed state machine corresponding to the resource object with the highest score in the screening result set to execute the judgment;
after the judgment is executed, the method further comprises: traversing, in order of the resource objects' scores, the judgments executed by the distributed state machines corresponding to all the resource objects in the screening result set, until a judgment is consistent during the traversal or the distributed state machines corresponding to all the resource objects in the screening result set have executed the judgment.
5. The system of claim 1, wherein the address representation of the resource object is compressed, and wherein the compressed address representation is used to represent the resource object in the scheduler and the distributed state machine.
6. The system according to claim 1, wherein said screening the scheduling object according to the scheduling request to obtain a screening result set comprises: screening the resource objects to obtain the screening result set according to at least one of the following parameters included in the scheduling request's expectation: CPU, GPU, memory, storage volume, labels, hostname, namespace, image download speed, and data transmission speed.
7. The system according to claim 1, further comprising, after the scheduler receives the resource scheduling request, the steps of:
analyzing to obtain a resource scheduling requirement description;
generating a scheduler plug-in and a plug-in configuration file according to the scheduling requirement description;
registering the plug-in with the scheduler by declaring the plug-in configuration file to the cloud computing cluster;
the scheduler using the plug-in to perform resource scheduling for the scheduling request.
8. The system of claim 1, wherein the distributed state machine is configured to manage the state of the host node and the state of the virtualized resources allocated to the user;
the schedulers are respectively positioned in 2 central machine rooms, or at least 2 schedulers in the schedulers are positioned in the same central machine room;
the scheduler calculates the score of a resource object according to at least one of the port value, the amount of host resources, the volume (storage volume), the labels, the hostname and the namespace, and traverses, in order of the resource objects' scores, the judgments respectively obtained by the distributed state machines corresponding to all resource objects in the screening result set, until a judgment is consistent during the traversal or the distributed state machines corresponding to all the resource objects have been traversed;
the verification condition is the metadata.ResourceVersion value of the resource object that the scheduler attempts to schedule, and the judging process comprises comparing the metadata.ResourceVersion value of the resource object processed by the scheduler with the metadata.ResourceVersion value of the resource object obtained by the distributed state machine's query; if the values are consistent, the judgment is consistent;
further comprising: configuring a resource object for creating a snapshot, monitoring modification of the configuration through a CSI snapshot controller, calling a CSI plug-in through a gRPC interface, and the CSI plug-in performing the snapshot-storing action through an OpenAPI interface; when the scheduler is restarted, snapshot data obtained through the associated snapshot ID is restored into newly created storage, and the newly created storage is used for building the user server environment;
further comprising: analyzing to obtain a resource scheduling requirement description, generating a scheduler plug-in and a plug-in configuration file according to the scheduling requirement description, registering the plug-in with the scheduler by declaring the plug-in configuration file to the cloud computing cluster, and the scheduler using the plug-in to realize resource scheduling for the scheduling request, wherein the generated scheduler plug-in declares an implementation of at least one of a QueueSort interface, a Pre-Filter interface, a Filter interface, a Post-Filter interface, a Pre-Score interface, a Score interface, a normalized scoring interface, a Reserve interface, a Permit interface, a Pre-Bind interface, a Bind interface and an Unreserve interface.
CN202310000528.8A 2023-01-03 2023-01-03 Cloud computing cluster scheduling system Active CN115686802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310000528.8A CN115686802B (en) 2023-01-03 2023-01-03 Cloud computing cluster scheduling system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310000528.8A CN115686802B (en) 2023-01-03 2023-01-03 Cloud computing cluster scheduling system

Publications (2)

Publication Number Publication Date
CN115686802A (en) 2023-02-03
CN115686802B (en) 2023-03-21

Family

ID=85057395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310000528.8A Active CN115686802B (en) 2023-01-03 2023-01-03 Cloud computing cluster scheduling system

Country Status (1)

Country Link
CN (1) CN115686802B (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9154288B2 (en) * 2011-03-25 2015-10-06 Nokia Solutions And Networks Oy Apparatus and method for allocating resources for coordinated transmissions from multiple cells
US9870215B2 (en) * 2015-11-30 2018-01-16 International Business Machines Corporation Tracking an application installation state
CN106874115A (en) * 2017-01-20 2017-06-20 杭州虚核科技有限公司 A kind of resources of virtual machine distribution method and distributed virtual machine resource scheduling system
CN109408229B (en) * 2018-09-30 2021-06-04 华为技术有限公司 Scheduling method and device
CN113918270A (en) * 2020-07-08 2022-01-11 电科云(北京)科技有限公司 Cloud resource scheduling method and system based on Kubernetes

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214330A (en) * 2020-11-04 2021-01-12 腾讯科技(深圳)有限公司 Method and device for deploying master nodes in cluster and computer-readable storage medium

Also Published As

Publication number Publication date
CN115686802A (en) 2023-02-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240122

Address after: 230031 Room 672, 6/F, Building A3A4, Zhong'an Chuanggu Science Park, No. 900, Wangjiang West Road, High-tech Zone, Hefei, Anhui

Patentee after: Anhui Haima Cloud Technology Co.,Ltd.

Country or region after: China

Address before: 301700 Room 2D25, Building 29, No. 89 Heyuan Road, Jingjin Science and Technology Valley Industrial Park, Wuqing District, Tianjin

Patentee before: HAIMAYUN (TIANJIN) INFORMATION TECHNOLOGY CO.,LTD.

Country or region before: China