CN108512890B - Container cloud platform resource scheduling method and system based on rack sensing - Google Patents
- Publication number
- CN108512890B
- Authority
- CN
- China
- Prior art keywords
- node
- scheduling
- resource
- data
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1042—Peer-to-peer [P2P] networks using topology management mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/02—Standardisation; Integration
- H04L41/0213—Standardised network management protocols, e.g. simple network management protocol [SNMP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/50—Queue scheduling
- H04L47/62—Queue scheduling characterised by scheduling criteria
- H04L47/625—Queue scheduling characterised by scheduling criteria for service slots or service orders
- H04L47/6275—Queue scheduling characterised by scheduling criteria for service slots or service orders based on priority
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1074—Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Computer And Data Communications (AREA)
- Information Transfer Between Computers (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a rack-aware container cloud platform resource scheduling method and system. The method comprises the following steps: A. acquiring a user's service request, parsing it, and defining the number of required replicas; B. acquiring the states of all nodes; C. scoring the nodes according to a resource scoring algorithm and selecting the node with the highest resource scoring priority for scheduling; D. determining whether any replicas remain to be scheduled; if not, finishing scheduling; if so, proceeding to step E; E. acquiring the cluster node network topology graph and selecting a node whose distance from the node with the highest resource scoring priority in step C is greater than 2; F. acquiring the states of all nodes at a distance of 2 from the node selected in step E, and returning to step C. The method prevents container replicas from being placed in the same rack during scheduling, reduces the risk that the application becomes unavailable, disperses network traffic within the cluster, and avoids excessive traffic on a single network port.
Description
Technical Field
The invention relates to the technical field of cloud computing resource scheduling, and in particular to a container cloud platform resource scheduling method and system based on rack awareness.
Background
In a container cloud platform, an application runs as a container and provides services externally. To achieve load balancing and high availability, the same application must run several containers simultaneously as replicas that work together. To keep the service available after a node of the platform goes down, the replicas should run, as far as possible, on different nodes that do not affect one another.
In the prior art, containers are scheduled mainly by two methods: random scheduling, and priority scheduling based on node resource scoring. Neither method considers the physical distribution of nodes. In a real environment, when a rack or an internal switch fails, the application can easily become unavailable because its replicas are not sufficiently dispersed.
Disclosure of Invention
To address the defects of the prior art, the invention provides a rack-aware container cloud platform resource scheduling method and system, which aim to solve the prior-art problem that container replicas placed in the same rack create a high risk of application unavailability.
The rack-aware container cloud platform resource scheduling method provided by the invention comprises the following steps:
A. acquiring a user's service request, parsing the acquired service request, and defining the number of required replicas;
B. acquiring all nodes and their state information;
C. scoring all acquired nodes according to a resource scoring method, and selecting the node with the highest resource scoring priority for scheduling;
D. determining whether any replicas remain to be scheduled; if not, finishing scheduling; if so, proceeding to step E;
E. acquiring the cluster node network topology graph, and selecting a node whose distance from the node with the highest resource scoring priority in step C is greater than 2;
F. acquiring all nodes at a distance of 2 from the node selected in step E, together with their state information, and returning to step C.
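Steps B to F can be read as a loop over the remaining replicas. The following Python sketch is one possible reading, not the patented implementation. The function name `schedule_replicas`, the `score` and `dist` callables, and the sample data are hypothetical. The sketch also assumes that a lower score means a higher priority, that nodes already used are skipped, and that the node picked in step E is itself a candidate in step F; the text does not settle these points.

```python
from typing import Callable, List, Sequence


def schedule_replicas(
    replicas: int,
    nodes: Sequence[str],
    score: Callable[[str], float],
    dist: Callable[[str, str], int],
) -> List[str]:
    """Place up to `replicas` replicas, moving to a different rack each round."""
    placed: List[str] = []
    candidates = list(nodes)                     # Step B: start from every node
    while candidates and len(placed) < replicas:
        best = min(candidates, key=score)        # Step C: lowest score wins (assumption)
        placed.append(best)
        if len(placed) == replicas:              # Step D: nothing left to schedule
            break
        # Step E: any unused node more than 2 hops from the node just chosen
        far = [n for n in nodes if n not in placed and dist(n, best) > 2]
        if not far:
            break                                # no other rack is available
        anchor = far[0]
        # Step F: the anchor plus unused nodes exactly 2 hops from it (same rack)
        candidates = [anchor] + [
            n for n in nodes if n not in placed and dist(n, anchor) == 2
        ]
    return placed


# Hypothetical cluster: two racks with three hosts each, and made-up load scores.
RACK = {"H1": "R1", "H2": "R1", "H3": "R1", "H4": "R2", "H5": "R2", "H6": "R2"}
LOAD = {"H1": 0.2, "H2": 0.1, "H3": 0.5, "H4": 0.7, "H5": 0.3, "H6": 0.4}


def hops(a: str, b: str) -> int:
    """0 for the same host, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if RACK[a] == RACK[b] else 4


placement = schedule_replicas(2, list(RACK), LOAD.get, hops)
print(placement)
```

Because step E is measured only against the node chosen most recently, a third replica in a two-rack cluster returns to the first rack in this sketch.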
As a further improvement of the invention, step A comprises the following sub-steps:
A1. acquiring a user's service request applying for resources;
A2. parsing the acquired resource application request to obtain the user's identity information and the requested resource information;
A3. authenticating the obtained identity information and, after authentication succeeds, analyzing the resource information requested by the user and defining the number of required replicas.
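Sub-steps A1 to A3 might look like the following in Python. This is only an illustrative sketch: the JSON layout, the field names `user`, `token`, `resources` and `replicas`, and the token table are all assumptions, since the text does not specify a request format or an authentication mechanism.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class ScheduleRequest:
    user: str
    replicas: int


def parse_request(raw: str, valid_tokens: Dict[str, str]) -> ScheduleRequest:
    """Parse a request body, authenticate the user, and extract the replica count."""
    body = json.loads(raw)                        # A2: parse the request
    user, token = body["user"], body["token"]     # A2: identity information
    if valid_tokens.get(user) != token:           # A3: identity authentication
        raise PermissionError(f"authentication failed for {user!r}")
    replicas = int(body["resources"]["replicas"])  # A3: required replica count
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return ScheduleRequest(user=user, replicas=replicas)


# Hypothetical request and credential store.
TOKENS = {"alice": "s3cret"}
RAW = '{"user": "alice", "token": "s3cret", "resources": {"replicas": 3}}'
request = parse_request(RAW, TOKENS)
print(request)
```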
As a further improvement of the invention, the node state in step B includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput.
As a further improvement of the invention, step C comprises the following sub-steps:
C1. taking the collected machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput of each node as scoring index data;
C2. sorting the nodes in ascending order of the scoring index data, and selecting the first node as the replica scheduling node.
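The text names the five metrics but does not say how they are combined into one score. The sketch below assumes one simple option: each metric is scaled by its cluster-wide maximum and the scaled values are summed with equal weight. The `NodeState` type, the `rank_nodes` function, and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class NodeState:
    name: str
    load: float      # machine load
    cpu: float       # CPU occupancy
    mem: float       # memory occupancy
    disk_io: float   # disk I/O throughput
    net_io: float    # network I/O throughput


METRICS = ("load", "cpu", "mem", "disk_io", "net_io")


def rank_nodes(states: Sequence[NodeState]) -> List[NodeState]:
    """Return nodes in ascending order of a combined score (C2)."""
    if not states:
        return []
    # Scale each metric by its largest value so that no unit dominates the sum.
    peaks = {m: max(getattr(s, m) for s in states) or 1.0 for m in METRICS}

    def score(s: NodeState) -> float:
        return sum(getattr(s, m) / peaks[m] for m in METRICS)

    return sorted(states, key=score)


# Made-up measurements for three nodes.
states = [
    NodeState("a", load=1.0, cpu=0.5, mem=0.5, disk_io=100, net_io=200),
    NodeState("b", load=0.5, cpu=0.2, mem=0.3, disk_io=50, net_io=100),
    NodeState("c", load=2.0, cpu=0.9, mem=0.8, disk_io=200, net_io=400),
]
ranked = rank_nodes(states)
print([s.name for s in ranked])
```

The first element of the returned list plays the role of the replica scheduling node in C2.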
As a further improvement of the invention, the node distance in steps E and F is a distance value given by the number of hops from the machine to an external device.
As a further improvement of the invention, the cluster node network topology in step E is implemented by recording topology data directly in a management node of the cluster; when scheduling, the management node reads this data directly to obtain the node distance.
As a further improvement of the invention, the cluster node network topology in step E is implemented by using the management node as the initial node and traversing all network interfaces via the SNMP protocol, which yields network topology data for the management node to use when scheduling.
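The traversal from the management node can be sketched as a breadth-first search. The code below makes no SNMP calls: the `neighbors_of` callable is a hypothetical stand-in for whatever query returns the devices attached to a given device's interfaces, and the `LINKS` table is invented test data with the starting host assumed to be the management node.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional


def discover_topology(
    start: str, neighbors_of: Callable[[str], Iterable[str]]
) -> Dict[str, List[str]]:
    """Breadth-first traversal from `start`, returning an adjacency map."""
    adjacency: Dict[str, List[str]] = {}
    seen = {start}
    queue = deque([start])
    while queue:
        device = queue.popleft()
        links = list(neighbors_of(device))   # stand-in for an SNMP interface query
        adjacency[device] = links
        for peer in links:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return adjacency


def hop_count(adjacency: Dict[str, List[str]], src: str, dst: str) -> Optional[int]:
    """Number of hops on the shortest path from src to dst, or None if unreachable."""
    hops = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return hops[cur]
        for peer in adjacency.get(cur, ()):
            if peer not in hops:
                hops[peer] = hops[cur] + 1
                queue.append(peer)
    return None


# Invented link table: one core switch, two rack switches, six hosts.
LINKS = {
    "D1": ["R1", "R2"],
    "R1": ["D1", "H1", "H2", "H3"],
    "R2": ["D1", "H4", "H5", "H6"],
    "H1": ["R1"], "H2": ["R1"], "H3": ["R1"],
    "H4": ["R2"], "H5": ["R2"], "H6": ["R2"],
}
topology = discover_topology("H1", LINKS.__getitem__)
print(hop_count(topology, "H1", "H2"), hop_count(topology, "H1", "H4"))
```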
A rack-aware container cloud platform resource scheduling system comprises an API server module, a resource scheduling control module, a node server cluster module, and a node network topology information data module. The API server module acquires a user's service request, parses it, and verifies the user's identity; after verification succeeds, it defines the number of replicas and the resource scheduling request required by the service request. The node network topology information data module acquires the node network topology graph and stores the data. The node server cluster module holds all nodes and their state data, scores all nodes according to a resource scoring method, and sorts them from highest to lowest resource scoring priority. The resource scheduling control module receives the defined number of replicas and the resource scheduling request, acquires the node information from the node server cluster module, and takes the node with the highest priority as the scheduling node for the first replica. It then determines whether any replicas remain to be scheduled. If not, scheduling finishes. If so, it reads the data of the node server cluster module and the node network topology information data module, acquires all nodes whose distance from the node just selected is greater than 2, and selects the highest-priority node among them as the scheduling node for the second replica. It repeats this check until no replicas remain to be scheduled, at which point scheduling finishes.
Further, the node state information in the node server cluster module includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput, and resource scoring is performed on this data.
Further, the node distance is a distance value given by the number of hops from the machine to an external device.
The invention extends the node resource scoring method with rack awareness. Before scheduling, the nodes to which the replicas will be assigned are computed according to a rack-awareness algorithm. Within a rack, the node with the highest resource scoring priority is selected for scheduling. The cluster node network topology graph is then obtained and, among the nodes whose distance from the selected node is greater than 2, the node with the highest resource scoring priority is selected for scheduling. This prevents container replicas from being placed in the same rack during scheduling, reduces the risk that the application becomes unavailable because equipment in the machine room fails, disperses network traffic within the cluster, and avoids excessive traffic on a single network port.
Drawings
FIG. 1 is a schematic block diagram of a container cloud platform resource scheduling method provided by the present invention;
FIG. 2 is a network topology diagram of a cluster node provided by the present invention;
FIG. 3 is a schematic structural diagram of a container cloud platform resource scheduling system provided by the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the invention discloses a rack-aware container cloud platform resource scheduling method, which comprises the following steps:
A. acquiring a user's service request, parsing the acquired service request, and defining the number of required replicas;
B. acquiring all nodes and their state information;
C. scoring all acquired nodes according to a resource scoring method, and selecting the node with the highest resource scoring priority for scheduling;
D. determining whether any replicas remain to be scheduled; if not, finishing scheduling; if so, proceeding to step E;
E. acquiring the cluster node network topology graph, and selecting a node whose distance from the node with the highest resource scoring priority in step C is greater than 2;
F. acquiring all nodes at a distance of 2 from the node selected in step E, together with their state information, and returning to step C.
Further, step A comprises the following sub-steps:
A1. acquiring a user's service request applying for resources;
A2. parsing the acquired resource application request to obtain the user's identity information and the requested resource information;
A3. authenticating the obtained identity information and, after authentication succeeds, analyzing the resource information requested by the user and defining the number of required replicas.
Further, the node state in step B includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput.
Further, step C comprises the following sub-steps:
C1. taking the collected machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput of each node as scoring index data;
C2. sorting the nodes in ascending order of the scoring index data, and selecting the first node as the replica scheduling node.
Further, the node distance in steps E and F is a distance value given by the number of hops from the machine to an external device. For ease of understanding, node distance is illustrated here with an example network topology. As shown in FIG. 2, D1 and R1 are switches, and the bottom layer consists of datanodes. The rack ID of H1 is /D1/R1/H1; the parent of H1 is R1, and the parent of R1 is D1. The distance between any two of H1, H2 and H3 is 2, because the path H1-R1-H2 takes 2 hops. The distance from H1 to H4, H5 or H6 is 4, because the path H1-R1-D1-R2-H4 takes 4 hops.
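When the topology is a tree and each host carries a rack ID path such as /D1/R1/H1, the hop count can be computed from the paths alone. The following Python sketch assumes that tree shape; the function name `node_distance` is hypothetical, and the rack switch R2 is taken from the H1-R1-D1-R2-H4 path in the example.

```python
def node_distance(path_a: str, path_b: str) -> int:
    """Hop count between two hosts identified by rack ID paths like '/D1/R1/H1'."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    # Count the leading path components the two hosts share.
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    # Each host climbs to the deepest shared ancestor; the hops add up.
    return (len(a) - shared) + (len(b) - shared)


same_rack = node_distance("/D1/R1/H1", "/D1/R1/H2")
other_rack = node_distance("/D1/R1/H1", "/D1/R2/H4")
print(same_rack, other_rack)
```

Under these assumptions the function reproduces the two distances in the example: 2 within a rack and 4 across racks.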
Further, the cluster node network topology in step E is implemented by recording topology data directly in a management node of the cluster; when scheduling, the management node reads this data directly to obtain the node distance.
Furthermore, the cluster node network topology in step E is implemented by using the management node as the initial node and traversing all network interfaces via the SNMP protocol, which yields network topology data for the management node to use when scheduling.
As shown in FIG. 3, a rack-aware container cloud platform resource scheduling system comprises an API server module, a resource scheduling control module, a node server cluster module, and a node network topology information data module. The API server module acquires a user's service request, parses it, and verifies the user's identity; after verification succeeds, it defines the number of replicas and the resource scheduling request required by the service request. The node network topology information data module acquires the node network topology graph and stores the data. The node server cluster module holds all nodes and their state data, scores all nodes according to a resource scoring method, and sorts them from highest to lowest resource scoring priority. The resource scheduling control module receives the defined number of replicas and the resource scheduling request, acquires the node information from the node server cluster module, and takes the node with the highest priority as the scheduling node for the first replica. It then determines whether any replicas remain to be scheduled. If not, scheduling finishes. If so, it reads the data of the node server cluster module and the node network topology information data module, acquires all nodes whose distance from the node just selected is greater than 2, and selects the highest-priority node among them as the scheduling node for the second replica. It repeats this check until no replicas remain to be scheduled, at which point scheduling finishes.
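As worded here, the control module filters the priority-ordered node list to nodes more than 2 hops from the node just selected and takes the best of them, which is a slightly different formulation from steps E and F of the method. A minimal Python sketch of that loop follows. The function `place_replicas`, the `ranked` list (assumed to come from the node server cluster module, best node first), and the sample rack table are hypothetical, and skipping nodes that already hold a replica is an added assumption.

```python
from typing import Callable, Dict, List, Sequence


def place_replicas(
    replicas: int, ranked: Sequence[str], dist: Callable[[str, str], int]
) -> List[str]:
    """Pick scheduling nodes from a priority-ordered list, one rack hop at a time."""
    if replicas < 1 or not ranked:
        return []
    placed = [ranked[0]]                 # first replica: highest-priority node
    while len(placed) < replicas:        # are replicas still waiting?
        last = placed[-1]
        # Highest-priority unused node more than 2 hops from the last selection.
        nxt = next(
            (n for n in ranked if n not in placed and dist(n, last) > 2), None
        )
        if nxt is None:
            break                        # no node in another rack remains
        placed.append(nxt)
    return placed


# Invented data: rack membership and a priority order, best node first.
RACK_OF: Dict[str, str] = {
    "H1": "R1", "H2": "R1", "H3": "R1", "H4": "R2", "H5": "R2", "H6": "R2",
}
RANKED = ["H2", "H1", "H5", "H6", "H3", "H4"]


def rack_hops(a: str, b: str) -> int:
    """0 for the same host, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if RACK_OF[a] == RACK_OF[b] else 4


chosen = place_replicas(2, RANKED, rack_hops)
print(chosen)
```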
Further, the node state information in the node server cluster module includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput, and resource scoring is performed on this data.
Further, the node distance is a distance value given by the number of hops from the machine to an external device. The node network topology is acquired in the same way as in the container cloud platform resource scheduling method, and the acquired data is stored in the node network topology information data module.
The invention extends the node resource scoring method with rack awareness. Before scheduling, the nodes to which the replicas will be assigned are computed according to a rack-awareness algorithm. Within a rack, the node with the highest resource scoring priority is selected for scheduling. The cluster node network topology graph is then obtained and, among the nodes whose distance from the selected node is greater than 2, the node with the highest resource scoring priority is selected for scheduling. This prevents container replicas from being placed in the same rack during scheduling, reduces the risk that the application becomes unavailable because equipment in the machine room fails, disperses network traffic within the cluster, and avoids excessive traffic on a single network port.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (4)
1. A container cloud platform resource scheduling method based on rack awareness, characterized by comprising the following steps:
A. acquiring a user's service request, parsing the acquired service request, and defining the number of required replicas;
B. acquiring all nodes and their state information;
C. scoring all acquired nodes according to a resource scoring method, and selecting the node with the highest resource scoring priority for scheduling;
D. determining whether any replicas remain to be scheduled; if not, finishing scheduling; if so, proceeding to step E;
E. acquiring the cluster node network topology graph, and selecting a node whose distance from the node with the highest resource scoring priority in step C is greater than 2;
F. acquiring all nodes at a distance of 2 from the node selected in step E, together with their state information, and returning to step C;
step A comprises the following sub-steps:
A1. acquiring a user's service request applying for resources;
A2. parsing the acquired resource application request to obtain the user's identity information and the requested resource information;
A3. authenticating the obtained identity information and, after authentication succeeds, analyzing the resource information requested by the user and defining the number of required replicas;
the node state information in step B comprises machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput data;
step C comprises the following sub-steps:
C1. taking the collected machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput of each node as scoring index data;
C2. sorting the nodes in ascending order of the scoring index data, and selecting the first node as the replica scheduling node;
the node distance in steps E and F is a distance value given by the number of hops from the machine to an external device; the cluster node network topology in step E is implemented by recording topology data directly in the management node of the cluster, which reads the data directly during scheduling to obtain the node distance; and the cluster node network topology in step E is implemented by using the management node as the initial node and traversing all network interfaces via the SNMP protocol to obtain network topology data for the management node to use during scheduling.
2. A container cloud platform resource scheduling system based on rack awareness, characterized in that: the system comprises an API server module, a resource scheduling control module, a node server cluster module, and a node network topology information data module; the API server module acquires a user's service request, parses it, verifies the user's identity, and, after verification succeeds, defines the number of replicas and the resource scheduling request required by the service request; the node network topology information data module acquires the node network topology graph and stores the data; the node server cluster module holds all nodes and their state data, scores all nodes according to a resource scoring method, and sorts them from highest to lowest resource scoring priority; the resource scheduling control module receives the defined number of replicas and the resource scheduling request, acquires the node information from the node server cluster module, takes the node with the highest priority as the scheduling node for the first replica, and then determines whether any replicas remain to be scheduled; if not, scheduling finishes; if so, it reads the data of the node server cluster module and the node network topology information data module, acquires all nodes whose distance from the node just selected is greater than 2, selects the highest-priority node among them as the scheduling node for the second replica, and repeats this check until no replicas remain to be scheduled, at which point scheduling finishes.
3. The rack-aware container cloud platform resource scheduling system of claim 2, wherein: the node state information in the node server cluster module comprises machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput data, and resource scoring is performed on this data.
4. The rack-aware container cloud platform resource scheduling system of claim 2, wherein: the node distance is a distance value given by the number of hops from the machine to an external device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810074298.9A | 2018-01-25 | 2018-01-25 | Container cloud platform resource scheduling method and system based on rack sensing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108512890A (en) | 2018-09-07 |
CN108512890B (en) | 2020-12-29 |
Family
ID=63374844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810074298.9A Active CN108512890B (en) | 2018-01-25 | 2018-01-25 | Container cloud platform resource scheduling method and system based on rack sensing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108512890B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109889370B (en) * | 2019-01-10 | 2021-12-21 | 中国移动通信集团海南有限公司 | Network equipment position determining method and device and computer readable storage medium |
CN110221915B (en) * | 2019-05-21 | 2020-11-10 | 新华三大数据技术有限公司 | Node scheduling method and device |
CN110187974A (en) * | 2019-05-31 | 2019-08-30 | 北京宝兰德软件股份有限公司 | A kind of processing method and processing device of load balancing |
CN110460647B (en) * | 2019-07-23 | 2021-10-22 | 平安科技(深圳)有限公司 | Network node scheduling method and device, electronic equipment and storage medium |
CN110597701B (en) * | 2019-09-12 | 2021-03-05 | 上海道客网络科技有限公司 | System and method for scoring health stable operation degree of container cloud platform |
CN112148461A (en) * | 2020-10-14 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Application scheduling method and device |
CN112445575B (en) * | 2020-11-27 | 2024-01-26 | 中国工商银行股份有限公司 | Multi-cluster resource scheduling method, device and system |
CN113313280B (en) * | 2021-03-31 | 2023-09-19 | 阿里巴巴新加坡控股有限公司 | Cloud platform inspection method, electronic equipment and nonvolatile storage medium |
US11997022B2 (en) * | 2021-06-21 | 2024-05-28 | International Business Machines Corporation | Service-to-service scheduling in container orchestrators |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095573A (en) * | 2016-06-08 | 2016-11-09 | 北京大学 | The Storm platform operations of a kind of work nest perception divides equally dispatching method |
CN107370802A (en) * | 2017-07-10 | 2017-11-21 | 中国人民解放军国防科学技术大学 | A kind of collaboration storage dispatching method based on alternating direction multiplier method |
Also Published As
Publication number | Publication date |
---|---|
CN108512890A (en) | 2018-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108512890B (en) | Container cloud platform resource scheduling method and system based on rack sensing | |
US11646972B2 (en) | Dynamic allocation of network resources using external inputs | |
CN104092756B (en) | A kind of resource dynamic distributing method of the cloud storage system based on DHT mechanism | |
US10419437B2 (en) | Quasi-agentless cloud resource management | |
CN104869151A (en) | Business unloading method and system | |
CN111614657B (en) | Mobile edge security service method and system based on mode selection | |
CN108512672B (en) | Service arranging method, service management method and device | |
US11652720B2 (en) | Allocating cloud resources in accordance with predicted deployment growth | |
CN109298937A (en) | Document analysis method and the network equipment | |
CN113485792A (en) | Pod scheduling method in kubernets cluster, terminal equipment and storage medium | |
WO2023091215A1 (en) | Mapping an application signature to designated cloud resources | |
CN105872082B (en) | Fine granularity resource response system based on container cluster load-balancing algorithm | |
CN108228752B (en) | Data total export method, data export task allocation device and data export node device | |
CN102546652B (en) | System and method for server load balancing | |
CN115913550A (en) | Password resource allocation method, device and equipment | |
CN107104829B (en) | Physical equipment matching distribution method and device based on network topology data | |
US12039075B2 (en) | Methods and systems for data management in communication network | |
CN113190347A (en) | Edge cloud system and task management method | |
CN110391929B (en) | Fault-tolerant control method and device and fault-tolerant component | |
CN111327666A (en) | Service management method, device and system, computer equipment and storage medium | |
CN117176728B (en) | Industrial Internet of things dispatching method and dispatching system based on cloud edge cooperative technology | |
WO2024047775A1 (en) | Determination of machine learning model to be used for given predictive purpose for communication system | |
WO2024047774A1 (en) | Determination of machine learning model used for given predictive purpose relating to communication system | |
US20230128199A1 (en) | Telemetry data filter | |
KR20170124136A (en) | Adaptive control plane management method for software defined network and apparatus thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||