CN108512890B - Container cloud platform resource scheduling method and system based on rack sensing

Info

Publication number: CN108512890B
Application number: CN201810074298.9A
Authority: CN
Inventors: 丁建军; 覃路; 曾志刚
Original assignee: Chalco Steering Intelligent Technology Co ltd
Current assignee: Chalco Steering Intelligent Technology Co ltd
Priority date: 2018-01-25
Filing date: 2018-01-25
Publication date: 2020-12-29
Anticipated expiration: 2038-01-25
Also published as: CN108512890A

Abstract

The invention provides a container cloud platform resource scheduling method and system based on rack perception. The method comprises the following steps: A. acquiring a service request of a user, parsing it, and defining the number of required copies; B. acquiring the states of all nodes; C. scoring the nodes according to a resource scoring algorithm, and selecting the node with the highest resource scoring priority for scheduling; D. judging whether any remaining copies need to be scheduled; if not, finishing scheduling, and if so, proceeding to step E; E. acquiring a cluster node network topology graph, and selecting a node whose distance from the node selected in step C is greater than 2; F. acquiring the states of all nodes at a distance of 2 from the node selected in step E, and returning to step C. This prevents container copies from being placed in the same rack during scheduling, reduces the risk that an application becomes unavailable, disperses network traffic within the cluster, and avoids excessive traffic on a single network port.

Description

Container cloud platform resource scheduling method and system based on rack sensing

Technical Field

The invention relates to the technical field of cloud computing resource scheduling, in particular to a container cloud platform resource scheduling method and system based on rack perception.

Background

In a container cloud platform, an application runs as containers and provides services externally. To achieve load balancing and high availability, the same application must run multiple containers simultaneously as copies that work together. To prevent the service from becoming unavailable when a node of the platform goes down, the copies should, as far as possible, run on different nodes that do not affect one another.

The prior art mainly uses two container scheduling methods: random scheduling, and priority scheduling based on node resource scoring. Neither method considers the physical distribution of nodes. In a real environment, when a rack or one of its internal switches fails, an application can easily become unavailable because its copies are not sufficiently dispersed.

Disclosure of Invention

To address the defects of the prior art, the invention provides a container cloud platform resource scheduling method and system based on rack perception, which aim to solve the prior-art problem that placing container copies in the same rack creates a high risk of application unavailability.

The invention provides a container cloud platform resource scheduling method based on rack perception, which comprises the following steps:

A. acquiring a service request of a user, analyzing the acquired service request and defining the number of required copies;

B. acquiring all nodes and state information thereof;

C. scoring all the acquired nodes according to a resource scoring method, and selecting the node with the highest resource scoring priority for scheduling;

D. judging whether residual copies need to be scheduled, if not, finishing scheduling, and if so, entering the step E;

E. acquiring a cluster node network topology graph, and selecting a certain node whose distance from the node with the highest resource scoring priority in step C is greater than 2;

F. acquiring all nodes at a distance of 2 from the certain node in step E and their state information, and re-entering step C.
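Steps B to F can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `schedule_copies`, `rack_distance`, `RACK_OF` and `SCORES` are hypothetical, a lower score is assumed to mean higher priority, the node chosen in step E is assumed to be a candidate itself, and nodes that already hold a copy are assumed to be skipped.

```python
from typing import Callable, Dict, List


def schedule_copies(
    copies: int,
    scores: Dict[str, float],
    distance: Callable[[str, str], int],
) -> List[str]:
    """Place `copies` copies; lower score is assumed to mean higher priority."""
    placed: List[str] = []
    candidates = list(scores)  # step B: start from all nodes
    while candidates and len(placed) < copies:
        # Step C: pick the best-scoring node among the current candidates.
        best = min(candidates, key=lambda n: scores[n])
        placed.append(best)
        # Step D: stop when no copies remain.
        if len(placed) == copies:
            break
        # Step E: pick some node more than 2 hops away (assumed: the first one found).
        far = [n for n in scores if n not in placed and distance(best, n) > 2]
        if not far:
            break
        anchor = far[0]
        # Step F: candidates are the anchor plus all unused nodes 2 hops from it.
        candidates = [anchor] + [
            n for n in scores if n not in placed and distance(anchor, n) == 2
        ]
    return placed


# Hypothetical two-rack cluster used only for illustration.
RACK_OF = {"H1": "R1", "H2": "R1", "H3": "R1", "H4": "R2", "H5": "R2", "H6": "R2"}
SCORES = {"H1": 0.5, "H2": 0.2, "H3": 0.9, "H4": 0.7, "H5": 0.3, "H6": 0.6}


def rack_distance(a: str, b: str) -> int:
    """Hop distance in a two-level tree: 0 same node, 2 same rack, 4 otherwise."""
    if a == b:
        return 0
    return 2 if RACK_OF[a] == RACK_OF[b] else 4
```

With this sample data, two copies land on one node in each rack rather than on the two least loaded nodes overall.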

As a further improvement of the invention, the step A comprises the following substeps:

A1. acquiring a service request of a user for applying for resources;

A2. analyzing the acquired resource service application request of the user to obtain identity information of the user and resource information applied;

A3. performing identity authentication on the obtained user identity information and, after the authentication is passed, analyzing the resource information applied for by the user and defining the required number of copies.
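Substeps A1 to A3 might look like the following sketch. The patent does not specify a request format or an authentication mechanism, so the JSON layout, the field names (`user`, `token`, `resources`, `copies`, `image`) and the in-memory `KNOWN_TOKENS` table are all assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class ScheduleRequest:
    user: str
    image: str
    copies: int


# Hypothetical credential store; a real platform would use its own auth service.
KNOWN_TOKENS = {"alice": "s3cret"}


def parse_request(raw: str) -> ScheduleRequest:
    """Parse a raw request (A1/A2), authenticate it, and extract the copy count (A3)."""
    body = json.loads(raw)  # A2: split the request into identity and resource info
    user, token = body["user"], body["token"]
    if KNOWN_TOKENS.get(user) != token:  # A3: identity verification
        raise PermissionError(f"authentication failed for user {user!r}")
    resources = body["resources"]
    copies = int(resources.get("copies", 1))  # assumed default of one copy
    if copies < 1:
        raise ValueError("at least one copy is required")
    return ScheduleRequest(user=user, image=resources["image"], copies=copies)
```

The resulting `copies` value is what the later scheduling steps would consume.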

As a further improvement of the present invention, the node status in step B includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput.

As a further improvement of the invention, step C comprises the following substeps:

C1. taking the collected machine load, CPU occupancy rate, memory occupancy, disk I/O throughput and network I/O throughput data of the nodes as scoring index data;

C2. sorting the nodes in ascending order according to the scoring index data, and selecting the first node as the copy scheduling node.
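Substeps C1 and C2 can be sketched as follows. The patent does not say how the five indices are combined into one score, so this example assumes each metric is already normalized to the range 0 to 1 and simply sums them with equal weights; `NodeState`, `score` and `rank_nodes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    name: str
    load: float     # machine load, assumed normalized to [0, 1]
    cpu: float      # CPU occupancy
    memory: float   # memory occupancy
    disk_io: float  # disk I/O throughput
    net_io: float   # network I/O throughput


def score(state: NodeState) -> float:
    """C1: combine the five indices (assumption: equal-weight sum)."""
    return state.load + state.cpu + state.memory + state.disk_io + state.net_io


def rank_nodes(states: List[NodeState]) -> List[NodeState]:
    """C2: sort ascending, so the least busy node comes first."""
    return sorted(states, key=score)
```

The first element of `rank_nodes(...)` would then be taken as the copy scheduling node; a weighted sum could replace `score` without changing the rest.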

As a further improvement of the present invention, the node distance described in steps E and F is a distance value based on the number of hops from the machine to the external device.

As a further improvement of the present invention, the implementation manner of the cluster node network topology in step E is to directly record topology data into a management node of the cluster, and when the management node performs scheduling, directly read the relevant data to obtain the node distance.

As a further improvement of the present invention, the cluster node network topology in step E is implemented by traversing all network interfaces by using the management node as an initial node through an SNMP protocol, and obtaining network topology data for the management node to use when scheduling.
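The traversal-based variant can be sketched as a breadth-first walk that starts at the management node. No real SNMP calls are made here: `neighbors_of` is a hypothetical callable standing in for whatever SNMP queries return the devices attached to a given device, and `EXAMPLE_LINKS` is invented sample data.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def discover_topology(
    start: str,
    neighbors_of: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Breadth-first traversal from the management node; returns an adjacency map."""
    topology: Dict[str, List[str]] = {}
    queue = deque([start])
    while queue:
        device = queue.popleft()
        if device in topology:
            continue  # already visited via another link
        links = sorted(neighbors_of(device))
        topology[device] = links
        queue.extend(n for n in links if n not in topology)
    return topology


# Hypothetical link table standing in for SNMP responses.
EXAMPLE_LINKS = {
    "M": ["D1"],
    "D1": ["M", "R1", "R2"],
    "R1": ["D1", "H1", "H2"],
    "R2": ["D1", "H4"],
    "H1": ["R1"],
    "H2": ["R1"],
    "H4": ["R2"],
}
```

The resulting map is the kind of topology data that could be stored on the management node, which is also what the directly recorded variant would provide without any traversal.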

A container cloud platform resource scheduling system based on rack sensing comprises an API server module, a resource scheduling control module, a node server cluster module and a node network topology information data module; the API server module acquires a service request of a user, analyzes the service request, verifies the identity, and defines the number of copies and resource scheduling requests required by the service request of the user after passing the identity verification; the node network topology information data module acquires a node network topology map and stores data; the node server cluster module comprises all nodes and state data information thereof, scores all the nodes according to a resource scoring method, and sorts all the nodes in sequence from high to low according to the priority of resource scoring; the resource scheduling control module receives the defined copy number and resource scheduling request, acquires node information in a node server cluster module, calls a node with the highest priority as a first copy scheduling node, then judges whether the rest copies need to be scheduled, if not, finishes scheduling, if yes, calls data of the node server cluster module and a node network topology information data module, acquires all nodes with the distance larger than 2 from the calling node, selects the node with the highest priority as a second copy scheduling node, then continuously judges whether the copies need to be scheduled, and so on until no rest copies need to be scheduled, and finishes scheduling.

Further, the node state information in the node server cluster module includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput, and resource scoring is performed according to the data.

Further, the node distance is a distance value given by the hop count from the machine to the external device.

On the basis of the node resource scoring method, the invention adds rack perception. Before scheduling, the nodes to which the copies are to be distributed are calculated according to a rack perception algorithm. The node with the highest resource scoring priority is first selected for scheduling; the cluster node network topology graph is then obtained, and among the nodes whose distance from the selected node is greater than 2, the node with the highest resource scoring priority is selected for scheduling. This prevents container copies from being placed in the same rack during scheduling, reduces the risk that the application becomes unavailable due to equipment failure in the machine room, disperses network traffic within the cluster, and avoids excessive traffic on a single network port.

Drawings

FIG. 1 is a schematic block diagram of a container cloud platform resource scheduling method provided by the present invention;

FIG. 2 is a network topology diagram of a cluster node provided by the present invention;

FIG. 3 is a schematic structural diagram of a container cloud platform resource scheduling system provided by the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

As shown in FIG. 1, the invention discloses a container cloud platform resource scheduling method based on rack sensing, which comprises the following steps:

B. acquiring all nodes and state information thereof;

Further, the step A comprises the following sub-steps:

A1. acquiring a service request of a user for applying for resources;

Further, the node status in step B includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput.

Further, step C includes the following substeps:

C1. taking the collected machine load, CPU occupancy rate, memory occupancy, disk I/O throughput and network I/O throughput data of the node as scoring index data;

Further, the node distance described in steps E and F is a distance value based on the number of hops from the machine to the external device. For ease of understanding, node distance is illustrated here with a network topology example. As shown in FIG. 2, D1 and R1 are switches, and the bottom layer consists of datanodes. The rackid of H1 is /D1/R1/H1; the parent of H1 is R1, and the parent of R1 is D1. The distance between any two of H1, H2 and H3 is 2, since the path H1-R1-H2 takes 2 hops; the distance from H1 to H4, H5 or H6 is 4, since the path H1-R1-D1-R2-H4 takes 4 hops.
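The hop counts in this example can be reproduced from the rackid paths alone. The sketch below assumes every node is identified by a path of the form /D1/R1/H1 and counts the hops up to the nearest common ancestor and back down; the function name `node_distance` is hypothetical.

```python
def node_distance(path_a: str, path_b: str) -> int:
    """Hop count between two nodes given rackid paths such as '/D1/R1/H1'."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    # Count the leading path components the two nodes share.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Hops up from each node to the nearest common ancestor.
    return (len(a) - common) + (len(b) - common)


print(node_distance("/D1/R1/H1", "/D1/R1/H2"))  # same rack: 2
print(node_distance("/D1/R1/H1", "/D1/R2/H4"))  # different racks: 4
```

Under this definition, a distance greater than 2 means the two nodes sit under different rack switches, which is the condition used in step E.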

Further, the cluster node network topology implementation manner in step E is to directly record topology data into a management node of the cluster, and when the management node performs scheduling, directly read related data to obtain a node distance.

Furthermore, the cluster node network topology in step E is implemented by traversing all network interfaces by using the management node as an initial node through an SNMP protocol, and obtaining network topology data for the management node to use when scheduling.

As shown in FIG. 3, a container cloud platform resource scheduling system based on rack sensing includes an API server module, a resource scheduling control module, a node server cluster module, and a node network topology information data module; the API server module acquires a service request of a user, analyzes the service request, verifies the identity, and defines the number of copies and resource scheduling requests required by the service request of the user after passing the identity verification; the node network topology information data module acquires a node network topology map and stores data; the node server cluster module comprises all nodes and state data information thereof, scores all the nodes according to a resource scoring method, and sorts all the nodes in sequence from high to low according to the priority of resource scoring; the resource scheduling control module receives the defined copy number and resource scheduling request, acquires node information in a node server cluster module, calls a node with the highest priority as a first copy scheduling node, then judges whether the rest copies need to be scheduled, if not, finishes scheduling, if yes, calls data of the node server cluster module and a node network topology information data module, acquires all nodes with the distance larger than 2 from the calling node, selects the node with the highest priority as a second copy scheduling node, then continuously judges whether the copies need to be scheduled, and so on until no rest copies need to be scheduled, and finishes scheduling.
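The interaction between the topology data module, the node cluster module and the scheduling control module can be sketched as three small classes. This is an illustrative reading of the system description, not the patented design: the class names are hypothetical, a lower score is assumed to mean higher priority, "the calling node" is assumed to be the most recently selected node, and nodes already holding a copy are assumed to be skipped. Unlike the two-phase steps E and F of the method, the controller here ranks all nodes farther than 2 hops directly.

```python
from typing import Dict, List


class TopologyStore:
    """Stands in for the node network topology information data module."""

    def __init__(self, rack_paths: Dict[str, str]) -> None:
        self._paths = dict(rack_paths)

    def distance(self, a: str, b: str) -> int:
        pa = self._paths[a].strip("/").split("/")
        pb = self._paths[b].strip("/").split("/")
        common = 0
        for x, y in zip(pa, pb):
            if x != y:
                break
            common += 1
        return (len(pa) - common) + (len(pb) - common)


class NodeCluster:
    """Stands in for the node server cluster module (assumed: lower score is better)."""

    def __init__(self, scores: Dict[str, float]) -> None:
        self._scores = dict(scores)

    def ranked(self) -> List[str]:
        return sorted(self._scores, key=self._scores.get)


class SchedulingController:
    """Stands in for the resource scheduling control module."""

    def __init__(self, cluster: NodeCluster, topology: TopologyStore) -> None:
        self.cluster = cluster
        self.topology = topology

    def schedule(self, copies: int) -> List[str]:
        ranked = self.cluster.ranked()
        placed: List[str] = []
        pool = ranked
        while pool and len(placed) < copies:
            chosen = pool[0]  # highest-priority node in the current pool
            placed.append(chosen)
            # Next pool: unused nodes more than 2 hops from the node just chosen.
            pool = [
                n for n in ranked
                if n not in placed and self.topology.distance(chosen, n) > 2
            ]
        return placed
```

A controller built this way alternates between racks as long as unused nodes remain in a rack other than the one just chosen.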

Further, the node distance is a distance value which is the hop count from a machine to an external device, the method for acquiring the node network topology is the same as that mentioned in the container cloud platform resource scheduling method, and the acquired data is stored in the node network topology information data module.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A container cloud platform resource scheduling method based on rack perception is characterized by comprising the following steps:

B. acquiring all nodes and state information thereof;

F. acquiring all nodes and state information thereof with the distance of 2 from a certain node in the step E, and re-entering the step C;

the step A comprises the following sub-steps:

A1. acquiring a service request of a user for applying for resources;

A3. carrying out identity verification on the obtained user identity information, and after the user identity information passes the verification, analyzing the resource information applied by the user and defining the number of required copies;

the node state information in the step B comprises machine load, CPU occupancy rate, memory occupancy rate, disk IO throughput and network IO throughput data;

the step C comprises the following sub-steps:

C2. sorting the nodes from small to large according to the grading index data, and selecting a first node as a duplicate scheduling node;

the node distance in steps E and F is a distance value obtained by taking the number of hops from the machine to the external device, the cluster node network topology implementation manner in step E is to directly record topology data into the management node of the cluster, and when the management node performs scheduling, directly read the relevant data to obtain the node distance, and the cluster node network topology implementation manner in step E is to traverse all network interfaces by taking the management node as an initial node through an SNMP protocol to obtain network topology data for the management node to use when scheduling.

2. A container cloud platform resource scheduling system based on rack perception, characterized in that: the system comprises an API server module, a resource scheduling control module, a node server cluster module and a node network topology information data module; the API server module acquires a service request of a user, analyzes the service request, verifies the identity, and defines the number of copies and resource scheduling requests required by the service request of the user after the identity verification is passed; the node network topology information data module acquires a node network topology map and stores data; the node server cluster module comprises all nodes and state data information thereof, scores all the nodes according to a resource scoring method, and sorts all the nodes in sequence from high to low according to the priority of resource scoring; the resource scheduling control module receives the defined copy number and resource scheduling request, acquires node information in a node server cluster module, calls a node with the highest priority as a first copy scheduling node, then judges whether the rest copies need to be scheduled, if not, finishes scheduling, if yes, calls data of the node server cluster module and a node network topology information data module, acquires all nodes with the distance larger than 2 from the calling node, selects the node with the highest priority as a second copy scheduling node, then continuously judges whether the copies need to be scheduled, and so on until no rest copies need to be scheduled, and finishes scheduling.

3. The rack-aware-based container cloud platform resource scheduling system of claim 2, wherein: the node state information in the node server cluster module comprises machine load, CPU occupancy rate, memory occupancy rate, disk IO throughput and network IO throughput data, and resource scoring is carried out according to the data.

4. The rack-aware-based container cloud platform resource scheduling system of claim 2, wherein: the node distance is a distance value of the number of hops from the machine to the external device.