CN108512890B - Container cloud platform resource scheduling method and system based on rack sensing - Google Patents

Container cloud platform resource scheduling method and system based on rack sensing

Info

Publication number
CN108512890B
Authority
CN
China
Prior art keywords
node
scheduling
resource
data
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810074298.9A
Other languages
Chinese (zh)
Other versions
CN108512890A (en)
Inventor
丁建军
覃路
曾志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chalco Steering Intelligent Technology Co ltd
Original Assignee
Chalco Steering Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chalco Steering Intelligent Technology Co ltd
Priority to CN201810074298.9A
Publication of CN108512890A
Application granted
Publication of CN108512890B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1042 Peer-to-peer [P2P] networks using topology management mechanisms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/02 Standardisation; Integration
    • H04L41/0213 Standardised network management protocols, e.g. simple network management protocol [SNMP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/62 Queue scheduling characterised by scheduling criteria
    • H04L47/625 Queue scheduling characterised by scheduling criteria for service slots or service orders
    • H04L47/6275 Queue scheduling characterised by scheduling criteria for service slots or service orders based on priority
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1074 Peer-to-peer [P2P] networks for supporting data block transmission mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer And Data Communications (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a container cloud platform resource scheduling method and system based on rack perception. The method comprises the following steps: A. acquiring a service request of a user, analyzing it, and defining the number of required copies; B. acquiring the states of all nodes; C. scoring the nodes according to a resource scoring algorithm, and selecting the node with the highest resource scoring priority for scheduling; D. judging whether any remaining copies need to be scheduled; if not, scheduling ends, and if so, proceeding to step E; E. acquiring a cluster node network topology graph, and selecting a certain node whose distance to the node with the highest resource scoring priority in step C is larger than 2; F. acquiring the states of all nodes at a distance of 2 from the certain node in step E, and re-entering step C. The method prevents container copies from being placed in the same rack during scheduling, reduces the risk that the application becomes unavailable, disperses network traffic within the cluster, and avoids excessive traffic on a single network port.

Description

Container cloud platform resource scheduling method and system based on rack sensing
Technical Field
The invention relates to the technical field of cloud computing resource scheduling, in particular to a container cloud platform resource scheduling method and system based on rack perception.
Background
In a container cloud platform, an application runs as a container and provides services externally. To achieve load balancing and high availability, the same application needs to run several containers simultaneously as copies that work together. To keep the service available after a node of the container cloud platform goes down, these copies need to run, as far as possible, on different nodes that do not interfere with one another.
The prior art mainly uses two container scheduling methods: random scheduling, and priority scheduling based on node resource scoring. Neither method considers the physical distribution of nodes. In a real environment, when a rack or an internal switch fails, the application copies may not be dispersed widely enough, and the application easily becomes unavailable.
Disclosure of Invention
To address the defects of the prior art, the invention provides a container cloud platform resource scheduling method and system based on rack perception, with the aim of solving the prior-art problem that placing container copies in the same rack creates a high risk of the application becoming unavailable.
The invention provides a container cloud platform resource scheduling method based on rack perception, which comprises the following steps:
A. acquiring a service request of a user, analyzing the acquired service request and defining the number of required copies;
B. acquiring all nodes and state information thereof;
C. scoring all the acquired nodes according to a resource scoring method, and selecting the node with the highest resource scoring priority for scheduling;
D. judging whether residual copies need to be scheduled, if not, finishing scheduling, and if so, entering the step E;
E. acquiring a cluster node network topology graph, and selecting a certain node whose distance to the node with the highest resource scoring priority in step C is larger than 2;
F. acquiring all nodes at a distance of 2 from the certain node in step E and state information thereof, and re-entering step C.
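Steps B to F can be sketched as a short loop. This is a minimal illustration, not the patented implementation: the `Node` class, its `load` field (a single combined score where lower means less busy) and the two-level `distance` function are assumptions made for the example. The sketch further assumes that each new copy must be more than 2 hops from every copy already placed, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str    # host name, e.g. "H1"
    rack: str    # rack switch the host is attached to, e.g. "R1"
    load: float  # hypothetical combined score; lower means more free resources


def distance(a: Node, b: Node) -> int:
    """Hop distance in a two-level topology: 0 same host, 2 same rack, 4 otherwise."""
    if a.name == b.name:
        return 0
    return 2 if a.rack == b.rack else 4


def schedule(nodes: list[Node], copies: int) -> list[Node]:
    """Place up to `copies` replicas, each in a different rack (steps B to F)."""
    chosen: list[Node] = []
    candidates = list(nodes)                          # step B: all nodes and their state
    while len(chosen) < copies and candidates:        # step D: copies left to schedule?
        best = min(candidates, key=lambda n: n.load)  # step C: best-scored candidate
        chosen.append(best)
        # Step E: find nodes farther than 2 hops from every node chosen so far.
        far = [n for n in nodes if all(distance(n, c) > 2 for c in chosen)]
        if not far:
            break                                     # assumption: stop when no unused rack remains
        anchor = far[0]
        # Step F: the anchor itself plus every node at distance 2 from it (its rack mates).
        candidates = [n for n in nodes if distance(n, anchor) in (0, 2)]
    return chosen
```

With three hosts under one rack switch and three under another, a request for two copies places one copy in each rack. A request for a third copy stops early in this sketch because no unused rack remains; how the method should behave in that case is not specified in the text.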
As a further improvement of the invention, the step A comprises the following substeps:
A1. acquiring a service request of a user for applying for resources;
A2. analyzing the acquired resource service application request of the user to obtain identity information of the user and resource information applied;
A3. performing identity authentication on the obtained user identity information, and after the authentication is passed, analyzing the resource information applied for by the user and defining the required number of copies.
As a further improvement of the present invention, the node state information in step B includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput.
As a further improvement of the invention, step C comprises the following substeps:
C1. taking the collected machine load, CPU occupancy, memory occupancy, disk I/O throughput and network I/O throughput data of the nodes as scoring index data;
C2. sorting the nodes in ascending order of the scoring index data, and selecting the first node as the copy scheduling node.
As a further improvement of the present invention, the node distance in steps E and F is a distance value based on the number of hops from the machine to the external device.
As a further improvement of the present invention, the cluster node network topology in step E is implemented by recording topology data directly into a management node of the cluster; when scheduling, the management node reads this data directly to obtain the node distance.
As a further improvement of the present invention, the cluster node network topology in step E is implemented by traversing all network interfaces through the SNMP protocol, with the management node as the initial node, to obtain network topology data for the management node to use when scheduling.
A container cloud platform resource scheduling system based on rack perception comprises an API server module, a resource scheduling control module, a node server cluster module and a node network topology information data module. The API server module acquires a service request of a user, analyzes the service request, verifies the user's identity, and, after the identity verification is passed, defines the number of copies required by the service request and the resource scheduling request. The node network topology information data module acquires the node network topology graph and stores the data. The node server cluster module comprises all nodes and their state information, scores all the nodes according to a resource scoring method, and sorts them from high to low by resource scoring priority. The resource scheduling control module receives the defined number of copies and the resource scheduling request, acquires node information from the node server cluster module, and calls the node with the highest priority as the first copy scheduling node. It then judges whether any remaining copies need to be scheduled: if not, scheduling ends; if so, it calls the data of the node server cluster module and the node network topology information data module, acquires all nodes whose distance from the called node is larger than 2, and selects the node with the highest priority among them as the second copy scheduling node. It continues in the same way until no remaining copies need to be scheduled, at which point scheduling ends.
Further, the node state information in the node server cluster module includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput, and resource scoring is performed according to this data.
Further, the node distance is a distance value based on the number of hops from the machine to the external device.
The invention extends the node resource scoring method with rack perception. Before scheduling, the nodes to which copies are to be assigned are calculated according to a rack perception algorithm. The node with the highest resource scoring priority in a rack is selected for scheduling; the cluster node network topology graph is then acquired, and among the nodes whose distance from the selected node is larger than 2, the node with the highest resource scoring priority is selected for scheduling. This prevents container copies from being placed in the same rack during scheduling, reduces the risk that the application becomes unavailable because equipment in the machine room fails, disperses network traffic within the cluster, and avoids excessive traffic on a single network port.
Drawings
FIG. 1 is a schematic block diagram of a container cloud platform resource scheduling method provided by the present invention;
FIG. 2 is a network topology diagram of a cluster node provided by the present invention;
FIG. 3 is a schematic structural diagram of a container cloud platform resource scheduling system provided by the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the invention discloses a container cloud platform resource scheduling method based on rack sensing, which comprises the following steps:
A. acquiring a service request of a user, analyzing the acquired service request and defining the number of required copies;
B. acquiring all nodes and state information thereof;
C. scoring all the acquired nodes according to a resource scoring method, and selecting the node with the highest resource scoring priority for scheduling;
D. judging whether residual copies need to be scheduled, if not, finishing scheduling, and if so, entering the step E;
E. acquiring a cluster node network topology graph, and selecting a certain node whose distance to the node with the highest resource scoring priority in step C is larger than 2;
F. acquiring all nodes at a distance of 2 from the certain node in step E and state information thereof, and re-entering step C.
Further, the step A comprises the following sub-steps:
A1. acquiring a service request of a user for applying for resources;
A2. analyzing the acquired resource service application request of the user to obtain identity information of the user and resource information applied;
A3. performing identity authentication on the obtained user identity information, and after the authentication is passed, analyzing the resource information applied for by the user and defining the required number of copies.
Further, the node state information in step B includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput.
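Sub-steps A1 to A3 might look like the following sketch. The request layout (`user`, `token`, `resources`, `copies`) and the in-memory `TOKENS` store are hypothetical; the text does not say how requests are encoded or how identity is verified, and a real platform would call its own identity service.

```python
import hmac

# Hypothetical credential store used only for this illustration.
TOKENS = {"alice": "s3cret"}


def parse_request(request: dict) -> tuple[str, str, dict]:
    """Step A2: split a request into user id, credential, and requested resources."""
    return request["user"], request["token"], request["resources"]


def required_copies(request: dict) -> int:
    """Steps A1 to A3: authenticate the user, then read the requested copy count."""
    user, token, resources = parse_request(request)
    expected = TOKENS.get(user)
    # compare_digest compares in constant time, so timing reveals nothing about the token.
    if expected is None or not hmac.compare_digest(expected.encode(), token.encode()):
        raise PermissionError(f"authentication failed for {user!r}")
    copies = int(resources.get("copies", 1))  # assumption: default to a single copy
    if copies < 1:
        raise ValueError("at least one copy is required")
    return copies
```

Rejecting the request before reading the resource section mirrors the order given in A3: resource information is analyzed only after authentication passes.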
Further, step C includes the following substeps:
C1. taking the collected machine load, CPU occupancy, memory occupancy, disk I/O throughput and network I/O throughput data of the nodes as scoring index data;
C2. sorting the nodes in ascending order of the scoring index data, and selecting the first node as the copy scheduling node.
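Sub-steps C1 and C2 can be illustrated as follows. The text names the five indices but not how they are combined, so the equal-weight sum of peak-normalized metrics below is purely an assumption, as are the dictionary keys.

```python
# Hypothetical keys for the five scoring indices named in C1.
METRICS = ("load", "cpu", "mem", "disk_io", "net_io")


def score(node: dict, peaks: dict) -> float:
    """Sum of each metric divided by the largest value seen in the cluster."""
    return sum(node[m] / peaks[m] for m in METRICS if peaks[m] > 0)


def rank_nodes(nodes: list[dict]) -> list[dict]:
    """Step C2: order nodes from the smallest score (least busy) to the largest."""
    if not nodes:
        return []
    peaks = {m: max(n[m] for n in nodes) for m in METRICS}
    return sorted(nodes, key=lambda n: score(n, peaks))
```

The first element of `rank_nodes(...)` is the copy scheduling node. Normalizing by the cluster peak keeps a metric with large raw values, such as network throughput, from dominating the sum; other weightings would fit the text equally well.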
Further, the node distance in steps E and F is a distance value based on the number of hops from the machine to the external device. For ease of understanding, node distance is illustrated here with a network topology example. As shown in FIG. 2, D1 and R1 are switches, and the bottom layer consists of datanodes. The rack ID of H1 is /D1/R1/H1; the parent of H1 is R1, and the parent of R1 is D1. The distance between any two of H1, H2 and H3 is 2, that is, the path H1-R1-H2 takes 2 hops; the distance from H1 to H4, H5 or H6 is 4, that is, the path H1-R1-D1-R2-H4 takes 4 hops.
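The hop counts in this example follow directly from the rack ID paths. A minimal sketch, assuming every host is identified by a path of the form shown for H1 (the function name is illustrative):

```python
def node_distance(path_a: str, path_b: str) -> int:
    """Hops between two hosts given rack ID paths such as "/D1/R1/H1"."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    # Climb from each host up to the deepest device the two paths share.
    return (len(a) - shared) + (len(b) - shared)
```

For `/D1/R1/H1` and `/D1/R1/H2` the shared prefix is `/D1/R1`, giving 1 + 1 = 2 hops; for `/D1/R1/H1` and `/D1/R2/H4` only `/D1` is shared, giving 2 + 2 = 4 hops.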
Further, the cluster node network topology in step E is implemented by recording topology data directly into a management node of the cluster; when scheduling, the management node reads this data directly to obtain the node distance.
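One way to picture the recorded topology is a parent table held by the management node. The `PARENT` mapping below is a hypothetical record for a two-rack layout (H1 to H3 under R1, H4 to H6 under R2, both racks under D1); the text does not specify a storage format.

```python
from typing import Optional

# Hypothetical topology record: each device maps to its parent (None for the root).
PARENT: dict[str, Optional[str]] = {
    "D1": None, "R1": "D1", "R2": "D1",
    "H1": "R1", "H2": "R1", "H3": "R1",
    "H4": "R2", "H5": "R2", "H6": "R2",
}


def ancestors(device: str, parent: dict = PARENT) -> list[str]:
    """Return the chain from `device` up to the root, device first."""
    chain = []
    current: Optional[str] = device
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def recorded_distance(a: str, b: str, parent: dict = PARENT) -> int:
    """Hops between two devices, read from the recorded parent table."""
    up_a, up_b = ancestors(a, parent), ancestors(b, parent)
    common = next(d for d in up_a if d in up_b)  # deepest device shared by both chains
    return up_a.index(common) + up_b.index(common)
```

Reading distances from a static record avoids any network probing at scheduling time, at the cost of keeping the record in step with the physical cabling.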
Furthermore, the cluster node network topology in step E is implemented by traversing all network interfaces through the SNMP protocol, with the management node as the initial node, to obtain network topology data for the management node to use when scheduling.
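The traversal itself can be sketched as a breadth-first walk that starts at the management node. The `neighbors` callable is a stand-in assumption: in a real deployment it would wrap whatever SNMP queries return the devices attached to each interface, which the text does not detail.

```python
from collections import deque
from typing import Callable, Iterable


def discover(start: str, neighbors: Callable[[str], Iterable[str]]) -> dict[str, int]:
    """Breadth-first walk from `start`; returns the hop count to each reachable device."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        device = queue.popleft()
        for peer in neighbors(device):
            if peer not in hops:  # the first visit in a breadth-first walk is the shortest path
                hops[peer] = hops[device] + 1
                queue.append(peer)
    return hops
```

Passing the neighbor lookup in as a function keeps the walk independent of the discovery protocol, so the same loop can be exercised against a plain dictionary of links during testing.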
As shown in FIG. 3, a container cloud platform resource scheduling system based on rack perception comprises an API server module, a resource scheduling control module, a node server cluster module and a node network topology information data module. The API server module acquires a service request of a user, analyzes the service request, verifies the user's identity, and, after the identity verification is passed, defines the number of copies required by the service request and the resource scheduling request. The node network topology information data module acquires the node network topology graph and stores the data. The node server cluster module comprises all nodes and their state information, scores all the nodes according to a resource scoring method, and sorts them from high to low by resource scoring priority. The resource scheduling control module receives the defined number of copies and the resource scheduling request, acquires node information from the node server cluster module, and calls the node with the highest priority as the first copy scheduling node. It then judges whether any remaining copies need to be scheduled: if not, scheduling ends; if so, it calls the data of the node server cluster module and the node network topology information data module, acquires all nodes whose distance from the called node is larger than 2, and selects the node with the highest priority among them as the second copy scheduling node. It continues in the same way until no remaining copies need to be scheduled, at which point scheduling ends.
Further, the node state information in the node server cluster module includes data such as machine load, CPU occupancy, memory occupancy, disk I/O throughput, and network I/O throughput, and resource scoring is performed according to this data.
Further, the node distance is a distance value based on the number of hops from the machine to the external device. The node network topology is acquired in the same way as described for the container cloud platform resource scheduling method, and the acquired data is stored in the node network topology information data module.
The invention extends the node resource scoring method with rack perception. Before scheduling, the nodes to which copies are to be assigned are calculated according to a rack perception algorithm. The node with the highest resource scoring priority in a rack is selected for scheduling; the cluster node network topology graph is then acquired, and among the nodes whose distance from the selected node is larger than 2, the node with the highest resource scoring priority is selected for scheduling. This prevents container copies from being placed in the same rack during scheduling, reduces the risk that the application becomes unavailable because equipment in the machine room fails, disperses network traffic within the cluster, and avoids excessive traffic on a single network port.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (4)

1. A container cloud platform resource scheduling method based on rack perception is characterized by comprising the following steps:
A. acquiring a service request of a user, analyzing the acquired service request and defining the number of required copies;
B. acquiring all nodes and state information thereof;
C. scoring all the acquired nodes according to a resource scoring method, and selecting the node with the highest resource scoring priority for scheduling;
D. judging whether residual copies need to be scheduled, if not, finishing scheduling, and if so, entering the step E;
E. acquiring a cluster node network topology graph, and selecting a certain node whose distance to the node with the highest resource scoring priority in step C is larger than 2;
F. acquiring all nodes at a distance of 2 from the certain node in step E and state information thereof, and re-entering step C;
the step A comprises the following sub-steps:
A1. acquiring a service request of a user for applying for resources;
A2. analyzing the acquired resource service application request of the user to obtain identity information of the user and resource information applied;
A3. carrying out identity verification on the obtained user identity information, and after the user identity information passes the verification, analyzing the resource information applied by the user and defining the number of required copies;
the node state information in step B comprises machine load, CPU occupancy, memory occupancy, disk I/O throughput and network I/O throughput data;
the step C comprises the following sub-steps:
C1. taking the collected machine load, CPU occupancy rate, memory occupancy, disk IO throughput and network IO throughput data of the node as scoring index data;
C2. sorting the nodes from small to large according to the grading index data, and selecting a first node as a duplicate scheduling node;
the node distance in steps E and F is a distance value based on the number of hops from the machine to the external device; the cluster node network topology in step E is implemented by recording topology data directly into the management node of the cluster, the management node reading the relevant data directly to obtain the node distance when scheduling; and the cluster node network topology in step E is implemented by traversing all network interfaces through the SNMP protocol, with the management node as the initial node, to obtain network topology data for the management node to use when scheduling.
2. A container cloud platform resource scheduling system based on rack perception, characterized in that: the system comprises an API server module, a resource scheduling control module, a node server cluster module and a node network topology information data module; the API server module acquires a service request of a user, analyzes the service request, verifies the user's identity, and, after the identity verification is passed, defines the number of copies required by the service request and the resource scheduling request; the node network topology information data module acquires the node network topology graph and stores the data; the node server cluster module comprises all nodes and their state information, scores all the nodes according to a resource scoring method, and sorts them from high to low by resource scoring priority; the resource scheduling control module receives the defined number of copies and the resource scheduling request, acquires node information from the node server cluster module, calls the node with the highest priority as the first copy scheduling node, and then judges whether any remaining copies need to be scheduled; if not, scheduling ends; if so, it calls the data of the node server cluster module and the node network topology information data module, acquires all nodes whose distance from the called node is larger than 2, selects the node with the highest priority among them as the second copy scheduling node, and continues in the same way until no remaining copies need to be scheduled, at which point scheduling ends.
3. The rack-aware-based container cloud platform resource scheduling system of claim 2, wherein: the node state information in the node server cluster module comprises machine load, CPU occupancy, memory occupancy, disk I/O throughput and network I/O throughput data, and resource scoring is carried out according to the data.
4. The rack-aware-based container cloud platform resource scheduling system of claim 2, wherein: the node distance is a distance value based on the number of hops from the machine to the external device.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810074298.9A (CN108512890B) | 2018-01-25 | 2018-01-25 | Container cloud platform resource scheduling method and system based on rack sensing

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810074298.9A (CN108512890B) | 2018-01-25 | 2018-01-25 | Container cloud platform resource scheduling method and system based on rack sensing

Publications (2)

Publication Number | Publication Date
CN108512890A (en) | 2018-09-07
CN108512890B (en) | 2020-12-29

Family

ID=63374844

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810074298.9A (Active, CN108512890B) | Container cloud platform resource scheduling method and system based on rack sensing | 2018-01-25 | 2018-01-25

Country Status (1)

Country | Link
CN (1) | CN108512890B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109889370B (en) * | 2019-01-10 | 2021-12-21 | 中国移动通信集团海南有限公司 | Network equipment position determining method and device and computer readable storage medium
CN110221915B (en) * | 2019-05-21 | 2020-11-10 | 新华三大数据技术有限公司 | Node scheduling method and device
CN110187974A (en) * | 2019-05-31 | 2019-08-30 | 北京宝兰德软件股份有限公司 | A kind of processing method and processing device of load balancing
CN110460647B (en) * | 2019-07-23 | 2021-10-22 | 平安科技(深圳)有限公司 | Network node scheduling method and device, electronic equipment and storage medium
CN110597701B (en) * | 2019-09-12 | 2021-03-05 | 上海道客网络科技有限公司 | System and method for scoring health stable operation degree of container cloud platform
CN112148461A (en) * | 2020-10-14 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Application scheduling method and device
CN112445575B (en) * | 2020-11-27 | 2024-01-26 | 中国工商银行股份有限公司 | Multi-cluster resource scheduling method, device and system
CN113313280B (en) * | 2021-03-31 | 2023-09-19 | 阿里巴巴新加坡控股有限公司 | Cloud platform inspection method, electronic equipment and nonvolatile storage medium
US11997022B2 (en) * | 2021-06-21 | 2024-05-28 | International Business Machines Corporation | Service-to-service scheduling in container orchestrators

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106095573A (en) * | 2016-06-08 | 2016-11-09 | 北京大学 | The Storm platform operations of a kind of work nest perception divides equally dispatching method
CN107370802A (en) * | 2017-07-10 | 2017-11-21 | 中国人民解放军国防科学技术大学 | A kind of collaboration storage dispatching method based on alternating direction multiplier method

Also Published As

Publication number | Publication date
CN108512890A (en) | 2018-09-07

Similar Documents

Publication Publication Date Title
CN108512890B (en) Container cloud platform resource scheduling method and system based on rack sensing
US11646972B2 (en) Dynamic allocation of network resources using external inputs
CN104092756B (en) A kind of resource dynamic distributing method of the cloud storage system based on DHT mechanism
US10419437B2 (en) Quasi-agentless cloud resource management
CN104869151A (en) Business unloading method and system
CN111614657B (en) Mobile edge security service method and system based on mode selection
CN108512672B (en) Service arranging method, service management method and device
US11652720B2 (en) Allocating cloud resources in accordance with predicted deployment growth
CN109298937A (en) Document analysis method and the network equipment
CN113485792A (en) Pod scheduling method in kubernets cluster, terminal equipment and storage medium
WO2023091215A1 (en) Mapping an application signature to designated cloud resources
CN105872082B (en) Fine granularity resource response system based on container cluster load-balancing algorithm
CN108228752B (en) Data total export method, data export task allocation device and data export node device
CN102546652B (en) System and method for server load balancing
CN115913550A (en) Password resource allocation method, device and equipment
CN107104829B (en) Physical equipment matching distribution method and device based on network topology data
US12039075B2 (en) Methods and systems for data management in communication network
CN113190347A (en) Edge cloud system and task management method
CN110391929B (en) Fault-tolerant control method and device and fault-tolerant component
CN111327666A (en) Service management method, device and system, computer equipment and storage medium
CN117176728B (en) Industrial Internet of things dispatching method and dispatching system based on cloud edge cooperative technology
WO2024047775A1 (en) Determination of machine learning model to be used for given predictive purpose for communication system
WO2024047774A1 (en) Determination of machine learning model used for given predictive purpose relating to communication system
US20230128199A1 (en) Telemetry data filter
KR20170124136A (en) Adaptive control plane management method for software defined network and apparatus thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant