CN111865714A - Cluster management method based on multi-cloud environment - Google Patents

Cluster management method based on multi-cloud environment

Info

Publication number
CN111865714A
Authority
CN
China
Prior art keywords
node
framework
service
thread
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010585865.4A
Other languages
Chinese (zh)
Other versions
CN111865714B (en)
Inventor
Fu Weiren (伏伟任)
Jiang Qiuming (蒋秋明)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shangshi Longchuang Intelligent Technology Co Ltd
Original Assignee
Shanghai Shangshi Longchuang Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-06-24
Filing date
2020-06-24
Publication date
2020-10-30
2020-06-24: Application filed by Shanghai Shangshi Longchuang Intelligent Technology Co Ltd
2020-06-24: Priority to CN202010585865.4A
2020-10-30: Publication of CN111865714A
Application granted
2022-08-02: Publication of CN111865714B
Status: Active (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/547 Remote procedure calls [RPC]; Web services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to a cluster management method based on a multi-cloud environment. The development framework design is the core of the multi-cloud framework and its most abstract part; it requires understanding how the microservice runtime framework operates and what its main components are. For most middle-platform systems, the runtime dependency is an RPC framework together with RPC-based service governance capabilities, including mechanisms such as service registration and discovery, circuit breaking and fault tolerance, and flow control, so that the core business logic code is decoupled from the capabilities of the microservice framework. The invention establishes detailed monitoring of cluster node information in a multi-cloud environment, allows individual nodes to be specified, and allows the individual data of each node to be compared graphically in order to handle specific faults.

Description

Cluster management method based on multi-cloud environment
Technical Field
The invention relates to cloud architecture management, in particular to a cluster management method based on a multi-cloud environment.
Background
A multi-cloud environment is a cloud architecture formed by combining multiple cloud services offered by multiple cloud providers, and the clouds involved may be public or private. Multi-cloud means deploying the same type of cloud solution across several providers, whereas hybrid cloud means combining several cloud deployment types through integration or orchestration. A multi-cloud solution may therefore involve 2 public cloud environments or 2 private cloud environments.
A hybrid cloud scenario, by contrast, may involve 1 public cloud environment and 1 private cloud environment, plus an infrastructure (implemented through application programming interfaces, middleware, or containers) that facilitates workload portability. More and more enterprises are choosing multi-cloud deployments (including public and private clouds), hoping to improve security and performance by expanding into more environments.
Existing cloud environments are managed in many different ways, which makes systematic cluster management inconvenient.
Disclosure of Invention
The purpose of the invention is to overcome the shortcomings of the prior art by providing a cluster management method based on a multi-cloud environment.
The purpose of the invention can be realized by the following technical scheme:
A cluster management method based on a multi-cloud environment comprises the following steps:
Step 1: setting up a development framework based on the operating requirements and main components of a microservice runtime framework, and decoupling the corresponding core code from the capabilities of the microservice runtime framework;
Step 2: designing an architecture within the development framework;
Step 3: defining microservice interfaces in the architecture to complete the overall deployment;
Step 4: performing different cluster management operations based on the deployed structure.
Further, the development framework in step 1 is an RPC framework together with RPC-based service governance capabilities.
Further, the RPC-based service governance capabilities include service registration and discovery, circuit breaking and fault tolerance, and flow control.
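The text names flow control but does not say how it is enforced. One common technique is a token bucket; the Python sketch below shows it under stated assumptions: the class `TokenBucket`, its `rate`/`capacity` parameters, and the explicit `now` timestamp (used instead of a wall clock so behavior is deterministic) are hypothetical and not part of the described method or of any named framework.

```python
class TokenBucket:
    """Hypothetical flow-control limiter: sustained `rate` requests per second,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # the bucket starts full
        self.last = now  # timestamp of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.capacity), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token on this request
            return True
        return False  # over the limit: reject (or queue) the request
```

In an RPC filter one bucket might be kept per service or per caller, passing `time.monotonic()` as `now`; a bucket with `rate=2.0, capacity=2` admits two back-to-back requests, rejects a third, and admits one more half a second later.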
Further, the technical base of the architecture in step 2 draws on the Spring, Spring Boot, ServiceComb, HSF, and Spring Cloud microservice frameworks.
Further, the cluster management operations in step 4 include node joining, node leaving, normal node operation, node configuration, and thread synchronization.
Further, node joining specifically includes: when starting, each node reads its own configuration file and periodically sends join request messages until it receives join confirmation messages from all other nodes.
Further, node leaving specifically includes: monitoring the state of all nodes through the heartbeat messages each of them sends; if no heartbeat message is received from a node within a set period, that node is considered to have left. If the departed or failed node is a backup node, it is simply deleted from the node list; if it is the master node, a new master node is elected from the remaining nodes.
Further, normal node operation specifically includes: each node periodically sends heartbeat messages to announce its existence, and the other nodes periodically receive these heartbeat messages, so that the nodes jointly maintain the cluster node list.
Further, node configuration specifically includes: on startup, each node reads its configuration file, initializes itself and the messages to be sent, and then adds its configured self to the cluster node list.
Further, the threads used to implement thread synchronization include a gm_listener thread, a heartbeat thread, an add_flag thread, and a test thread, wherein:
the gm_listener thread monitors received multicast messages and processes them accordingly;
the heartbeat thread queries the node state and sends either a join request message or a heartbeat message every heartbeat cycle;
the add_flag thread periodically decrements the flag variable that marks the state of each node;
and the test thread periodically checks, for each node in the list, whether its flag variable is less than 0.
Compared with the prior art, the invention has the following advantages:
(1) The method comprises: Step 1: setting up a development framework based on the operating requirements and main components of a microservice runtime framework, and decoupling the corresponding core code from the capabilities of the microservice runtime framework; Step 2: designing an architecture within the development framework; Step 3: defining microservice interfaces in the architecture to complete the overall deployment; Step 4: performing different cluster management operations based on the deployed structure. In this way, detailed monitoring of cluster node information is established for the multi-cloud environment, individual nodes can be specified, and the individual data of each node can be compared graphically in order to handle specific faults;
(2) with these method steps, a system administrator can configure the system's automatic responses to events through an event service, achieving task distribution, load balancing, and high availability, and a user-friendly management interface improves the security and convenience of management.
Drawings
FIG. 1 is a flow chart of the method steps of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF INVENTION
As shown in Fig. 1, the cluster management method based on a multi-cloud environment includes the following steps:
S1, selecting a development framework: the development framework design is the core of the multi-cloud framework and its most abstract part, and it requires understanding how the microservice runtime framework operates and what its main components are. For most middle-platform systems, the runtime dependency is an RPC framework together with RPC-based service governance capabilities, including mechanisms such as service registration and discovery, circuit breaking and fault tolerance, and flow control, and the core business logic code is decoupled from the capabilities of the microservice framework;
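The circuit-breaking ("fusing") fault tolerance named in S1 is commonly realized as a circuit breaker that stops calling a failing remote service for a while and then lets a trial call through. The Python sketch below is one such breaker, not the mechanism of ServiceComb, HSF, or Spring Cloud; the names `CircuitBreaker` and `CircuitOpenError`, the threshold, the reset timeout, and the explicit `now` argument are all assumptions for illustration.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker guarding calls to one remote service."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 10.0) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to stay open before a trial call
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T], now: float) -> T:
        if self.opened_at is not None and now - self.opened_at < self.reset_timeout:
            # Open: fail fast instead of calling the unhealthy service.
            raise CircuitOpenError("circuit open; call rejected")
        # Closed, or half-open after the timeout: let the call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = now  # open (or re-open after a failed trial call)
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Failing fast while the circuit is open keeps callers from piling up on an unhealthy provider, which is the fault-tolerance goal the text attributes to the RPC governance layer.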
S2, architecture design: the technical base draws on the Spring, Spring Boot, ServiceComb, HSF, and Spring Cloud microservice frameworks, and the architecture is built on the Spring and Spring Boot technology stack;
S3, defining microservice interfaces: microservice-endpoint-ServiceComb is published as a ServiceComb microservice project, microservice-endpoint-HSF as an HSF microservice project, and microservice-endpoint-Spring as a Spring Cloud microservice project. Parameters and return values are defined with Integer, String, and Boolean, or with POJO beans that conform to the JavaBean specification; interfaces and abstract classes, base classes with multiple implementations, and template classes are not used as parameters or return values, and neither are objects strongly tied to the runtime environment;
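The S3 interface rules are stated for Java, but their intent can be illustrated with a rough Python analogue that checks whether a type is acceptable as an interface parameter or return value. In this sketch, which is an analogy and not how any of the named frameworks validate interfaces, `int`, `str`, and `bool` stand in for Integer, String, and Boolean, a concrete dataclass stands in for a JavaBean POJO, and abstract classes, generic ("template") types, and base classes with more than one direct subclass are rejected; the function name `is_allowed_interface_type` is hypothetical.

```python
import dataclasses
import inspect
import typing
from typing import Any

# Python stand-ins for the Java types named in S3 (Integer, String, Boolean).
SCALARS = (int, str, bool)


def is_allowed_interface_type(tp: Any) -> bool:
    """Rough check of the S3 rules for a parameter or return type (an analogy, not a spec)."""
    if tp in SCALARS:
        return True
    # Template (generic) types: parameterized aliases like List[int], or Generic[T] classes.
    if typing.get_origin(tp) is not None or getattr(tp, "__parameters__", ()):
        return False
    if not inspect.isclass(tp):
        return False
    # Interfaces / abstract classes are not allowed.
    if inspect.isabstract(tp):
        return False
    # Base classes with several implementations (direct subclasses) are not allowed.
    if len(type.__subclasses__(tp)) > 1:
        return False
    # POJO analogue: a concrete dataclass whose fields are all allowed types.
    # Anything else (sockets, threads, other runtime-bound objects) is rejected.
    if dataclasses.is_dataclass(tp):
        hints = typing.get_type_hints(tp)
        return all(is_allowed_interface_type(hints[f.name]) for f in dataclasses.fields(tp))
    return False
```

Keeping interface types this flat is what lets the same contract be serialized identically whether the endpoint is published as a ServiceComb, HSF, or Spring Cloud project.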
S4, managing the node architecture.
In this cluster management method based on a multi-cloud environment, step S4 includes:
Node joining: when starting, each node reads its own configuration file, which contains the node ID number, its own IP address, the multicast IP address and the port number, initializes its messages and itself, and then periodically sends join request messages until it receives join confirmation messages from the other nodes;
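The join procedure can be read as a small state machine that the node consults once per heartbeat cycle: keep sending join requests until every peer has confirmed, then switch to heartbeats. The Python sketch below assumes the set of peers to wait for is known in advance (for example from configuration) as `expected_peers`; the class `JoiningNode` and the message names are hypothetical, and the network transport is omitted.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

JOIN_REQUEST, HEARTBEAT = "JOIN_REQUEST", "HEARTBEAT"


@dataclass
class JoiningNode:
    node_id: int
    expected_peers: Set[int]  # assumption: peers to wait for, e.g. from configuration
    acked: Set[int] = field(default_factory=set)

    @property
    def joined(self) -> bool:
        # Joined once every expected peer has confirmed the join request.
        return self.expected_peers <= self.acked

    def on_join_ack(self, sender: int) -> None:
        if sender in self.expected_peers:
            self.acked.add(sender)

    def next_message(self) -> Tuple[str, int]:
        """Called once per heartbeat cycle: keep requesting to join until confirmed."""
        return (HEARTBEAT if self.joined else JOIN_REQUEST, self.node_id)
```

A node with `expected_peers={2, 3}` keeps emitting `JOIN_REQUEST` after only node 2 has confirmed, and moves to `HEARTBEAT` once node 3 confirms as well.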
Node leaving: the state of each node is monitored through the heartbeat messages it sends; if no heartbeat message is received from a node within three periods, that node is considered to have left. There are two basic cases: if a backup node leaves or fails, it is simply deleted from the node list; if the master node leaves or fails, a new master node is elected from the remaining nodes, namely the remaining node with the smallest ID number, and the departed or failed master node is deleted;
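The leave-detection and election rules can be sketched as a membership table kept by one node. This Python sketch assumes an external timer calls `end_of_period()` once per heartbeat period; `MISSED_LIMIT = 3` encodes the "three periods" and the smallest-ID election follows the text, while the class and method names are hypothetical.

```python
from typing import Iterable, List, Set

MISSED_LIMIT = 3  # periods without a heartbeat before a node is considered gone


class MembershipTable:
    """Tracks peers' heartbeats as seen by one node (`self_id`)."""

    def __init__(self, self_id: int, peer_ids: Iterable[int], master_id: int) -> None:
        self.self_id = self_id
        self.missed = {pid: 0 for pid in peer_ids}  # consecutive silent periods
        self.heard: Set[int] = set()  # peers heard from in the current period
        self.master_id = master_id

    def on_heartbeat(self, node_id: int) -> None:
        if node_id in self.missed:
            self.heard.add(node_id)

    def end_of_period(self) -> List[int]:
        """Close one heartbeat period; return the IDs judged to have left."""
        for pid in self.missed:
            self.missed[pid] = 0 if pid in self.heard else self.missed[pid] + 1
        self.heard.clear()
        departed = sorted(pid for pid, n in self.missed.items() if n >= MISSED_LIMIT)
        for pid in departed:
            del self.missed[pid]  # a departed backup node is simply removed
        if self.master_id in departed:
            # Master left or failed: elect the surviving node with the smallest ID.
            self.master_id = min([self.self_id, *self.missed])
        return departed
```

For example, node 3 with peers 1, 2, 4 and master 1 drops node 1 after three silent periods and elects node 2; if backup node 4 later goes silent, it is removed without changing the master.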
Normal operation: while a node is operating normally, it periodically sends heartbeat messages to announce its existence, and the other nodes periodically receive these heartbeat messages, thereby maintaining the cluster node list;
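The text does not define what a heartbeat message contains. The Python sketch below assumes a hypothetical 9-byte layout (message type, node ID, IPv4 address, in network byte order) and shows how a receiver could use each heartbeat to confirm or re-add the sender in its cluster node list; the constant `MSG_HEARTBEAT` and the function names are illustrative only.

```python
import socket
import struct
from typing import Dict, Tuple

# Hypothetical wire format: message type (1 byte), node ID (unsigned 32-bit),
# IPv4 address (4 bytes), all in network byte order.
MSG_HEARTBEAT = 2
_FORMAT = "!BI4s"


def encode_heartbeat(node_id: int, ip: str) -> bytes:
    """Build the payload a node would multicast once per heartbeat period."""
    return struct.pack(_FORMAT, MSG_HEARTBEAT, node_id, socket.inet_aton(ip))


def decode_message(data: bytes) -> Tuple[int, int, str]:
    msg_type, node_id, raw_ip = struct.unpack(_FORMAT, data)
    return msg_type, node_id, socket.inet_ntoa(raw_ip)


def on_datagram(node_list: Dict[int, str], data: bytes) -> None:
    """Receiver side: each heartbeat confirms (or re-adds) the sender in the node list."""
    msg_type, node_id, ip = decode_message(data)
    if msg_type == MSG_HEARTBEAT:
        node_list[node_id] = ip
```

On a real network the payload would be sent to the configured multicast group with a UDP socket, for example `sock.sendto(payload, (multicast_ip, port))`, and read back with `sock.recvfrom(...)` on members that have joined the group.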
Node configuration: each node has a configuration file, stored by node type in a configuration directory. On startup, the node first reads the configuration file, initializes itself and the messages to be sent from the node ID number, its own IP address, the multicast IP address, the port number and so on, and first adds itself to the node list;
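The text lists the fields of the configuration file but not its format. Assuming an INI-style file with a hypothetical `[node]` section, Python's standard `configparser` can load it as below; the key names (`node_id`, `self_ip`, `multicast_ip`, `port`), the `NodeConfig` class, and the multicast-range check are assumptions added for this sketch.

```python
import configparser
import ipaddress
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NodeConfig:
    node_id: int
    self_ip: str
    multicast_ip: str
    port: int


def load_node_config(text: str) -> NodeConfig:
    """Parse a hypothetical INI-style node configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["node"]
    cfg = NodeConfig(
        node_id=section.getint("node_id"),
        self_ip=section.get("self_ip"),
        multicast_ip=section.get("multicast_ip"),
        port=section.getint("port"),
    )
    # Extra safeguard for the sketch: the group address must be multicast (224.0.0.0/4).
    if not ipaddress.ip_address(cfg.multicast_ip).is_multicast:
        raise ValueError(f"{cfg.multicast_ip} is not a multicast address")
    return cfg


def init_node_list(cfg: NodeConfig) -> Dict[int, str]:
    """The node first adds itself to its own node list."""
    return {cfg.node_id: cfg.self_ip}
```

A file containing `[node]`, `node_id = 1`, `self_ip = 10.0.0.1`, `multicast_ip = 239.1.1.1`, and `port = 5000` yields a node list that initially holds only node 1.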
Thread synchronization: all threads in a process share the same global memory, which lets them share information. Besides global variables, the threads of a process also share the process instructions, most data, open files (e.g., descriptors), signal handlers and signal dispositions, the current working directory, and the user ID and group ID. Several threads run concurrently. The gm_listener thread monitors received multicast messages and processes them accordingly: on receiving a join message, it checks whether the sender is in the node list, adds it if not, and sends a join acknowledgement message; on receiving a join acknowledgement message, it checks whether the sender is in the node list and adds it if not; and on receiving a heartbeat message, it increments the corresponding node's flag variable. The heartbeat thread queries the node state and sends a join request message or a heartbeat message every heartbeat cycle. The add_flag thread periodically decrements the flag variable that identifies the state of each node, and the test thread periodically checks, for each node in the list, whether its flag variable is less than 0, i.e., whether that node has died or left.
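The four cooperating threads can be sketched in Python with one lock protecting the shared node list. This is a simplified model, not the patented implementation: queues stand in for the multicast socket, and the starting flag value (`initial_flag=3`), the cap on flag increments, the `expected_peers` set, and all function names are assumptions. Each thread body is written as a single-step function so its logic can be checked without timing, and `start_threads` runs the steps periodically.

```python
import queue
import threading
from typing import Dict, List, Set, Tuple

Message = Tuple[str, int]  # (message type, sender node ID)


class ClusterState:
    """State shared by the four threads; every access goes through `lock`."""

    def __init__(self, self_id: int, expected_peers: Set[int], initial_flag: int = 3) -> None:
        self.self_id = self_id
        self.expected_peers = set(expected_peers)  # assumption: peers listed in the config
        self.initial_flag = initial_flag  # assumption: starting (and maximum) flag value
        self.flags: Dict[int, int] = {}  # node list: node ID -> flag variable
        self.acked: Set[int] = set()  # peers that confirmed our join request
        self.lock = threading.Lock()
        self.outbox: "queue.Queue[Message]" = queue.Queue()  # stands in for multicast send


def handle_message(state: ClusterState, msg_type: str, sender: int) -> None:
    """One iteration of the gm_listener thread."""
    if sender == state.self_id:
        return  # ignore our own multicast loopback
    with state.lock:
        if msg_type in ("JOIN", "JOIN_ACK") and sender not in state.flags:
            state.flags[sender] = state.initial_flag  # add unknown sender to the list
        if msg_type == "JOIN":
            state.outbox.put(("JOIN_ACK", state.self_id))
        elif msg_type == "JOIN_ACK":
            state.acked.add(sender)
        elif msg_type == "HEARTBEAT" and sender in state.flags:
            # Increment the sender's flag; the cap is an assumption of this sketch.
            state.flags[sender] = min(state.flags[sender] + 1, state.initial_flag)


def heartbeat_step(state: ClusterState) -> None:
    """One cycle of the heartbeat thread: query the state, then send JOIN or HEARTBEAT."""
    with state.lock:
        joined = state.expected_peers <= state.acked
    state.outbox.put(("HEARTBEAT" if joined else "JOIN", state.self_id))


def add_flag_step(state: ClusterState) -> None:
    """One cycle of the add_flag thread: decrement every node's flag."""
    with state.lock:
        for node_id in state.flags:
            state.flags[node_id] -= 1


def liveness_step(state: ClusterState) -> List[int]:
    """One cycle of the test thread: a flag below 0 means the node died or left."""
    with state.lock:
        gone = sorted(n for n, f in state.flags.items() if f < 0)
        for node_id in gone:
            del state.flags[node_id]
    return gone


def start_threads(state: ClusterState, inbox: "queue.Queue[Message]", period: float):
    """Start gm_listener, heartbeat, add_flag and test threads; return (stop event, threads)."""
    stop = threading.Event()

    def periodic(step) -> None:
        while not stop.wait(period):  # wait() returns False when the period elapses
            step(state)

    def gm_listener() -> None:
        while not stop.is_set():
            try:
                msg_type, sender = inbox.get(timeout=period)
            except queue.Empty:
                continue
            handle_message(state, msg_type, sender)

    threads = [threading.Thread(target=gm_listener, name="gm_listener", daemon=True)]
    for name, step in (("heartbeat", heartbeat_step), ("add_flag", add_flag_step), ("test", liveness_step)):
        threads.append(threading.Thread(target=periodic, args=(step,), name=name, daemon=True))
    for t in threads:
        t.start()
    return stop, threads
```

Because every read and write of `flags` and `acked` happens under the same lock, the add_flag and test threads never observe a half-updated node list, which is the point of the synchronization described above.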
The method establishes detailed monitoring of cluster node information in a multi-cloud environment, allows individual nodes to be specified, and allows the individual data of each node to be compared graphically in order to handle specific faults. A system administrator can configure the system's automatic responses to events through an event service, achieving task distribution, load balancing, and high availability, and a user-friendly management interface improves the security and convenience of management.
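The event service is only named in the text. As a rough illustration of "automatic responses to events", the Python sketch below lets an administrator register handlers per event type and shows one possible response, redistributing a departed node's tasks round-robin; `EventService`, `redistribute_tasks`, and the `"node_left"` event name are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Payload = Dict[str, Any]
Handler = Callable[[Payload], Any]


class EventService:
    """Hypothetical event service: administrators register automatic responses per event type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        """Configure an automatic response to `event_type`."""
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: Payload) -> List[Any]:
        """Run every configured response, in registration order, and collect the results."""
        return [handler(payload) for handler in self._handlers.get(event_type, [])]


def redistribute_tasks(assignments: Dict[str, int], survivors: List[int], departed: int) -> Dict[str, int]:
    """Example response: hand a departed node's tasks to the survivors round-robin."""
    moved = sorted(task for task, node in assignments.items() if node == departed)
    result = {task: node for task, node in assignments.items() if node != departed}
    for i, task in enumerate(moved):
        result[task] = survivors[i % len(survivors)]
    return result
```

Wiring `redistribute_tasks` to a node-left event means that when node 1 leaves, its tasks move to the surviving nodes without manual intervention.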
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A cluster management method based on a multi-cloud environment, characterized by comprising the following steps:
Step 1: setting up a development framework based on the operating requirements and main components of a microservice runtime framework, and decoupling the corresponding core code from the capabilities of the microservice runtime framework;
Step 2: designing an architecture within the development framework;
Step 3: defining microservice interfaces in the architecture to complete the overall deployment;
Step 4: performing different cluster management operations based on the deployed structure.
2. The method of claim 1, wherein the development framework in step 1 is an RPC framework together with RPC-based service governance capabilities.
3. The method of claim 2, wherein the RPC-based service governance capabilities include service registration and discovery, circuit breaking and fault tolerance, and flow control.
4. The method according to claim 1, wherein the technical base of the architecture in step 2 is the Spring, Spring Boot, ServiceComb, HSF, and Spring Cloud microservice frameworks.
5. The method according to claim 1, wherein the different cluster management operations in step 4 include node joining, node leaving, normal node operation, node configuration, and thread synchronization.
6. The method according to claim 5, wherein node joining specifically comprises: when starting, each node reads its own configuration file and periodically sends join request messages until it receives join confirmation messages from all other nodes.
7. The method according to claim 5, wherein node leaving specifically comprises: monitoring the state of all nodes through the heartbeat messages each of them sends; if no heartbeat message is received from a node within a set period, that node is considered to have left; if the departed or failed node is a backup node, it is deleted directly from the node list, and if it is the master node, a new master node is elected from the remaining nodes.
8. The method according to claim 5, wherein normal node operation specifically comprises: each node periodically sends heartbeat messages to announce its existence, and the other nodes periodically receive these heartbeat messages, so that the nodes jointly maintain the cluster node list.
9. The method according to claim 5, wherein node configuration specifically comprises: on startup, each node reads its configuration file, initializes itself and the messages to be sent, and then adds its configured self to the cluster node list.
10. The method of claim 5, wherein the threads used to implement thread synchronization comprise a gm_listener thread, a heartbeat thread, an add_flag thread, and a test thread, wherein:
the gm_listener thread monitors received multicast messages and processes them accordingly;
the heartbeat thread queries the node state and sends either a join request message or a heartbeat message every heartbeat cycle;
the add_flag thread periodically decrements the flag variable that marks the state of each node;
and the test thread periodically checks, for each node in the list, whether its flag variable is less than 0.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010585865.4A CN111865714B (en) 2020-06-24 2020-06-24 Cluster management method based on multi-cloud environment

Publications (2)

Publication Number Publication Date
CN111865714A (en) 2020-10-30
CN111865714B (en) 2022-08-02

Family

ID=72988464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010585865.4A Active CN111865714B (en) 2020-06-24 2020-06-24 Cluster management method based on multi-cloud environment

Country Status (1)

Country Link
CN (1) CN111865714B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607297A (en) * 2013-11-07 2014-02-26 上海爱数软件有限公司 Fault processing method of computer cluster system
CN103780497A (en) * 2013-12-30 2014-05-07 华中科技大学 Expandable distributed coordination service management method under cloud platform
CN104506357A (en) * 2014-12-22 2015-04-08 国云科技股份有限公司 High-usability cluster node management method
CN107688322A (en) * 2017-08-31 2018-02-13 天津中新智冠信息技术有限公司 Containerized management system
CN108712464A (en) * 2018-04-13 2018-10-26 中国科学院信息工程研究所 Implementation method for high availability of clustered microservices
US20190034263A1 (en) * 2017-07-27 2019-01-31 International Business Machines Corporation Optimized incident management using hierarchical clusters of metrics
CN110661841A (en) * 2019-08-06 2020-01-07 江阴逐日信息科技有限公司 Data consistency method for distributed service discovery cluster in micro-service architecture
CN110958311A (en) * 2019-11-27 2020-04-03 北京大学 YARN-based shared cluster elastic expansion system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
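Zhang Jing (张晶), Huang Xiaofeng (黄小锋): "An Application Framework Based on Microservices" (《一种基于微服务的应用框架》), Computer Systems & Applications (《计算机系统应用》) *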

Also Published As

Publication number Publication date
CN111865714B (en) 2022-08-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant