CN106100894A

CN106100894A - A highly reliable cluster operation and maintenance management method

Info

Publication number: CN106100894A
Application number: CN201610542731.8A
Authority: CN
Inventors: 向友君; 张莉婷; 吴宗泽; 张勰; 蔡旭坤; 李凯鑫; 苏春晨
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2016-07-11
Filing date: 2016-07-11
Publication date: 2016-11-09
Anticipated expiration: 2036-07-11
Also published as: CN106100894B

Abstract

The invention discloses a highly reliable cluster operation and maintenance (O&M) management method, which specifically comprises: (1) Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands: a cluster O&M Web management platform is built to provide remote and visual management of the cluster, and load balancing and redundant fault tolerance are applied across the access layer, the scheduling layer, and the central control layer to make the Web management platform reliable. (2) Transmission and delivery of highly reliable cluster management and control commands: during transmission, the data are encrypted with AES and the AES key is encrypted with RC4; the encrypted data are base64-encoded and sent through an SSH tunnel, which makes the data of cluster O&M management reliable. (3) Execution and feedback of highly reliable cluster management and control commands: an extensible central O&M control system is built that supports multiple configuration management frameworks as well as user-defined configuration frameworks, which makes the central control of cluster O&M management reliable.

Description

A highly reliable cluster operation and maintenance management method

Technical field

The present invention relates to the technical field of IT operation and maintenance (O&M) management, and in particular to a highly reliable cluster O&M management method.

Background art

With the rapid development of Internet technology and the steady stream of competing products offering the same type of service, users place ever stricter requirements on service quality. Under this pressure, Internet companies usually deploy their services on distributed clusters and rely on the high performance, high reliability, and high scalability of such clusters to meet the challenge. As distributed clusters become widespread and their internal dependencies grow complex, cluster management has become central to providing robust services and is one of the active research topics in both academia and industry. If O&M personnel build and deploy the cluster environment and manage service configuration by hand, the work is inefficient and unreliable, and the result is hard to migrate, extend, and manage.

To cope with the sharp increase in deployment workload that comes with larger clusters, with configuration differences between heterogeneous server hosts, and with the configuration management and extension of the cluster environment, a new cluster O&M management mode is needed to automate the O&M of large-scale clusters. A cluster O&M management method should specifically cover functions such as automated deployment, host status monitoring, batch task execution, machine configuration management, and log auditing.

Summary of the invention

The object of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a highly reliable cluster O&M management method. Building on load balancing with redundancy and disaster tolerance, and combining it with the SaltStack configuration management framework, the method designs and implements safe and reliable cluster O&M management. It offers a simple and effective management scheme for small and medium-sized clusters and enables safe and reliable remote control of the cluster. Deployment is carried out automatically according to the host environment, which reduces errors caused by manual deployment, shortens deployment time, improves overall deployment efficiency, and provides a mechanism for managing service configuration continuously over the long term.

The object of the present invention is achieved through the following technical solution:

A highly reliable cluster O&M management method, comprising the following steps:

S1, Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands: an HTTP server with dual-machine hot standby is built on LVS+Keepalived load-balancing technology, supporting manual hot switchover and automatic failover; the cluster O&M management Web platform is built on the Nginx+Tornado network framework, with Nginx providing load balancing and reverse proxying;

S2, transmission and delivery of highly reliable cluster management and control commands: when management and control data are transmitted, the data are encrypted with the AES algorithm and the key is encrypted with RC4; after base64 encoding, they are sent through an SSH secure tunnel, which suits a central O&M control system managed through RPyC remote communication within the Tornado network framework;

S3, execution and feedback of highly reliable cluster management and control commands: the central O&M control system is compatible with multiple configuration frameworks, specifically including SaltStack and Func, and supports custom configuration frameworks; the SaltStack platform manages and controls the cluster node hosts.

Further, step S1, Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands, comprises:

S1.1, LVS is configured to build the access layer of the cluster O&M management platform and to load-balance that layer; Keepalived is configured to provide dual-machine hot standby for the access layer, key Keepalived settings are modified, and a shell script is written so that the master and backup HTTP servers can be switched semi-manually or automatically;

S1.2, Nginx is configured to build the scheduling-layer HTTP server of the cluster O&M platform, and the key settings of the Nginx reverse proxy section are modified to load-balance and schedule requests across the back-end Web servers; a Tornado program is written to build the Web server layer of the cluster O&M platform, and the Web server management interface and business logic are developed following MVC.

Further, step S2, transmission and delivery of highly reliable cluster management and control commands, comprises:

S2.1, the TCP data segment is encrypted and encoded using AES, RC4, and base64;

S2.2, an SSH trust relationship is established between the cluster O&M management platform and the central O&M control system, and the encrypted data are sent through the SSH secure tunnel.

Further, step S3, execution and feedback of highly reliable cluster management and control commands, comprises:

S3.1, the salt-minion and func-minion clients are deployed on the cluster service nodes, their key settings are modified, and they send certificates to the central O&M control system that has already been set up;

S3.2, the central O&M control system manages and accepts the certificates of all trusted nodes inside the cluster, manages and controls all trusted nodes, and feeds the execution results of management and control commands back to the upstream Web platform.

Further, in step S1, Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands, an LVS+Keepalived dual-machine hot standby module, an Nginx HTTP reverse scheduling module, and a Tornado Web service-degradation scheduling mode are built, and the three together form a multi-level load-balancing model.

Further, the data communication between the cluster O&M platform and the central O&M control system uses RPyC remote calls with AES+RC4 encryption and base64 encoding; a session key is randomly generated, and the data are sent through the SSH secure tunnel.

Further, in step S3, execution and feedback of highly reliable cluster management and control commands, the management and control of cluster node hosts by the SaltStack platform specifically includes: remote command invocation, automated service deployment, service configuration management, service performance monitoring, and log auditing.

Further, in step S3, execution and feedback of highly reliable cluster management and control commands, SaltStack is used to provide automated deployment, data collection and monitoring, and service configuration management, wherein the automated deployment is centrally managed through configuration files in YAML format.

Further, in step S3, execution and feedback of highly reliable cluster management and control commands, the configuration frameworks with which the central O&M control system is compatible include SaltStack and Func.

Compared with the prior art, the present invention has the following advantages and effects:

(1) A multi-layer load-balancing model is proposed, which both avoids failures caused by overloading a single machine and provides redundancy and disaster tolerance for the cluster system, ensuring the high reliability of the O&M visualization platform.

(2) A multi-platform distributed central control system model is proposed, in which multiple platforms back each other up to keep the basic O&M functions highly reliable, ensuring the high reliability of the central O&M control system.

(3) A model combining multiple encryption with an encrypted tunnel is adopted, which prevents management and control data from being eavesdropped on or tampered with during transmission over untrusted networks, ensuring secure and reliable data communication in the O&M system.

Brief description of the drawings

Fig. 1 is a flow chart of the steps of cluster O&M management in the method of the invention;

Fig. 2 is a flow chart of how the method of the invention secures cluster O&M management.

Detailed description of the invention

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described further below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein only explain the present invention and do not limit it.

Embodiment one

Refer to Fig. 1, which is a flow chart of the steps of cluster O&M management in this embodiment. The highly reliable cluster O&M management method shown in Fig. 1 specifically comprises the following steps:

S1, Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands: an HTTP server with dual-machine hot standby is built on LVS+Keepalived load-balancing technology, supporting manual hot switchover and automatic failover; the cluster O&M management Web platform is built on the Nginx+Tornado network framework, with Nginx providing load balancing and reverse proxying.

This step specifically includes:

In step S1, LVS+Keepalived forms a highly reliable access layer that supports manual hot switchover and provides effective access; Nginx provides load balancing and reverse proxying and, following a local-first scheduling principle, distributes requests evenly to the back-end Web services for visual delivery.

The Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands includes building an LVS+Keepalived dual-machine hot standby module, an Nginx HTTP reverse scheduling module, and a Tornado Web service-degradation scheduling mode. The multi-level load-balancing model formed by these three together focuses on the reliability problem in the cluster O&M method.

S2, transmission and delivery of highly reliable cluster management and control commands: when management and control data are transmitted, the data are encrypted with the AES algorithm and the key is encrypted with RC4; after base64 encoding, they are sent through an SSH secure tunnel, which suits a central O&M control system managed through RPyC remote communication within the Tornado network framework.

This step specifically includes:

In step S2, the cluster O&M platform communicates with the central O&M control system through RPyC remote calls. The transmitted data are processed with AES, RC4, and base64 and sent securely through an SSH secure tunnel.

The transmission and delivery of highly reliable cluster management and control commands uses RPyC remote calls with AES+RC4 encryption and base64 encoding; a session key is randomly generated and the data are sent through the SSH secure tunnel, so that cluster management and control data can be transmitted safely and reliably.

This step focuses on the security problem in the O&M method.

S3, execution and feedback of highly reliable cluster management and control commands: the central O&M control system is compatible with multiple configuration frameworks, specifically including SaltStack and Func, and supports custom configuration frameworks; the SaltStack platform manages and controls the cluster node hosts, which specifically includes: remote command invocation, automated service deployment, service configuration management, service performance monitoring, and log auditing.

This step specifically includes:

The central O&M control system is designed to be compatible with multiple configuration management frameworks and provides the basic management functions of cluster O&M; management and control commands are executed and their execution results are fed back to the upstream Web platform.

The execution and feedback of highly reliable cluster management and control commands relies mainly on SaltStack modules for automated deployment, data collection and monitoring, and service configuration management; the automated deployment module is centrally managed mainly through configuration files in YAML format.

Embodiment two

This embodiment gives a specific implementation of the highly reliable cluster O&M management method, with the following steps:

1) Deploy the basic environment.

According to the overall design of the cluster O&M system, the prototype system built here is divided into two sub-networks: the local O&M Web platform and the central O&M control system. The Gateway maps the external public IP address and port to the LVS virtual IP. WebNode is a service node of the O&M Web platform and is deployed on a local physical host; in the minimal system, two WebNode hosts implement the load-balancing layers and all functions of the O&M Web platform. ControlNode is a service node of the central O&M control system, and ClusterNode is a service node inside the cluster system.

2) Deploy the access load layer.

First, the latest version of the Keepalived service software is installed from source and given a basic environment configuration. Then the Keepalived global configuration file /etc/Keepalived/Keepalived.conf is created; it is divided mainly into two parts: VRRP automatic failover (vrrp_instance) and Virtual Server load balancing (virtual_server).

The Nginx_check.sh script referenced in the configuration checks the Nginx service every 10 s and restarts Nginx if it has failed. If the restart fails, the local service is considered unavailable; the script then stops the local Keepalived so that traffic switches to the other host, which avoids sending traffic to a dead service. When the Virtual Server is configured, the local host is given a weight of 2 so that requests are forwarded to the local host preferentially, which effectively reduces unnecessary network traffic.
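The decision logic of one check round, as described above, can be sketched in Python. This is only an illustration: the patent uses a shell script whose contents are not given, and the names `check_once`, `FakeNode`, and `run_round` are hypothetical. The three callbacks stand in for "is Nginx alive", "restart Nginx", and "stop the local Keepalived"; the 10 s interval would be supplied by whatever schedules the check.

```python
from typing import Callable


def check_once(is_alive: Callable[[], bool],
               restart: Callable[[], None],
               stop_keepalived: Callable[[], None]) -> str:
    """Run one health-check round and report which action was taken."""
    if is_alive():
        return "healthy"
    restart()                      # first remedy: restart the failed service
    if is_alive():
        return "restarted"
    stop_keepalived()              # restart failed: give up the virtual IP
    return "failed_over"


class FakeNode:
    """In-memory stand-in for a host (hypothetical, for exercising the logic)."""

    def __init__(self, alive: bool, restart_works: bool) -> None:
        self.alive = alive
        self.restart_works = restart_works
        self.keepalived_running = True

    def is_alive(self) -> bool:
        return self.alive

    def restart(self) -> None:
        self.alive = self.restart_works

    def stop_keepalived(self) -> None:
        self.keepalived_running = False


def run_round(node: FakeNode) -> str:
    """Apply one check round to a FakeNode."""
    return check_once(node.is_alive, node.restart, node.stop_keepalived)
```

Stopping Keepalived rather than retrying indefinitely is what lets VRRP move the virtual IP to the peer host, so the failure of one Nginx does not leave the virtual address pointing at a dead service.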

3) Deploy the Nginx reverse proxy.

Nginx is installed and deployed on the WebNode1 and WebNode2 servers; after the runtime environment is configured, the /etc/Nginx/Nginx.conf file is created. Following the local-first forwarding principle, the local load weight is set to 2.
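To see what a local weight of 2 means in practice, the following Python sketch simulates smooth weighted round-robin selection between a local back end and its peer. It is an illustration of the general algorithm, not Nginx's own code, and it assumes the peer keeps a weight of 1, which the text does not state. The names `smooth_weighted_round_robin`, `local`, and `peer` are hypothetical.

```python
from __future__ import annotations


def smooth_weighted_round_robin(weights: dict[str, int], n: int) -> list[str]:
    """Return the first n picks of smooth weighted round-robin scheduling."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks: list[str] = []
    for _ in range(n):
        # Every candidate gains its configured weight each round.
        for name, weight in weights.items():
            current[name] += weight
        # The candidate with the highest running score is chosen...
        chosen = max(current, key=lambda name: current[name])
        # ...and pays back the total weight, which spreads its picks out.
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = smooth_weighted_round_robin({"local": 2, "peer": 1}, 6)
```

Under these assumptions two out of every three requests stay on the local host, which is the traffic-saving effect the local-first principle aims for, while the peer still receives a share and stays warm.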

4) Deploy the business layer.

Tornado is started in single-process, single-threaded mode. The WebNode1 and WebNode2 servers each start three such Tornado services, on ports 8886 to 8888. Ports 8886 and 8887 communicate with different O&M control machines, and 8888 is the standby, used only when all the other services are busy. After receiving an HTTP request, Nginx hands it to the specific Tornado business module at the back end according to the upstream load rules.
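The standby rule for port 8888 can be sketched as a small selection function. This is a hypothetical illustration in Python: in the described system the choice is made by Nginx's upstream rules, and the names `PRIMARY_PORTS`, `STANDBY_PORT`, and `pick_port` are not from the source.

```python
from __future__ import annotations

PRIMARY_PORTS = (8886, 8887)   # regular Tornado services
STANDBY_PORT = 8888            # used only when every primary is busy


def pick_port(busy: set[int]) -> int:
    """Return the port that should serve the next request."""
    for port in PRIMARY_PORTS:
        if port not in busy:
            return port
    # All primaries are busy: degrade to the standby service.
    return STANDBY_PORT
```

For example, `pick_port({8886})` selects 8887, and the standby is reached only through `pick_port({8886, 8887})`.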

5) Deploy the RPyC server. The central O&M control system is the link between the O&M Web platform and the cluster service nodes. Its main function is to receive O&M management and control commands and forward them for execution, and it is implemented in two parts: the RPyC server and the service configuration management platform.

The RPyC server is the access and scheduling module of the central O&M control system, developed on the RPyC remote call protocol. Member methods named exposed_XX are defined in the Server class and can then be called remotely from the client through the root attribute.
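The `exposed_` naming convention can be illustrated without a network by a small stdlib-only emulation. The sketch below does not use the RPyC library itself; in a real deployment the class would typically subclass `rpyc.Service` and the client would reach it through the connection's `root` attribute. The method `exposed_run_command`, its parameters, and the `Root` proxy class are hypothetical names chosen for this example.

```python
class Server:
    """Service class: only methods prefixed with exposed_ are callable remotely."""

    def exposed_run_command(self, target: str, command: str) -> dict:
        # A real service would forward the command to the configuration
        # management platform; here the request is simply echoed back.
        return {"target": target, "command": command, "status": "accepted"}

    def internal_helper(self) -> str:
        return "not reachable remotely"


class Root:
    """Client-side stand-in for the root attribute (emulation only)."""

    def __init__(self, service: object) -> None:
        self._service = service

    def __getattr__(self, name: str):
        # Resolve root.name to service.exposed_name, and nothing else.
        try:
            return getattr(self._service, "exposed_" + name)
        except AttributeError:
            raise AttributeError(f"{name!r} is not exposed") from None


root = Root(Server())
reply = root.run_command("ClusterNode1", "uptime")
```

The prefix acts as an allow-list: a method such as `internal_helper` exists on the server object but cannot be reached through `root`.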

6) Deploy the service configuration management platform. The Salt-Master service is deployed on ControlNode and the Salt-Minion service on ClusterNode. Service configuration files are modified, for example the node identity, node IP, and node grains information, and the master Salt-Master then signs the Salt-Minion certificates. Rsync is then used to synchronize the common Master configuration across the multiple Salt-Master hosts, which completes the basic environment of the SaltStack configuration framework.

In summary, starting from a survey of technical solutions common in industry, the present invention addresses the key problems that cluster O&M management most needs solved: high reliability of the O&M Web platform, high reliability of the central O&M control system, and secure and reliable transmission of management and control data. It provides a solution to each and, on that basis, proposes a highly reliable cluster O&M system architecture with multi-layer load balancing and multiple cluster configuration management platforms, and uses multiple symmetric ciphers to solve the communication security problem of the O&M system.

The embodiment of the present invention first uses LVS to provide the external virtual service and access scheduling, and uses Keepalived+Nginx to build a dual-machine, dual-active HTTP reverse proxy layer, which improves system resource utilization while making the O&M Web platform more reliable. Second, following the O&M idea of service degradation, the service layer provides services according to priority level, which further strengthens system reliability, avoids failures caused by overloading a single machine, and achieves high reliability of the O&M Web platform.

The Func management platform serves as the system backup and is switched in manually when the SaltStack platform fails, keeping the basic management and control module highly reliable. The SaltStack platform implements the management and control, deployment, and monitoring modules of O&M management; deploying multiple Salt-Masters in a distributed manner removes the single point of failure and improves service performance, achieving high reliability of the central O&M control system.

For remote communication between the O&M Web platform and the central O&M control system, the RC4 and AES symmetric encryption algorithms are combined, and an encryption key is generated separately for each session, which reduces the chance that the encryption is broken. In addition, an SSH secure tunnel is introduced to encrypt the transmission channel, further ensuring that data transmission is secure and reliable.
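The per-session key handling can be sketched as follows. This Python example covers only the RC4 and base64 steps with the standard library: a fresh session key is generated, wrapped with RC4, and base64-encoded for transport. The AES encryption of the payload is deliberately left out because it needs a third-party library, and the source does not say which key RC4 itself uses, so the pre-shared `shared_secret` is an assumption. The function names are hypothetical.

```python
import base64
import secrets


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:                         # keystream generation and XOR
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def wrap_session_key(shared_secret: bytes) -> tuple:
    """Create a random 16-byte session key and its wrapped, base64 text form."""
    session_key = secrets.token_bytes(16)     # would serve as the AES key
    wrapped = base64.b64encode(rc4(shared_secret, session_key)).decode("ascii")
    return session_key, wrapped


def unwrap_session_key(shared_secret: bytes, wrapped: str) -> bytes:
    """Recover the session key on the receiving side."""
    return rc4(shared_secret, base64.b64decode(wrapped))
```

Because a new session key is drawn for every conversation, recovering one key exposes only that session's data, which is the benefit the paragraph above describes.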

Tests along several dimensions, including load scheduling, high reliability, and functional completeness, verify that the scheme proposed by the present invention properly solves the key problems:

(1) The balanced scheduling of the multi-layer load module lets the access-layer nodes work in a multi-machine, multi-active mode, which achieves access load balancing effectively and greatly improves the resource utilization of the system compared with hot standby;

(2) The design of the O&M Web platform combines redundancy and disaster tolerance, automatic failover, and service degradation, which on the one hand removes single points of failure and on the other hand keeps high-priority services reliable when a fault occurs;

(3) The central O&M control system uses a distributed multi-Salt-Master deployment which, combined with the modular design of the O&M functions, provides redundancy and disaster tolerance and effectively solves the high-concurrency problem of cluster node management.

The above embodiments are preferred embodiments of the present invention, but the embodiments of the invention are not limited to them. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims

1. A highly reliable cluster operation and maintenance (O&M) management method, characterized in that the method comprises the following steps:

S1, Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands: an HTTP server with dual-machine hot standby is built on LVS+Keepalived load-balancing technology, supporting manual hot switchover and automatic failover; the cluster O&M management Web platform is built on the Nginx+Tornado network framework, with Nginx providing load balancing and reverse proxying;

S2, transmission and delivery of highly reliable cluster management and control commands: when management and control data are transmitted, the data are encrypted with the AES algorithm and the key is encrypted with RC4; after base64 encoding, they are sent through an SSH secure tunnel, which suits a central O&M control system managed through RPyC remote communication within the Tornado network framework;

S3, execution and feedback of highly reliable cluster management and control commands: the central O&M control system is compatible with multiple configuration frameworks, specifically including SaltStack and Func, and supports custom configuration frameworks; the SaltStack platform manages and controls the cluster node hosts.

2. The highly reliable cluster O&M management method according to claim 1, characterized in that step S1, Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands, comprises:

S1.2, Nginx is configured to build the scheduling-layer HTTP server of the cluster O&M platform, and the key settings of the Nginx reverse proxy section are modified to load-balance and schedule requests across the back-end Web servers; a Tornado program is written to build the Web server layer of the cluster O&M platform, and the Web server management interface and business logic are developed following MVC.

3. The highly reliable cluster O&M management method according to claim 1, characterized in that step S2, transmission and delivery of highly reliable cluster management and control commands, comprises:

S2.2, an SSH trust relationship is established between the cluster O&M management platform and the central O&M control system, and the encrypted data are sent through the SSH secure tunnel.

4. The highly reliable cluster O&M management method according to claim 1, characterized in that step S3, execution and feedback of highly reliable cluster management and control commands, comprises:

S3.1, the salt-minion and func-minion clients are deployed on the cluster service nodes, their key settings are modified, and they send certificates to the central O&M control system that has already been set up;

S3.2, the central O&M control system manages and accepts the certificates of all trusted nodes inside the cluster, manages and controls all trusted nodes, and feeds the execution results of management and control commands back to the upstream Web platform.

5. The highly reliable cluster O&M management method according to claim 1, characterized in that in step S1, Web access and HTTP scheduling and delivery of highly reliable cluster management and control commands, an LVS+Keepalived dual-machine hot standby module, an Nginx HTTP reverse scheduling module, and a Tornado Web service-degradation scheduling mode are built, and the three together form a multi-level load-balancing model.

6. The highly reliable cluster O&M management method according to claim 3, characterized in that the data communication between the cluster O&M platform and the central O&M control system uses RPyC remote calls with AES+RC4 encryption and base64 encoding; a session key is randomly generated, and the data are sent through the SSH secure tunnel.

7. The highly reliable cluster O&M management method according to claim 1, characterized in that in step S3, execution and feedback of highly reliable cluster management and control commands, the management and control of cluster node hosts by the SaltStack platform specifically includes: remote command invocation, automated service deployment, service configuration management, service performance monitoring, and log auditing.

8. The highly reliable cluster O&M management method according to claim 1, characterized in that in step S3, execution and feedback of highly reliable cluster management and control commands, SaltStack is used to provide automated deployment, data collection and monitoring, and service configuration management, wherein the automated deployment is centrally managed through configuration files in YAML format.

9. The highly reliable cluster O&M management method according to claim 1, characterized in that in step S3, execution and feedback of highly reliable cluster management and control commands, the configuration frameworks with which the central O&M control system is compatible include SaltStack and Func.