CN110286921B - CDH method for automatically installing distributed big data platform - Google Patents

CDH method for automatically installing distributed big data platform

Info

Publication number
CN110286921B
CN110286921B
Authority
CN
China
Prior art keywords
configuration
cdh
node
checking
nodes
Prior art date
Legal status
Active
Application number
CN201910568151.XA
Other languages
Chinese (zh)
Other versions
CN110286921A (en)
Inventor
刘洋
沈磊
李彦生
郝建维
张强
安飞虎
刘秋辉
王嬛
Current Assignee
Sichuan Zhongdian Aostar Information Technologies Co ltd
State Grid Information and Telecommunication Co Ltd
Original Assignee
Sichuan Zhongdian Aostar Information Technologies Co ltd
State Grid Information and Telecommunication Co Ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Zhongdian Aostar Information Technologies Co ltd, State Grid Information and Telecommunication Co Ltd filed Critical Sichuan Zhongdian Aostar Information Technologies Co ltd
Priority to CN201910568151.XA
Publication of CN110286921A
Application granted
Publication of CN110286921B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/61 Installation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Stored Programmes (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a method for automatically installing the CDH distributed big data platform, which specifically comprises the following steps. S10: environment preparation, which specifically comprises checking the basic environment of each node server and staging the installation resources. S20: script configuration, completing the configuration of the basic information of all node servers and the user-defined information according to the configuration file requirements. S30: installation and deployment, configuring the cluster basic environment and the big data operating environment according to the operation flow of the execution script, specifically including basic service configuration, NTP service setup, passwordless SSH login, Java environment installation, CDH management console service installation and service state verification. S40: result checking, logging in to the CDH management console and checking the result. The beneficial effects of the application are as follows: the application can effectively realize the automatic installation of the CDH management console, makes the big data cluster building process more efficient, and reduces the error rate.

Description

CDH method for automatically installing distributed big data platform
Technical Field
The application relates to the technical field of computer big data, in particular to a method for automatically installing the CDH distributed big data platform.
Background
With the rapid development of the information age, big data is used more and more widely, but differences between installation system environments make installing a distributed big data platform a complex, multi-step operation, and the larger the cluster, the higher the labor cost.
Because a distributed big data platform must run on horizontally scalable machines, cluster sizes range from 3 nodes to hundreds or even thousands. The more nodes there are, the higher the probability of manual error and the greater the labor cost, so the time cost and misoperation problems brought by manual operation need to be solved.
Disclosure of Invention
The application aims to provide a method for automatically installing the CDH distributed big data platform, which effectively reduces the error probability as well as the labor and time costs.
The application is realized by the following technical scheme:
the method for automatically installing the CDH distributed big data platform specifically comprises the following steps:
s10: environment preparation, which specifically comprises: checking the basic environment of each node server and staging the installation resources;
s20: script configuration: completing the configuration of the basic information of all node servers and the user-defined information according to the configuration file requirements;
s30: installation and deployment: configuring the cluster basic environment and the big data operating environment according to the operation flow of the execution script, specifically including basic service configuration, NTP service setup, passwordless SSH login, Java environment installation, CDH management console service installation and service state verification;
s40: result checking: logging in to the CDH management console and checking the result.
Further, in order to better implement the present application, in step S10 the checking of the basic environment of each node server specifically comprises:
step S11: judging whether the operating system version of each node server meets the requirements;
if not, the operating system needs to be reinstalled;
step S12: checking the system architecture of each node server's operating system and judging whether it is an x86_64 architecture;
step S13: checking the IP network of each node server's operating system and testing whether the nodes can reach one another;
step S14: checking the configuration files, installation packages and execution scripts of each node server.
Further, in order to better implement the present application, step S20 specifically comprises:
step S21: checking the IP addresses, SSH ports and root user passwords of all node servers together with the user-defined information for the big data platform installation, and writing this information into the deployment file;
if the check fails, modifying the unreasonable settings according to the prompt;
step S22: checking whether the configuration information of all node servers is correct and whether the user-defined information is configured reasonably;
if not, the abnormal and unreasonable information needs to be adjusted.
Further, in order to better implement the present application, step S30 specifically comprises a stand-alone configuration installation and an online configuration installation. The stand-alone configuration installation specifically comprises the following steps:
step S31: all nodes set the IP of each server to a static IP according to the correspondence in the script configuration information, and restart the network service after the setting is completed;
step S32: setting the machine names of all servers to the corresponding unified format according to the correspondence in the script configuration information and naming them in sequence;
step S33: all nodes modify the hosts file according to the correspondence in the configuration file, adding each node IP and the corresponding machine name;
step S34: verifying that every node can reach the others without packet loss, so as to ensure that the network is normal;
step S35: variable settings for all nodes, specifically comprising: modifying the swappiness value and the transparent_hugepage value of all nodes; all nodes adjust the configuration so that the modification takes effect on startup;
step S36: closing the firewall and selinux of all nodes and cancelling their start on boot.
Further, in order to better implement the present application, the online configuration installation specifically means: configuring the cluster operating environment according to the existing flow, including the NTP service, passwordless SSH login, the Java environment and installing the CDH management console service, and verifying the state of each service after execution; it specifically comprises the following steps:
step S311: configuring the operating environments of all servers;
step S312: establishing mutual trust between all node servers; judging whether all nodes pass the mutual trust verification by connecting to each node via SSH to verify whether the configuration succeeded;
if not, checking the cause of the abnormal SSH state on the individual server, including permission problems;
step S313: checking whether all nodes are set to the Chinese time zone; if not, changing them to the Chinese time zone;
step S314: configuring the yum source of the system on all nodes, with the management node acting as the HTTP server and all other nodes configured against it; verifying whether the yum sources of all nodes are properly configured;
step S315: installing and setting up the NTP service on all nodes, configuring the time server of every node to be the same time server, and verifying whether this succeeds;
if not, checking the cause of the abnormality on the individual servers, including whether the NTP component failed to install or was not started;
step S316: installing the mysql service;
step S317: installing the basic operating service components;
step S318: starting the service components, specifically comprising: starting the CDH management console service on the management node according to the configuration, and adding the service to the boot sequence; verifying the service state by checking; the successful start is confirmed through log checking and port checking.
Further, in order to better implement the present application, step S316 specifically comprises:
installing the mysql database on the management node according to the configuration;
modifying the mysql configuration file on the management node according to the configuration;
setting the mysql service to start on boot on the management node according to the configuration;
initializing the mysql data, creating access users and authorizing access on the management node according to the configuration;
verifying the access state of the mysql database.
Further, in order to better implement the present application, step S317 specifically comprises:
installing the Java operating environment on all nodes according to the configuration;
installing the mysql driver environment on all nodes according to the configuration;
installing the CDH management console service on the management node according to the configuration;
configuring the metadata database used by the CDH management console service.
Further, in order to better implement the present application, step S14 specifically refers to:
judging whether the IP, users, passwords and machine names in the configuration file of the management node have been modified according to the current cluster machine situation;
judging whether the configuration file and the installation source exist under the management node directory; a scripted sketch of this check is given below.
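For illustration only, a minimal pre-flight check of this kind could be scripted as follows; the staging directory /opt/cdh-install and the file names deploy.conf and install.sh are assumptions of the sketch, not names defined by the application:

#!/bin/bash
# Hypothetical pre-flight check on the management node: verify that the
# configuration file, execution script and installation source are present.
INSTALL_DIR=/opt/cdh-install
for f in deploy.conf install.sh rpms; do
    if [ ! -e "${INSTALL_DIR}/${f}" ]; then
        echo "missing: ${INSTALL_DIR}/${f}" >&2
        exit 1
    fi
done
echo "all installation resources present"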
Compared with the prior art, the application has the following advantages:
(1) the application can effectively realize the automatic installation of the CDH management console;
(2) the application makes the big data cluster building process more efficient and reduces the error rate.
Detailed Description
The present application will be described in further detail with reference to examples, but embodiments of the present application are not limited thereto.
Example 1:
the application is realized by the following technical scheme; the method for automatically installing the CDH distributed big data platform specifically comprises the following steps:
s10: environment preparation, which specifically comprises: checking the basic environment of each node server and staging the installation resources; the basic environment of each node server comprises its system version, system architecture and network environment;
in step S10, the checking of the basic environment of each node server specifically comprises:
step S11: judging whether the operating system version of each node server meets the requirements; the preferred operating system version is the Red Hat Enterprise Linux 6 / CentOS 6.x series;
if not, the operating system needs to be reinstalled;
step S12: checking the system architecture of each node server's operating system and judging whether it is an x86_64 architecture; since the big data platform only supports the x86_64 architecture, the uname -r command is used for this check, verifying that the output contains x86_64;
step S13: checking the IP network of each node server's operating system and testing whether the nodes can reach one another; if not, checking the network environment of the server (a scripted sketch of checks S11-S13 is given after this list);
step S14: checking the configuration files, installation packages and execution scripts of each node server, which specifically comprises:
judging whether the IP, users, passwords and machine names in the configuration file of the management node have been modified according to the current cluster machine situation;
judging whether the configuration file and the installation source exist under the management node directory.
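By way of illustration, checks S11-S13 could be scripted as follows; the node list file nodes.txt and its "<ip> <hostname>" line format are assumptions of the sketch:

#!/bin/bash
# Hypothetical sketch of checks S11-S13: OS release, x86_64 architecture,
# and network reachability of every node listed in nodes.txt.
grep -qE 'release 6\.' /etc/redhat-release || { echo "need RHEL/CentOS 6.x" >&2; exit 1; }
uname -r | grep -q x86_64 || { echo "need an x86_64 system" >&2; exit 1; }
while read -r ip name; do
    ping -c 1 -W 2 "$ip" > /dev/null || echo "node $ip ($name) unreachable" >&2
done < nodes.txt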
s20: script configuration: completing the configuration of the basic information of all node servers and the user-defined information according to the configuration file requirements;
s30: installation and deployment: configuring the cluster basic environment and the big data operating environment according to the operation flow of the execution script, specifically including basic service configuration, NTP service setup, passwordless SSH login, Java environment installation, CDH management console service installation and service state verification;
s40: result checking: logging in to the CDH management console and checking the result.
Other portions of the present embodiment are the same as those of the above embodiment, and thus will not be described again.
Example 2:
this embodiment is further optimized on the basis of the foregoing embodiment, and step S20 specifically comprises:
step S21: checking the IP addresses, SSH ports and root user passwords of all node servers together with the user-defined information for the big data platform installation, and writing this information into the deployment file;
if the check fails, modifying the unreasonable settings according to the prompt;
step S22: checking whether the configuration information of all node servers is correct and whether the user-defined information is configured reasonably;
if not, the abnormal and unreasonable information needs to be adjusted; an illustrative deployment file is shown below.
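For illustration only, a deployment file of the kind step S21 describes might look as follows; the application does not fix a syntax, so the column layout and every value here are assumptions (the hostname pattern bigdata-a-00N follows the reference command in embodiment 4):

# current_ip    ssh_port  root_password  static_ip     hostname
192.168.1.101   22        secret01       10.10.0.101   bigdata-a-001
192.168.1.102   22        secret02       10.10.0.102   bigdata-a-002
192.168.1.103   22        secret03       10.10.0.103   bigdata-a-003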
Other portions of the present embodiment are the same as those of the above embodiment, and thus will not be described again.
Example 3:
this embodiment is further optimized on the basis of the foregoing embodiments. Step S30 specifically comprises a stand-alone configuration installation and an online configuration installation. The stand-alone configuration installation specifically means configuring the basic environment of all servers, and comprises the following steps:
step S31: all nodes set the IP of each server to a static IP according to the correspondence in the script configuration information, and restart the network service after the setting is completed; if this fails, checking the server network against the abnormal condition of the individual nodes and restoring the network; if it succeeds, proceeding to step S32;
step S32: setting the machine names of all servers to the corresponding unified format according to the correspondence in the script configuration information and naming them in sequence;
step S33: all nodes modify the hosts file according to the correspondence in the configuration file, adding each node IP and the corresponding machine name;
step S34: verifying that every node can reach the others without packet loss, so as to ensure that the network is normal; if the nodes cannot reach one another, checking the network condition of the individual servers, including hardware problems; if they can, proceeding to step S35;
step S35: variable settings for all nodes, specifically comprising: modifying the swappiness value and the transparent_hugepage value of all nodes, and adjusting the configuration of all nodes so that the modification takes effect on startup; specifically:
modifying the swappiness value of all nodes so that the change takes effect at runtime, setting /proc/sys/vm/swappiness to 0;
all nodes adjust the configuration so that it persists on startup by appending it to /etc/sysctl.conf, so the setting is not lost after a restart:
cat >> /etc/sysctl.conf << EOF
vm.swappiness = 0
EOF
all nodes adjust the configuration, modifying the transparent_hugepage value with immediate effect:
echo never > /sys/kernel/mm/transparent_hugepage/defrag
all nodes adjust the configuration to take effect on startup, adding the command to the boot script so that it applies permanently:
echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /etc/rc.local
step S36: closing the firewall and selinux of all nodes and cancelling their start on boot, specifically comprising:
closing the firewall of all nodes:
service iptables save
service iptables stop
chkconfig iptables off
service ip6tables save
service ip6tables stop
chkconfig ip6tables off
Preferably, the method further comprises:
closing selinux on all nodes:
setenforce 0
sed -i.bak 's/SELINUX=enforcing/SELINUX=disabled/' /etc/sysconfig/selinux
sed -i.bak 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
all nodes then execute reboot to restart; a consolidated sketch of steps S32-S33 is given below.
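As a consolidated illustration of steps S32-S33, the hostname and hosts-file changes could be scripted as follows; nodes.txt and its "<static_ip> <hostname>" line format are assumptions carried over from the earlier sketches:

#!/bin/bash
# Hypothetical sketch of steps S32-S33: set the local machine name from the
# configuration and append every node to /etc/hosts.
while read -r ip name; do
    if ip addr | grep -qw "$ip"; then          # this entry is the local machine
        hostname "$name"                        # takes effect immediately
        sed -i "s/^HOSTNAME=.*/HOSTNAME=${name}/" /etc/sysconfig/network   # persists on CentOS 6
    fi
    grep -q "${ip} ${name}" /etc/hosts || echo "${ip} ${name}" >> /etc/hosts
done < nodes.txt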
other portions of the present embodiment are the same as those of the above embodiment, and thus will not be described again.
Example 4:
this embodiment further optimizes the above embodiments. The online configuration installation specifically means: configuring the cluster operating environment according to the existing flow, including the NTP service, passwordless SSH login, the Java environment and installing the CDH management console service, and verifying the state of each service after execution; it specifically comprises the following steps:
step S311: configuring the operating environments of all servers;
step S312: establishing mutual trust between all node servers, specifically comprising: copying the public key of each node to every server, a reference command being ssh-copy-id -i root@bigdata-a-001;
each server described here includes the local machine, and when copying the key the hostname-IP correspondence needs to be configured in /etc/hosts;
judging whether all nodes pass the mutual trust verification by connecting to each node via SSH to verify whether the configuration succeeded;
if not, checking the cause of the abnormal SSH state on the individual server, including permission problems; if the configuration succeeds, proceeding to step S313 (a scripted sketch of step S312 is given below).
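For illustration only, step S312 could be automated as follows; reading the root password from the deployment file and feeding it to ssh-copy-id via sshpass is one possible approach, and deploy.conf with the column layout from the earlier sample is an assumption of the sketch:

#!/bin/bash
# Hypothetical sketch of step S312: generate one key pair, then push the
# public key to every node (including the local machine).
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
while read -r ip _ passwd _ name; do
    case "$ip" in ''|\#*) continue ;; esac     # skip blank and comment lines
    sshpass -p "$passwd" ssh-copy-id -o StrictHostKeyChecking=no "root@${name}"
    ssh -o BatchMode=yes "root@${name}" true \
        && echo "mutual trust OK: ${name}" \
        || echo "mutual trust FAILED: ${name}" >&2
done < deploy.conf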
step S313: checking whether all nodes are set to the Chinese time zone; if not, changing them to the Chinese time zone, specifically comprising:
checking whether all nodes are in the Chinese time zone by checking whether the output of the date command contains CST;
if not, changing to the Chinese time zone: ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime;
step S314: configuring the yum source of the system on all nodes, with the management node acting as the HTTP server and all other nodes configured against it; verifying whether the yum sources of all nodes are properly configured;
all nodes verify the yum state, using yum search to check whether the output is normal and thereby whether the yum source configuration is correct;
if not, checking whether the repo file configured under the /etc/yum.repos.d folder of the yum source is wrong; a sketch of this yum source setup follows.
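Such a yum source setup might be realized as follows; the repository path /var/www/html/cdh-repo, the hostname manage-node and the use of createrepo with httpd are assumptions of the sketch:

#!/bin/bash
# Hypothetical sketch of step S314. On the management node: index the
# uploaded RPMs and serve them over HTTP.
createrepo /var/www/html/cdh-repo
service httpd start
# On every node: point yum at the management node and verify.
cat > /etc/yum.repos.d/cdh-local.repo << 'EOF'
[cdh-local]
name=Local CDH repository
baseurl=http://manage-node/cdh-repo
enabled=1
gpgcheck=0
EOF
yum clean all && yum search httpd > /dev/null && echo "yum source OK"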
step S315: installing and setting up the NTP service on all nodes, configuring the time server of every node to be the same time server, and verifying whether this succeeds;
if not, checking the cause of the abnormality on the individual servers, including whether the NTP component failed to install or was not started; a sketch of this NTP setup follows.
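Such an NTP setup might look like this; using the management node (again assumed to be named manage-node) as the common time server is an assumption of the sketch:

#!/bin/bash
# Hypothetical sketch of step S315: every node synchronises against the
# same time server, and ntpd starts on boot.
yum -y install ntp
sed -i 's/^server /#server /' /etc/ntp.conf        # disable the default pool servers
echo "server manage-node iburst" >> /etc/ntp.conf
service ntpd start && chkconfig ntpd on
ntpstat || echo "NTP not yet synchronised" >&2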
step S316: installing the mysql service, which specifically comprises:
installing the mysql database on the management node according to the configuration;
modifying the mysql configuration file on the management node according to the configuration; the mysql configuration file covers the mysql port number, the case-insensitivity setting and the character set encoding;
setting the mysql service to start on boot on the management node according to the configuration;
initializing the mysql data, creating access users and authorizing access on the management node according to the configuration, and verifying the access state of the mysql database, specifically comprising: the management node initializes the database, which includes creating the CDH management console metadata database, creating a remote access user according to the user-defined remote login user name, and authorizing remote access;
all nodes verify the database connection state;
if it fails, checking the cause of the abnormality on the individual servers, including whether the database is started and whether the user's remote access authorization failed;
if it succeeds, all nodes install the Java environment and the database connector, for example: yum -y install oracle-j2sdk1.7 mysql-connector-java;
if this fails, checking the cause of the abnormality on the individual servers, including execution permission problems;
if it succeeds, the management node installs the CDH management console service;
if this fails, checking the cause of the abnormality on the management node server, including missing dependencies;
if it succeeds, the management node configures the metadata database used by the CDH management console service;
if this fails, checking the mysql database of the management node server for abnormal database connections; a sketch of the database initialization is given below.
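For illustration, the database initialization of step S316 could look as follows; the metadata database name cmserver, the user cdhadmin and the password are placeholders of the sketch, not values defined by the application:

#!/bin/bash
# Hypothetical sketch of step S316: start mysql, create the CDH management
# console metadata database, and authorize a remote access user.
service mysqld start && chkconfig mysqld on
mysql -uroot << 'EOF'
CREATE DATABASE IF NOT EXISTS cmserver DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON cmserver.* TO 'cdhadmin'@'%' IDENTIFIED BY 'changeme';
FLUSH PRIVILEGES;
EOF
# every node then verifies the connection state:
mysql -h manage-node -ucdhadmin -pchangeme -e 'SELECT 1' \
    && echo "database reachable" || echo "database connection failed" >&2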
step S317: installing the basic operating service components;
step S318: starting the service components, specifically comprising: starting the CDH management console service on the management node according to the configuration, and adding the service to the boot sequence; verifying the service state by checking; the successful start is confirmed through log checking and port checking;
the management node server starts the CDH management console service;
if this fails, checking the abnormality log of the management node server for abnormal database connections;
if it succeeds, the successful start is verified through log checking and port checking, as sketched below.
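For illustration, the start and verification of step S318 could look as follows; the service name cloudera-scm-server, its log path and the port 7180 follow the Cloudera Manager defaults and are assumptions of the sketch:

#!/bin/bash
# Hypothetical sketch of step S318: start the CDH management console,
# add it to the boot sequence, then verify via log and listening port.
service cloudera-scm-server start
chkconfig cloudera-scm-server on
tail -n 20 /var/log/cloudera-scm-server/cloudera-scm-server.log
netstat -tlnp | grep -q ':7180 ' \
    && echo "management console listening on 7180" \
    || echo "management console did not start" >&2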
Other portions of the present embodiment are the same as those of the above embodiment, and thus will not be described again.
Example 5:
this embodiment is further optimized on the basis of the above embodiments. Most servers today are Linux servers, which support shell scripts, and this embodiment provides a method for automatically installing the CDH distributed big data platform based on shell scripts on Linux servers. A shell script is a computer program stored in a text file whose content consists of a series of shell commands interpreted by a UNIX shell. Designed as a scripting language, it runs in the manner of an interpreted language: the UNIX shell plays the role of a command line interpreter, running the shell commands in the script in sequence after reading it and then outputting the result. System administration, file operations and the like can be performed with shell scripts.
S10, preparing the environment;
the server is required to have the Red Hat Enterprise Linux 6 / CentOS 6.x operating system installed on an x86_64 architecture; the system IP network must be normal, the firewall must be closed, and the RPM installation package for the yum source must have been uploaded to the management node.
In the specific implementation process, step S10 "environment preparation" requires completing the following workflow: install the x86_64 Red Hat Enterprise Linux 6 / CentOS 6.x operating system, close the firewall by manually executing a command, check whether the network is normal and whether all nodes can access one another through SSH, and upload the RPM installation package for the yum source to the specified directory of the management node server.
S20, configuring a script;
the modification of the configuration file needs to be completed according to the conditions of all current nodes and the remarks in the configuration file.
In the specific execution process, step S20 "configure script" requires completing the following workflow: according to the actual situation of all nodes and the remarks in the configuration file, fill the current IP, root password and port number of each node, together with the new static IP address and machine name customized for the big data platform, into the configuration script file, and upload the configuration file and execution script to the designated directory on the management node.
S30, installing and deploying;
in the specific execution process, step S30 executes a script on the management node which obtains the basic information of the current servers from the configuration file and verifies the basic system environment, prompting for modification if an anomaly is encountered; after the maintenance personnel complete the adjustment, the script continues to execute, configuring static IPs and modifying machine names according to the flow; it then installs and configures the cluster operating environment according to the existing flow, including the NTP service, passwordless SSH login, the Java environment, CDH management console service installation and the like, and verifies the state after execution is completed.
s40, checking the result: logging in to the CDH management console interface and checking the state;
in the embodiment of the present application, step S40 "result check" requires logging in to the CDH management console and verifying its state.
In the specific execution process, step S40 only requires logging in to the CDH management console to check whether the services are started and whether the verified state is normal, which completes the process of automatically installing the CDH distributed big data platform; a sketch of a top-level driver for the whole flow is given below.
Other portions of the present embodiment are the same as those of the above embodiment, and thus will not be described again.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit the present application in any way; any simple modification, equivalent variation and the like made to the above embodiments according to the technical substance of the present application falls within the scope of the present application.

Claims (6)

1. A method for automatically installing a CDH distributed big data platform, characterized in that the method specifically comprises the following steps:
step S10: environment preparation: checking the basic environment of each node server and staging the installation resources;
step S20: script configuration: configuring the basic information of all node servers and the user-defined information according to the requirements of the configuration file;
step S30: installation and deployment: configuring the cluster basic environment and the big data operating environment according to the operation flow of the execution script, specifically including basic service configuration, NTP service setup, passwordless SSH login, Java environment configuration, CDH management console service installation and service state verification;
step S40: result checking: logging in to the CDH management console and checking the result;
step S30 specifically comprises a stand-alone configuration installation and an online configuration installation:
the stand-alone configuration installation specifically comprises the following steps:
step S31: all nodes set the IP of each server to a static IP according to the correspondence in the script configuration information, and restart the network service after the setting is completed;
step S32: setting the machine names of all servers to the corresponding unified format according to the correspondence in the script configuration information and naming them in sequence;
step S33: all nodes modify the hosts file according to the correspondence in the configuration file, adding each node IP and the corresponding machine name;
step S34: verifying that every node can reach the others without packet loss, so as to ensure that the network is normal;
step S35: variable settings for all nodes: modifying the swappiness value and the transparent_hugepage value of all nodes; all nodes adjust the configuration so that the modification takes effect on startup;
step S36: closing the firewall and selinux of all nodes and cancelling their start on boot;
the online configuration installation specifically means: configuring the cluster operating environment according to the existing flow, including the NTP service, passwordless SSH login, the Java environment and CDH management console service installation, and verifying the state of each service after execution is finished; it specifically comprises the following steps:
step S311: configuring the operating environments of all servers;
step S312: establishing mutual trust between all node servers; judging whether all nodes pass the mutual trust verification by connecting to each node via SSH to verify whether the configuration succeeded;
if not, checking the cause of the abnormal SSH state on the individual server, including permission problems;
step S313: checking whether all nodes are set to the Chinese time zone; if not, changing them to the Chinese time zone;
step S314: configuring the yum source of the system on all nodes, with the management node acting as the HTTP server; verifying whether the yum sources of all nodes are properly configured;
step S315: installing and setting up the NTP service on all nodes, configuring the time server of every node to be the same time server, and verifying whether this succeeds;
if not, checking the cause of the abnormality on the individual servers, including whether the NTP component failed to install or was not started;
step S316: installing the mysql service;
step S317: installing the basic operating service components;
step S318: starting the service components: starting the CDH management console service on the management node according to the configuration and adding the service to the boot sequence; verifying the service state by checking, the successful start being confirmed through log checking and port checking.
2. The method for automatically installing a CDH distributed big data platform according to claim 1, characterized in that: in step S10, checking the basic environment of each node server specifically comprises the following steps:
step S11: judging whether the operating system version of each node server meets the requirements;
if not, the operating system needs to be reinstalled;
step S12: checking the system architecture of each node server's operating system and judging whether it is an x86_64 architecture;
step S13: checking the IP network of each node server's operating system and testing whether the nodes can reach one another;
step S14: checking the configuration files, installation packages and execution scripts of each node server.
3. The method for automatically installing a CDH distributed big data platform according to claim 2, characterized in that: step S20 specifically comprises the following steps:
step S21: checking whether the IP addresses, SSH ports and root user passwords of all node servers and the user-defined information for the big data platform installation are configured in the deployment file;
if not, modifying the unreasonable settings according to the prompt;
step S22: checking whether the configuration information of all node servers is correct and whether the user-defined information is configured reasonably;
if not, the abnormal and unreasonable information needs to be adjusted.
4. The method for automatically installing a CDH distributed big data platform according to claim 1, characterized in that: step S316 specifically comprises:
installing the mysql database on the management node according to the configuration;
modifying the mysql configuration file on the management node according to the configuration;
setting the mysql service to start on boot on the management node according to the configuration;
initializing the mysql data, creating access users and authorizing access on the management node according to the configuration;
verifying the access state of the mysql database.
5. The method for automatically installing a CDH distributed big data platform according to claim 4, characterized in that: step S317 specifically comprises:
installing the Java operating environment on all nodes according to the configuration;
installing the mysql driver environment on all nodes according to the configuration;
installing the CDH management console service on the management node according to the configuration;
configuring the metadata database used by the CDH management console service.
6. The method for automatically installing a CDH distributed big data platform according to claim 5, characterized in that: step S14 specifically comprises:
judging whether the IP, users, passwords and machine names in the configuration file of the management node have been modified according to the current cluster machine situation;
judging whether the configuration file and the installation source exist under the management node directory.
CN201910568151.XA 2019-06-27 2019-06-27 CDH method for automatically installing distributed big data platform Active CN110286921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910568151.XA CN110286921B (en) 2019-06-27 2019-06-27 CDH method for automatically installing distributed big data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910568151.XA CN110286921B (en) 2019-06-27 2019-06-27 CDH method for automatically installing distributed big data platform

Publications (2)

Publication Number Publication Date
CN110286921A (en) 2019-09-27
CN110286921B (en) 2023-11-10

Family

ID=68019236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910568151.XA Active CN110286921B (en) 2019-06-27 2019-06-27 CDH method for automatically installing distributed big data platform

Country Status (1)

Country Link
CN (1) CN110286921B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631643B (en) * 2019-10-08 2024-05-24 华晨宝马汽车有限公司 Comprehensive operation and maintenance management method, system, equipment and medium
CN111142887B (en) * 2019-12-27 2022-08-02 焦点科技股份有限公司 Automatic CDH installation method
CN112836220B (en) * 2021-02-07 2023-03-24 浪潮云信息技术股份公司 Cloud center environment inspection method
CN113434157A (en) * 2021-06-30 2021-09-24 青岛海尔科技有限公司 Shell script-based front-end engineering deployment method and device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN107682209A (en) * 2017-11-10 2018-02-09 青岛萨纳斯智能科技股份有限公司 A kind of SDP big datas automatically dispose monitor supervision platform
CN109710281A (en) * 2018-12-28 2019-05-03 中科曙光国际信息产业有限公司 The installation method and device of big data platform

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN104317610B (en) * 2014-10-11 2017-05-03 福建新大陆软件工程有限公司 Method and device for automatic installation and deployment of hadoop platform
CN107563153A (en) * 2017-08-03 2018-01-09 华子昂 A kind of PacBio microarray dataset IT architectures based on Hadoop structures
US10331484B2 (en) * 2017-11-14 2019-06-25 Bank Of America Corporation Distributed data platform resource allocator

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN107682209A (en) * 2017-11-10 2018-02-09 青岛萨纳斯智能科技股份有限公司 A kind of SDP big datas automatically dispose monitor supervision platform
CN109710281A (en) * 2018-12-28 2019-05-03 中科曙光国际信息产业有限公司 The installation method and device of big data platform

Also Published As

Publication number Publication date
CN110286921A (en) 2019-09-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant