CN107463582B

CN107463582B - Distributed Hadoop cluster deployment method and device

Info

Publication number: CN107463582B
Application number: CN201610395969.2A
Authority: CN
Inventors: 高林林
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2016-06-03
Filing date: 2016-06-03
Publication date: 2021-11-12
Anticipated expiration: 2036-06-03
Also published as: WO2017206667A1; CN107463582A

Abstract

The invention provides a distributed Hadoop cluster deployment method and device. The method comprises: receiving template information for deploying a Hadoop cluster, wherein the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks the Hadoop cluster needs to complete; acquiring parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host is used for deploying one or more components, and the components are deployed by an agent and used for executing corresponding tasks; and deploying tasks for the one or more components according to the task information and the parameter information. The invention solves the problems of complex operation and long deployment time caused by manual deployment of Hadoop clusters in the related art.

Description

Distributed Hadoop cluster deployment method and device

Technical Field

The invention relates to the field of communication, in particular to a distributed Hadoop cluster deployment method and device.

Background

Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. Its name is not an abbreviation but a coined word that reportedly comes from the name of a toy belonging to the child of the project's creator, and it has no literal meaning. Hadoop is an open-source software framework for developing and running large-scale data processing: it performs distributed computation on massive data sets across a cluster of many computers, lets users develop distributed programs without knowing the details of the underlying distributed layer, and makes full use of the cluster for high-speed computation and storage.

In the related art, administrators who deploy a Hadoop cluster in a distributed manner must understand both the Hadoop ecosystem and the hardware resources of every host in the cluster. This places high demands on them and makes errors likely. Configuring a Hadoop cluster manually involves complex steps and is inefficient; in large-scale Hadoop environments in particular, elastic management such as dynamic scale-out and scale-in is difficult.

Existing systems that automate Hadoop deployment also have the following problems:

Before a Hadoop cluster is deployed, its network topology must be designed according to the software and hardware information of the cluster environment and the components to be deployed. This places high demands on cluster administrators, who must be familiar with both the environment and the Hadoop ecosystem. Without administrator intervention, the automated deployment system assigns Master and Slave nodes randomly and cannot allocate or use the cluster's hardware and system load information sensibly;

the Hadoop component version packages have only a single download source, so the deployment time of the Hadoop cluster is not controllable.

In summary: 1. Hadoop cluster deployment places high demands on operations and maintenance personnel, who must be familiar with the Hadoop ecosystem, know the resource information of every node in the cluster, and design the Hadoop cluster network topology; 2. Hadoop cluster component nodes are distributed randomly; 3. Hadoop cluster deployment takes a long time.

In view of the above problems in the related art, no effective solution has been found at present.

Disclosure of Invention

Embodiments of the invention provide a distributed Hadoop cluster deployment method and device, to at least solve the problems of complex operation and long deployment time caused by manual Hadoop cluster deployment in the related art.

According to an embodiment of the invention, a method for distributed deployment of a Hadoop cluster is provided, which includes: receiving template information for deploying a Hadoop cluster, wherein the template information is used for indicating task information and host information of the Hadoop cluster, and the task information is used for describing tasks needing to be completed by the Hadoop cluster; acquiring parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host is used for deploying one or more components, and the components are deployed by an agent and used for executing corresponding tasks; and deploying tasks for one or more components according to the task information and the parameter information.

Optionally, the parameter information includes at least one of the following: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory utilization, host disk IO utilization, host network delay, host average IO operation wait time, host disk information, and process information of the components on the host.

Optionally, deploying tasks for one or more components in the Hadoop cluster according to the task information and the parameter information includes: generating a deployment task list according to the task information and the parameter information, wherein the deployment task list comprises the task information, the parameter information required by the task execution and the priority of the task; and selecting the task with the highest priority from the deployment task list and sending the task to the corresponding component.

Optionally, the priority is related to an attribute of the task and/or the parameter information for executing the task.

Optionally, after deploying a task to one or more of the components according to the template information and the parameter information, the method further includes: monitoring task execution progress and/or log information of the one or more components.

Optionally, the template information includes at least one of the following: the number of Hadoop cluster hosts, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, host user names and passwords, log storage disk information, data storage disk information, and metadata storage disk information.

Optionally, after the template information for deploying the Hadoop cluster is received, the method further includes: parsing the template information and verifying its legality.

According to another embodiment of the present invention, an apparatus for distributed deployment of a Hadoop cluster is provided, including: a receiving module, configured to receive template information for deploying a Hadoop cluster, wherein the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks the Hadoop cluster needs to complete; an acquisition module, configured to acquire parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host includes one or more components, and the components are deployed by an agent and used for executing corresponding tasks; and a deployment module, configured to deploy tasks for the one or more components according to the task information and the parameter information.

Optionally, the deployment module further includes: a generating unit, configured to generate a deployment task list according to the task information and the parameter information, wherein the deployment task list includes the task information, the parameter information required for executing the tasks, and the priority of each task; and a selection unit, configured to select the task with the highest priority from the deployment task list and issue it to the corresponding component.

Optionally, the apparatus further includes: a monitoring module, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module deploys tasks to them according to the template information and the parameter information.

According to still another embodiment of the present invention, there is also provided a storage medium. The storage medium is configured to store program code for performing the steps of:

receiving template information for deploying a Hadoop cluster, wherein the template information is used for indicating task information and host information of the Hadoop cluster, and the task information is used for describing tasks needing to be completed by the Hadoop cluster;

acquiring parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host comprises one or more components, and the components are used for executing corresponding tasks;

and deploying tasks for one or more components according to the task information and the parameter information.

According to the method and device, template information for deploying the Hadoop cluster is received, the template information indicating task information and host information of the Hadoop cluster; parameter information of one or more hosts of the Hadoop cluster is acquired according to the host information, each host being used for deploying one or more components that are deployed by an agent and execute corresponding tasks; and tasks are deployed for the one or more components according to the task information and the parameter information. Because the load of the hosts and components is obtained from the acquired parameter information, tasks can be assigned sensibly to the hosts and components of the Hadoop cluster, which solves the problems of complex operation and long deployment time caused by manual deployment of Hadoop clusters in the related art.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a general structural framework diagram of a distributed deployment Hadoop cluster according to an embodiment of the present invention;

FIG. 2 is a flow diagram of a method of distributed deployment of a Hadoop cluster according to an embodiment of the invention;

FIG. 3 is a block diagram of an apparatus for distributed deployment of Hadoop clusters according to an embodiment of the present invention;

FIG. 4 is a block diagram of an alternative architecture of a distributed Hadoop cluster deployment apparatus according to an embodiment of the present invention;

FIG. 5 is a block diagram of an alternative architecture of a distributed Hadoop cluster deployment apparatus according to an embodiment of the present invention;

FIG. 6 is a structural framework diagram of an agent in the distributed Hadoop cluster deployment system according to this embodiment;

FIG. 7 is a flow diagram of agent deployment in the initial state according to this embodiment;

FIG. 8 is a flowchart of the Hadoop cluster deployment method according to this embodiment;

FIG. 9 is a timing diagram of the Hadoop cluster deployment method according to this embodiment.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Example 1

The embodiments of the present application may run on the network architecture shown in FIG. 1, which is a general structural framework diagram of a distributed Hadoop cluster deployment system according to an embodiment of the present invention. As shown in FIG. 1, the network architecture includes a Hadoop cluster management (deployment) system and a Hadoop cluster. The management system contains all the functional modules and an executing agent node; the Hadoop cluster contains a number of distributed agent nodes that execute tasks; and the deployment system is communicatively connected to the Hadoop cluster.

This embodiment provides a distributed Hadoop cluster deployment method that runs in the management system for deploying Hadoop clusters. FIG. 2 is a flowchart of the method according to an embodiment of the present invention; as shown in FIG. 2, the flow includes the following steps:

Step S202, receiving template information for deploying a Hadoop cluster, wherein the template information indicates task information and host information of the Hadoop cluster, and the task information describes the tasks the Hadoop cluster needs to complete;

Step S204, acquiring parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host is used for deploying one or more components, and the components are deployed by an agent and used for executing corresponding tasks; optionally, the deployment tasks are executed by the agent;

Step S206, deploying tasks for the one or more components according to the task information and the parameter information.

Through steps S202 to S206, the task information and host information are received and the load of the hosts and components is obtained from the acquired parameter information, so tasks can be assigned sensibly to the hosts and components of the Hadoop cluster. This solves the problems of complex operation and long deployment time caused by manual deployment of Hadoop clusters in the related art.

Optionally, the execution subject of the above steps may be a control end, a client end, and the like of the Hadoop cluster, but is not limited thereto.

Optionally, the parameter information may include, but is not limited to: host operating system information, host network information, host CPU information (such as the number of cores and the clock frequency), host memory information, host CPU utilization, host memory utilization, host disk IO utilization, host network delay, host average IO operation wait time, host disk information, and process information of the components on the host.

Optionally, the template information may include, but is not limited to: the number of Hadoop cluster hosts, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, host user names and passwords, log storage disk information, data storage disk information, and metadata storage disk information.

In an optional implementation manner according to this embodiment, deploying a task to one or more components in the Hadoop cluster according to the task information and the parameter information includes:

S11, generating a deployment task list according to the task information and the parameter information, wherein the deployment task list includes the task information, the parameter information required for executing each task, and the priority of each task;

S12, selecting the task with the highest priority from the deployment task list and sending it to the corresponding component. Optionally, the priority is related to attributes of the task and/or the parameter information for executing the task.
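Steps S11 and S12 can be pictured as a priority queue over the deployment task list. The following is a minimal Python sketch, not the patent's implementation; it assumes that a larger integer means a higher priority and that tasks with equal priority leave in the order they were added. `DeploymentTask` and `TaskList` are hypothetical names chosen for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class DeploymentTask:
    # Hypothetical record for one entry of the deployment task list:
    # task information, the parameters it needs, and its priority.
    component: str   # e.g. "HDFS"
    role: str        # e.g. "NameNode"
    host: str        # target host network address
    priority: int    # assumption: larger value = higher priority
    params: dict = field(default_factory=dict)


class TaskList:
    """Deployment task list that always hands out the highest-priority task."""

    def __init__(self) -> None:
        self._heap: list = []
        self._order = itertools.count()  # tie-breaker: FIFO among equal priorities

    def add(self, task: DeploymentTask) -> None:
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-task.priority, next(self._order), task))

    def pop_highest(self) -> DeploymentTask:
        # S12: take the task with the highest priority off the list.
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

A heap keeps insertion at O(log n) and always exposes the highest-priority task, which suits a scheduler that repeatedly takes the next task as results come back.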

Optionally, after deploying the task to the one or more components according to the template information and the parameter information, the method further includes:

monitoring the task execution progress and/or log information of the one or more components.

Optionally, after the template information for deploying the Hadoop cluster is received, the method further includes: parsing the template information and verifying its legality. The subsequent steps are performed only if the template information is legal. A legal deployment template includes at least, but is not limited to, the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, user names and passwords, a log storage disk, a data storage disk, a metadata storage disk, and the like.
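The patent does not specify a template syntax or the exact legality rules. As a sketch under those gaps, the Python below assumes a JSON template whose (hypothetical) keys mirror the fields just listed, and treats "legal" as: every field present with the right type, one host entry per node, a valid IP address and credentials for each host, and an HDFS replica count between 1 and the number of nodes. Real rules would likely be stricter.

```python
import ipaddress
import json

# Hypothetical key names for the template fields listed above.
REQUIRED_KEYS = {
    "node_count": int,
    "components": list,
    "hdfs_replicas": int,
    "client_connections": int,
    "client_timeout_s": int,
    "hosts": list,
    "log_disk": str,
    "data_disk": str,
    "metadata_disk": str,
}


def parse_and_validate(text: str) -> tuple[dict, list[str]]:
    """Parse a JSON deployment template and return (template, errors)."""
    try:
        template = json.loads(text)
    except json.JSONDecodeError as exc:
        return {}, [f"not valid JSON: {exc}"]
    if not isinstance(template, dict):
        return {}, ["template must be a JSON object"]

    # 1. Every required field must be present with the expected type.
    errors = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in template:
            errors.append(f"missing field: {key}")
        elif not isinstance(template[key], expected):
            errors.append(f"field {key} must be {expected.__name__}")
    if errors:
        return template, errors

    # 2. Cross-field checks on hosts and replication.
    hosts = template["hosts"]
    if len(hosts) != template["node_count"]:
        errors.append("node_count does not match number of hosts")
    for host in hosts:
        if not isinstance(host, dict):
            errors.append(f"host entry must be an object: {host!r}")
            continue
        try:
            ipaddress.ip_address(host.get("address", ""))
        except ValueError:
            errors.append(f"invalid host address: {host.get('address')!r}")
        if not host.get("user") or not host.get("password"):
            errors.append(f"missing credentials for {host.get('address')!r}")
    if not 1 <= template["hdfs_replicas"] <= template["node_count"]:
        errors.append("hdfs_replicas must be between 1 and node_count")
    return template, errors
```

Returning all errors at once, rather than stopping at the first, lets the user fix a rejected template in a single pass before deployment is retried.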

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

This embodiment also provides an apparatus for distributed deployment of a Hadoop cluster. The apparatus implements the foregoing embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 3 is a block diagram of an apparatus for distributed deployment of Hadoop clusters according to an embodiment of the present invention, and as shown in fig. 3, the apparatus includes:

the receiving module 30 is configured to receive template information for deploying a Hadoop cluster, where the template information is used to indicate task information and host information of the Hadoop cluster, and the task information is used to describe a task that needs to be completed by the Hadoop cluster;

the acquisition module 32 is used for acquiring parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host comprises one or more components, and the components are deployed by an agent and used for executing corresponding tasks;

and the deployment module 34 is used for deploying the tasks for one or more components according to the task information and the parameter information.
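The three modules form a simple pipeline: the receiving module turns raw input into task and host information, the acquisition module asks each host for its parameters, and the deployment module places tasks. The Python sketch below wires them together under stated assumptions: `query_agent` is a hypothetical hook standing in for whatever call actually reaches a host's agent, and the placement rule (lowest CPU utilization first, with a made-up per-task increment) is only one illustrative choice, not the patented placement logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Template:
    tasks: list[str]  # task information: what the cluster must run
    hosts: list[str]  # host information: where it may run


class ReceivingModule:  # module 30
    def receive(self, raw: dict) -> Template:
        return Template(tasks=list(raw["tasks"]), hosts=list(raw["hosts"]))


class AcquisitionModule:  # module 32
    def __init__(self, query_agent: Callable[[str], dict]) -> None:
        # Hypothetical hook that asks a host's agent for its parameter
        # information (CPU, memory, utilization figures, ...).
        self._query = query_agent

    def acquire(self, hosts: list[str]) -> dict[str, dict]:
        return {h: self._query(h) for h in hosts}


class DeploymentModule:  # module 34
    def deploy(self, template: Template, params: dict[str, dict]) -> dict[str, list[str]]:
        # Illustrative rule: give each task to the host with the lowest
        # estimated CPU utilization, then bump that estimate.
        load = {h: p.get("cpu_util", 0.0) for h, p in params.items()}
        plan: dict[str, list[str]] = {h: [] for h in params}
        for task in template.tasks:
            host = min(load, key=load.get)
            plan[host].append(task)
            load[host] += 0.1  # hypothetical per-task increment
        return plan
```

Injecting `query_agent` keeps the acquisition module independent of the transport used to reach agents, which also makes it easy to test with canned metrics.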

FIG. 4 is a block diagram of an optional structure of the distributed Hadoop cluster deployment apparatus according to an embodiment of the present invention. As shown in FIG. 4, in addition to all the modules shown in FIG. 3, the deployment module 34 of the apparatus further includes:

the generating unit 40 is configured to generate a deployment task list according to the task information and the parameter information, where the deployment task list includes the task information, the parameter information required for executing the task, and a priority of the task;

and the selecting unit 42 is configured to select a task with the highest priority from the deployment task list and send the task to the corresponding component.

FIG. 5 is a block diagram of an optional structure of the distributed Hadoop cluster deployment apparatus according to an embodiment of the present invention. As shown in FIG. 5, in addition to all the modules shown in FIG. 3, the apparatus includes a monitoring module 50, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module deploys tasks to them according to the template information and the parameter information.
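A monitoring module of this kind mainly needs to aggregate the per-task progress and log lines reported by agents. The sketch below is a hypothetical in-memory version in Python; the percentage scale, the rule that progress never moves backwards, and the `TaskMonitor` name are all assumptions rather than details from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaskMonitor:
    """Collects progress and log lines reported by agents, keyed by task."""
    progress: dict = field(default_factory=dict)                   # task -> 0..100
    logs: dict = field(default_factory=lambda: defaultdict(list))  # task -> lines

    def report(self, task: str, percent: int, log_line: str = "") -> None:
        # Clamp to 0..100 and ignore stale reports that would move progress back.
        pct = max(0, min(100, percent))
        self.progress[task] = max(self.progress.get(task, 0), pct)
        if log_line:
            self.logs[task].append(log_line)

    def overall(self) -> float:
        """Mean progress over all tasks seen so far (0.0 if none)."""
        if not self.progress:
            return 0.0
        return sum(self.progress.values()) / len(self.progress)

    def finished(self) -> list[str]:
        return [t for t, p in self.progress.items() if p == 100]
```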

It should be noted that the above modules may be implemented in software or in hardware. In the latter case, this may be done in, but is not limited to, the following ways: all the modules are located in the same processor; or the modules are located in different processors in any combination.

Example 3

This embodiment is an optional embodiment of the present invention and explains the present application in detail:

This embodiment provides a distributed Hadoop cluster deployment method and system that overcome three shortcomings: the high demands placed on administrators who deploy Hadoop clusters, the random placement of Hadoop cluster component nodes, and the single download source of installation packages. By making full use of the hardware resources in the cluster and the load of each host, the invention achieves one-click distributed deployment of a Hadoop cluster.

As shown in FIG. 1, the distributed Hadoop cluster deployment system of this embodiment includes the following components:

A template analyzer: the deployment template includes, but is not limited to, the following: host network addresses, user names, passwords, Hadoop component information, node count information, and mounted disk information. The template analyzer parses the template information entered by the user and verifies its legality.

A monitor: the monitor receives and processes the execution status of Hadoop component deployment tasks and the logs sent by the agents.

A collector: the collector is responsible for receiving and persisting host information (including but not limited to operating system information, CPU information, memory information, network information, CPU utilization rate, memory utilization rate, disk IO utilization rate, network delay and the like) sent by the agent.

A task generator: the task generator generates a Hadoop component deployment task list according to the host information acquired by the collector and the deployment template information.

A task scheduler: the task scheduler selects high-priority deployment tasks and issues them to the agents according to the host information and host load acquired by the collector and the deployment task list.

The agent: the agent includes a collector, a deployer, a parameter configurator, a monitor, and so on. The collector periodically collects host information and sends it to the collector of the system; the deployer receives and executes the tasks issued by the task scheduler; the parameter configurator configures the configuration files of the Hadoop components; and the monitor monitors the execution of deployment tasks and collects logs. FIG. 6 is a structural framework diagram of the agent in the distributed Hadoop cluster deployment system of this embodiment.
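The task scheduler weighs host load against task priority when building its sequence, and step 107 of this embodiment says load is judged mainly by average load, memory utilization, disk IO utilization and network delay. The patent gives no formula, so the Python sketch below assumes a weighted sum of those four indicators, each normalized to the 0-1 range (load average divided by core count, delay divided by a reference of 100 ms). The weights, the reference delay, and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights: the indicators come from the text, the numbers do not.
WEIGHTS = {"load": 0.4, "mem": 0.2, "disk_io": 0.2, "net": 0.2}
NET_DELAY_REF_MS = 100.0  # delay treated as "fully loaded" (assumption)


@dataclass
class HostMetrics:
    cores: int
    load_avg: float      # e.g. 1-minute load average
    mem_util: float      # fraction, 0.0-1.0
    disk_io_util: float  # fraction, 0.0-1.0
    net_delay_ms: float


def load_score(m: HostMetrics) -> float:
    """Weighted 0-1 score; lower means the host is less busy."""
    parts = {
        "load": min(m.load_avg / max(m.cores, 1), 1.0),
        "mem": m.mem_util,
        "disk_io": m.disk_io_util,
        "net": min(m.net_delay_ms / NET_DELAY_REF_MS, 1.0),
    }
    return sum(WEIGHTS[k] * v for k, v in parts.items())


def task_sequence(tasks: list[dict], metrics: dict[str, HostMetrics]) -> list[dict]:
    """Sort by priority (high first), then by target-host load (idle first)."""
    return sorted(tasks, key=lambda t: (-t["priority"], load_score(metrics[t["host"]])))
```

Using load only as a tie-breaker keeps task priority authoritative; a scheduler could instead fold load into the priority itself, which this sketch does not attempt.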

FIG. 7 is the deployment flow of the agent in the initial state according to this embodiment. As shown in FIG. 7, the distributed Hadoop cluster deployment method of this embodiment includes the following steps:

Initializing the deployment system

When the system starts, the monitor, collector and agent in the distributed Hadoop cluster deployment system are initialized, and the system becomes ready to receive the deployment template submitted by the user.

Deploying agents

The task generator generates the agent deployment tasks, and the task scheduler schedules them for execution. After an agent is deployed, its collector periodically collects node resource information and reports it to the management system.

User submits the Hadoop cluster deployment template

The user fills in the information of the Hadoop cluster to be deployed as required by the deployment template and submits the template.

Parsing Hadoop cluster deployment template

The monitor of the distributed Hadoop cluster deployment system receives the deployment template submitted by the user, and the template analyzer parses the Hadoop cluster deployment template and verifies its validity.

The topology generator generates a Hadoop cluster network topology graph according to the deployment template submitted by the user and the resource information.

Generating Hadoop cluster component deployment tasks

The task generator generates component deployment tasks according to the structure of the Hadoop cluster network topology graph.

Task scheduler executing deployment tasks

The task scheduler takes the deployment tasks to be executed from the task list, together with the resource information of each node, and generates a sequence of tasks to execute; it then takes out the high-priority deployment tasks in order and issues them to the corresponding agents.

Performing deployment tasks

After a host agent receives a deployment task, the agent's deployer executes it. The agent's monitor reports the progress of the deployment task to the monitor of the deployment system in real time, and that monitor notifies the task scheduler to continue scheduling tasks. The task scheduler repeats the step of executing deployment tasks until all the tasks to be deployed have been executed.
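The loop described above (issue the highest-priority task, wait for an agent's monitor to report completion, then issue the next one) can be sketched with Python's `concurrent.futures`. Here a thread pool stands in for the remote agents and the `execute` callable stands in for an agent's deployer; host load and the regeneration of the task sequence are left out to keep the example short. `run_deployment` and `max_parallel` are hypothetical names.

```python
import heapq
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_deployment(tasks, execute, max_parallel=4):
    """Issue the highest-priority pending task whenever an agent slot is free.

    tasks:        iterable of (priority, name, host); larger priority goes first.
                  Task names are assumed to be unique.
    execute:      callable(host, name) -> bool, standing in for an agent's deployer.
    max_parallel: how many tasks may be in flight at once.
    Returns {name: result} as reported back on completion.
    """
    pending = [(-prio, i, name, host) for i, (prio, name, host) in enumerate(tasks)]
    heapq.heapify(pending)
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        running = {}  # future -> task name
        while pending or running:
            # Scheduler: fill free slots with the highest-priority tasks.
            while pending and len(running) < max_parallel:
                _, _, name, host = heapq.heappop(pending)
                running[pool.submit(execute, host, name)] = name
            # Monitor: block until at least one agent reports completion.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results
```

With `max_parallel=1` the tasks run strictly in priority order; larger values let several agents work at once, while `wait(..., return_when=FIRST_COMPLETED)` plays the role of the monitor notifying the scheduler.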

Based on the characteristics of each Hadoop cluster component and the cluster resources, the nodes of the Hadoop cluster components are distributed sensibly; during deployment, tasks are dynamically distributed according to the acquired host load, achieving one-click distributed deployment of the Hadoop cluster. The invention effectively overcomes the shortcomings of large-scale Hadoop cluster deployment, such as complexity, long deployment time, and heavy load on the deployment system.

FIG. 8 is a flowchart of the Hadoop cluster deployment method of this embodiment, and FIG. 9 is a timing diagram of the same method. With reference to FIG. 8 and FIG. 9, this embodiment includes:

Initializing the system: when the distributed Hadoop cluster deployment system starts, it must be initialized, which includes initializing the monitor, the collector, agent A1, and so on.

Agent deployment: in the first round, agent A1 executes the task of deploying agent A2; once agent A2 is deployed, it is initialized and started. Agents A1 and A2 then execute the tasks of deploying agents A3 and A4, and so on, until agents have been deployed on all hosts in the cluster (see FIG. 7).
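The cascading rollout (A1 installs A2, then A1 and A2 install A3 and A4) doubles the number of working agents each round. The Python sketch below simulates it for an ordered host list and also states the closed form: if every working agent installs `fanout` new agents per round, the number of agents working in round t is (fanout + 1)^(t - 1), i.e. 2^(t-1) for one new agent per round and 3^(t-1) for two. The function names and the round-based model are assumptions made for illustration.

```python
def agent_rollout(hosts: list[str], fanout: int = 1) -> list[list[tuple[str, str]]]:
    """Plan a cascading agent rollout, round by round.

    hosts[0] already runs agent A1. In every round, each agent deployed in
    an earlier round installs up to `fanout` new agents; fanout=1 gives the
    A1 -> A2, then A1/A2 -> A3/A4 pattern. Each round is a list of
    (deploying_host, new_host) pairs.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    deployed = [hosts[0]]
    remaining = list(hosts[1:])
    rounds = []
    while remaining:
        pairs = []
        for src in deployed:
            for _ in range(fanout):
                if not remaining:
                    break
                pairs.append((src, remaining.pop(0)))
        deployed.extend(dst for _, dst in pairs)  # new agents work from next round
        rounds.append(pairs)
    return rounds


def working_agents(t: int, fanout: int = 1) -> int:
    """Agents busy deploying in round t (t >= 1) while hosts remain."""
    return (fanout + 1) ** (t - 1)
```

Because the working population multiplies by (fanout + 1) every round, covering n hosts takes about log base (fanout + 1) of n rounds instead of n - 1 sequential installs.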

101. The user submits a deployment template: after the distributed Hadoop cluster deployment system is initialized, the user can submit a deployment template that meets the requirements to the system. A legal deployment template includes at least, but is not limited to, the following: the number of Hadoop cluster nodes, information on the Hadoop cluster components to be deployed, the number of HDFS replicas, the number of client connections and the timeout of each Hadoop cluster component, host network addresses, user names and passwords, a log storage disk, a data storage disk, a metadata storage disk, and the like.

102. After receiving the deployment template information, the template analyzer first checks the legality of the template. If the template does not meet the agreed requirements, deployment ends; if the template passes parsing against the template rules, the topology graph generator generates a Hadoop cluster networking topology graph.

103. The topology graph generator generates the Hadoop cluster networking topology graph (e.g., S1) according to the node resources, the deployment principles of each Hadoop cluster component, and the deployment template information. The Hadoop cluster component deployment principles include, but are not limited to, the following: 1. the Master and Slave nodes of the Hadoop components are allocated according to hardware resources and host load; 2. the number of ZooKeeper nodes is calculated from the number of nodes in the cluster, and the nodes are allocated accordingly; 3. the number of JournalNodes is calculated from the number of HDFS nodes, and the nodes are allocated accordingly. A Hadoop component deployment task includes, but is not limited to, the following information: component name (e.g., HDFS), node name (e.g., NameNode), host network address, task priority, and so on.

104. The topology graph generated by the topology graph generator is stored.

105. The deployment task generator generates deployment tasks according to the Hadoop cluster networking topology graph.

106. The deployment task list generated by the deployment task generator is stored.

107. The task scheduler scans the deployment task list, takes out the deployment tasks not yet executed, calculates the load of each host in the cluster from the node resource information (mainly considering the average load, memory utilization, disk IO utilization, and network delay), and generates a deployment task sequence by priority (e.g., S4).

108. The task scheduler selects the high-priority deployment tasks in order and issues them to the agents of the corresponding hosts. When Hadoop component deployment tasks are executed for the first time, agent A1 executes the Hadoop cluster component deployment task for agent A2, and the monitor of agent A1 monitors the execution of the task and reports it to the monitor of the deployment system (e.g., S10). After the monitor is notified that the task has completed, the task scheduler regenerates the task sequence from the task list and the resource information (e.g., S5) and selects the high-priority tasks T3 and T4, which agents A1 and A2 deploy to agents A3 and A4, and so on (e.g., S11 and S14). Ideally, in the t-th round (t > 0), $2^{t-1}$ agents in the whole cluster are executing Hadoop component deployment tasks. Each agent can also start multiple threads and deploy Hadoop component tasks to several (for example, 2) agents at once; ideally, in the t-th round (t > 0), $3^{t-1}$ agents in the whole Hadoop cluster are then executing Hadoop component deployment tasks.

109. Agent a1 is deployed together with the distributed Hadoop cluster deployment and management system.

110. An agent is deployed on each host node in the Hadoop cluster.

Configuration generation: the parameter configuration tasks generate the configuration of each component of the Hadoop cluster. The scheduler collects the deployment information of every component in the whole Hadoop cluster (for example, the host names of the nodes hosting the Master and Slave roles, and the log storage disks, data storage disks and metadata storage disks) and issues it, together with the parameter configuration tasks, to the parameter configurator in each host's agent. Once all parameter configuration tasks in the cluster have been executed, deployment of all components of the whole Hadoop cluster is complete.
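As an illustration of what a parameter configurator might emit, the sketch below turns cluster-wide deployment information into Hadoop-style `<configuration>` XML. The property names `fs.defaultFS`, `dfs.replication`, `dfs.namenode.name.dir` and `dfs.datanode.data.dir` are standard Hadoop settings, and 8020 is a commonly used NameNode RPC port; the input layout (`topology`, `disks`) and the helper names are assumptions, and log directories are left out because Hadoop usually sets them through environment variables rather than these XML files.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_core_site(topology: dict[str, list[str]], nn_port: int = 8020) -> dict[str, str]:
    """core-site.xml properties derived from where the NameNode was placed."""
    namenode = topology["NameNode"][0]
    return {"fs.defaultFS": f"hdfs://{namenode}:{nn_port}"}


def build_hdfs_site(disks: dict[str, list[str]], replication: int) -> dict[str, str]:
    """hdfs-site.xml properties from the template's disk layout and replica count."""
    return {
        "dfs.replication": str(replication),
        "dfs.namenode.name.dir": ",".join(disks["metadata"]),  # metadata storage disks
        "dfs.datanode.data.dir": ",".join(disks["data"]),      # data storage disks
    }


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Because every agent needs the same view of where the Masters live, the scheduler gathers this information before issuing any configuration task, which is why configuration is the last stage of the deployment.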

201. The collector in the agent periodically collects the hardware resources and running state of its host, reports them to the collector in the deployment system, and the node resources are stored. The hardware resource and running state information includes, but is not limited to, operating system information, host name, CPU information, memory information, disks, process information, CPU utilization, memory utilization, disk I/O utilization, network information, average I/O operation latency, and the like.

202. The node resource information collected by the collector (covering both the hosts and the Hadoop components) is stored.
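A minimal agent-side collector loop, assuming samples are reported as JSON through a caller-supplied `report` callback (a hypothetical stand-in for sending data to the deployment system's collector). Only metrics available from the Python standard library are sampled here; memory utilization, disk I/O counters and per-process details would need an extra source such as the third-party `psutil` package, which this sketch does not use.

```python
import json
import os
import platform
import shutil
import socket
import time


def collect_sample(path: str = "/") -> dict:
    """One snapshot of host information using only the standard library."""
    usage = shutil.disk_usage(path)
    # os.getloadavg() exists only on Unix-like systems.
    load1 = os.getloadavg()[0] if hasattr(os, "getloadavg") else None
    return {
        "timestamp": time.time(),
        "hostname": socket.gethostname(),
        "os": f"{platform.system()} {platform.release()}",
        "cpu_count": os.cpu_count(),
        "load_avg_1m": load1,
        "disk_total_bytes": usage.total,
        "disk_used_ratio": usage.used / usage.total,
    }


def run_collector(report, interval_s: float = 30.0, rounds=None) -> None:
    """Report a JSON sample every interval_s seconds; rounds=None runs forever."""
    n = 0
    while rounds is None or n < rounds:
        report(json.dumps(collect_sample()))
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval_s)
```

The 30-second default interval is an arbitrary choice; a shorter interval gives the scheduler fresher load figures at the cost of more reporting traffic.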

Example 4

The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:

S1, receiving template information for deploying a Hadoop cluster, wherein the template information is used for indicating task information and host information of the Hadoop cluster, and the task information is used for describing tasks needing to be completed by the Hadoop cluster;

S2, acquiring parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein each host is used for deploying one or more components, and the components are deployed by an agent and used for executing corresponding tasks;

S3, deploying tasks for the one or more components according to the task information and the parameter information.
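Step S1, combined with the parsing and validity check of claim 7, could look like the sketch below. The template format is not specified in the patent, so JSON and the field names (taken loosely from the items listed in claim 6) are assumptions, as are the individual checks; the replica-count rule simply reflects that HDFS cannot place more replicas than there are hosts.

```python
import ipaddress
import json

# Hypothetical field names based on the template items listed in claim 6.
REQUIRED = ("host_count", "components", "hdfs_replication", "hosts",
            "log_disks", "data_disks", "metadata_disks")


def parse_template(text: str) -> dict:
    """Parse a JSON deployment template and raise ValueError if it is invalid."""
    tpl = json.loads(text)
    missing = [key for key in REQUIRED if key not in tpl]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    errors = []
    hosts = tpl["hosts"]
    if len(hosts) != tpl["host_count"]:
        errors.append("host_count does not match the number of hosts")
    for host in hosts:
        try:
            ipaddress.ip_address(host.get("address", ""))
        except ValueError:
            errors.append(f"invalid address: {host.get('address')!r}")
        if not host.get("user") or "password" not in host:
            errors.append(f"missing credentials for {host.get('address')!r}")
    if not 1 <= tpl["hdfs_replication"] <= len(hosts):
        errors.append("hdfs_replication must be between 1 and the number of hosts")
    if errors:
        raise ValueError("; ".join(errors))
    return tpl
```

Collecting every problem before raising lets an operator fix a template in one pass instead of resubmitting it once per error.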

Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.

Optionally, in this embodiment, the processor, according to the program code stored in the storage medium, receives template information for deploying a Hadoop cluster, where the template information is used to indicate task information and host information of the Hadoop cluster, and the task information is used to describe tasks that need to be completed by the Hadoop cluster;

optionally, in this embodiment, the processor, according to the program code stored in the storage medium, acquires parameter information of one or more hosts of the Hadoop cluster according to the host information, where each host is used to deploy one or more components, and the components are deployed by the agent and used to execute corresponding tasks;

optionally, in this embodiment, the processor, according to the program code stored in the storage medium, deploys tasks for the one or more components according to the task information and the parameter information.

Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for distributed deployment of a Hadoop cluster, characterized by comprising the following steps:

acquiring parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein the parameter information comprises load information of the hosts, each host is used for deploying one or more components, and the components are deployed by an agent and used for executing corresponding tasks;

deploying tasks for one or more of the components according to the task information and the parameter information, including:

allocating Master and Slave nodes of the Hadoop components according to hardware resources and host load conditions;

calculating the number of ZooKeeper nodes according to the number of nodes in the cluster and allocating them;

and calculating the number of JournalNodes according to the number of HDFS nodes and allocating them.

2. The method of claim 1, wherein the parameter information comprises at least one of: host operating system information, host network information, host CPU information, host memory information, host CPU utilization, host memory utilization, host disk I/O utilization, host network latency, host average I/O operation wait time, host disk information, and process information of the components on the host.

3. The method of claim 1, wherein deploying tasks for one or more components within the Hadoop cluster according to the task information and the parameter information comprises:

generating a deployment task list according to the task information and the parameter information, wherein the deployment task list comprises the task information, the parameter information required by the task execution and the priority of the task;

and selecting the task with the highest priority from the deployment task list and sending the task to the corresponding component.

4. A method according to claim 3, characterized in that said priority is related to properties of said task and/or said parameter information for executing said task.

5. The method of claim 1, wherein after deploying tasks for one or more of the components based on the template information and the parameter information, the method further comprises:

monitoring task execution progress and/or log information of the one or more components.

6. The method of claim 1, wherein the template information comprises at least one of: the number of Hadoop cluster hosts, information on the Hadoop cluster components to be deployed, the number of Hadoop Distributed File System (HDFS) replicas, the number of connections and the timeout of each Hadoop cluster component client, host network addresses, host user names and passwords, log storage disk information, data storage disk information, and metadata storage disk information.

7. The method of claim 1, wherein after receiving template information for deploying a Hadoop cluster, the method further comprises:

parsing the template information and verifying the validity of the template information.

8. A distributed Hadoop cluster deployment device, comprising:

a receiving module, configured to receive template information for deploying a Hadoop cluster, wherein the template information is used for indicating task information and host information of the Hadoop cluster, and the task information is used for describing tasks needing to be completed by the Hadoop cluster;

an acquisition module, configured to acquire parameter information of one or more hosts of the Hadoop cluster according to the host information, wherein the parameter information comprises load information of the hosts, each host comprises one or more components, and the components are deployed by an agent and used for executing corresponding tasks;

a deployment module, configured to deploy a task to one or more of the components according to the task information and the parameter information, including:

9. The apparatus of claim 8, wherein the deployment module further comprises:

a generating unit, configured to generate a deployment task list according to the task information and the parameter information, wherein the deployment task list comprises the task information, the parameter information required for executing the task, and the priority of the task;

and a selection unit, configured to select the task with the highest priority from the deployment task list and issue the task to the corresponding component.

10. The apparatus of claim 9, further comprising:

a monitoring module, configured to monitor the task execution progress and/or log information of the one or more components after the deployment module deploys tasks for the one or more components according to the template information and the parameter information.